- DOI 10.18231/j.ijmr.2025.004
Revolutionizing insights from genes: Fundamental role of data science in bioinformatics and healthcare
This review underlines the broad scope of data science applications, including machine learning, deep learning, and network analysis, in unravelling complex biological systems, identifying biomarkers, and predicting disease trends. A closer look shows the central role of data science in advancing personalized medicine, expediting drug discovery, and improving precision health methodologies. The use of machine learning in biotechnology and medicine is transforming the diagnostic landscape: machine learning supports early disease identification through sophisticated pattern recognition in genetic and clinical data. Deep learning algorithms identify potential new therapeutic targets and enable patient-specific predictions of treatment response, improving the safety and efficiency of medical interventions. Machine learning further integrates multi-omics data, providing a better understanding of disease mechanisms and treatment pathways. The review highlights the importance of addressing data quality and privacy, through collaborative efforts, in order to fully realize the potential of data-driven bioinformatics. It emphasizes the role of data science in setting the course of bioinformatics research and, in particular, indicates that data science is set to revolutionize healthcare approaches in the near future. Overall, this wide-ranging review outlines the substantial influence that data science has had on bioinformatics through the introduction of advanced computational techniques, creating a new paradigm in the life sciences for the analysis and interpretation of large datasets and the creation of knowledge from them.
Keywords: Bioinformatics, Computational biology, Data science, Machine learning, Precision medicine.
How to Cite This Article
Vancouver
Madan A, Kumar R, Garg R, Chugh P, Chattaraj S, Joshi NC, Gururani P, Verma D, Ray A, Yadav AN, Mitra D. Revolutionizing insights from genes: Fundamental role of data science in bioinformatics and healthcare [Internet]. Indian J Microbiol Res. 2025 [cited 2025 Oct 09];12(1):21-33. Available from: https://doi.org/10.18231/j.ijmr.2025.004
APA
Madan, A., Kumar, R., Garg, R., Chugh, P., Chattaraj, S., Joshi, N. C., Gururani, P., Verma, D., Ray, A., Yadav, A. N., & Mitra, D. (2025). Revolutionizing insights from genes: Fundamental role of data science in bioinformatics and healthcare. Indian J Microbiol Res, 12(1), 21-33. https://doi.org/10.18231/j.ijmr.2025.004
MLA
Madan, Ayush, Kumar, Rahul, Garg, Rishabh, Chugh, Priya, Chattaraj, Sourav, Joshi, Naveen Chandra, Gururani, Prateek, Verma, Devvret, Ray, Anuprita, Yadav, Ajar Nath, Mitra, Debasis. "Revolutionizing insights from genes: Fundamental role of data science in bioinformatics and healthcare." Indian J Microbiol Res, vol. 12, no. 1, 2025, pp. 21-33. https://doi.org/10.18231/j.ijmr.2025.004
Chicago
Madan, A., Kumar, R., Garg, R., Chugh, P., Chattaraj, S., Joshi, N. C., Gururani, P., Verma, D., Ray, A., Yadav, A. N., Mitra, D. "Revolutionizing insights from genes: Fundamental role of data science in bioinformatics and healthcare." Indian J Microbiol Res 12, no. 1 (2025): 21-33. https://doi.org/10.18231/j.ijmr.2025.004