Evaluating routine biochemical parameters for breast cancer detection in Indian women: A machine learning approach

Raghav, Nirbhay

dc.contributor.advisor	Rangarajan, Annapoorni
dc.contributor.advisor	Chandra, Nagasuma
dc.contributor.author	Raghav, Nirbhay
dc.date.accessioned	2024-01-02T04:58:13Z
dc.date.available	2024-01-02T04:58:13Z
dc.date.submitted	2023
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/6340
dc.description.abstract	Breast cancer is the leading cancer type among Indian women, with a mortality rate that surpasses that of nations such as the USA, Australia, and the UK. The higher mortality rate is largely explained by a lack of awareness, delayed diagnosis, and poor prognosis. Moreover, the age of onset of breast cancer in India is a decade lower than in the Caucasian population. Developed nations have successfully leveraged technologies such as mammography by employing nationwide mandatory screening programs to identify early-stage breast cancer. However, the unique demographic landscape of India, characterized by a very large population, poses a significant challenge to the nationwide implementation of such methods, largely because of their cost. Liquid biopsy-based biomarkers (e.g., CTCs, ctDNA, miRNA) are promising, yet their reliance on sophisticated isolation techniques and high-throughput sequencing technologies makes them expensive and currently impractical for the Indian population. Hence, there is an unmet need for cost-effective, fast, and scalable methods for large-scale breast cancer screening in India. In this thesis, we propose a shift in focus towards routinely measured blood- and urine-based biomarkers and evaluate their utility in breast cancer detection. We screened 106 blood and urine biochemical parameters, along with medical history, for 54 breast cancer and 97 non-cancer subjects (controlled for age and BMI), including individuals with and without type 2 diabetes mellitus, from two different institutions. We performed a comprehensive comparison of the 106 biochemical parameters between the breast cancer and non-cancer cohorts and observed that multiple parameters from the lipid, liver, and immune cell profiles differed significantly in breast cancer subjects.
These results were in line with existing reports on Western populations of elevated LDL, triglycerides, and other parameters that we also found to be perturbed in the Indian cohort. Univariate ROC analysis showed AUC values between 0.7 and 0.86 for these significant parameters, suggesting high diagnostic potential. We trained machine learning models, including logistic regression (LR), random forest (RF), support vector machines (SVM), and XGBoost (XG), to perform multivariate analysis. We used repeated stratified 5- and 10-fold cross-validation to train and evaluate all models. These models achieved balanced accuracy scores between 0.8 and 0.84 and ROC-AUC values between 0.91 and 0.94, indicating the potential diagnostic value of these parameters. To further validate these results, we performed permutation scoring by training and testing on 1000 randomly shuffled datasets, which indicated that meaningful dependencies exist between the features and the target variable and that the models successfully exploited them for prediction. We performed feature selection by identifying the features shared by each pair of models among the top 30 features of three models: LR, RF, and XG. These shared features were then merged with the list of significant features (determined through Welch’s t-test) to create three different feature sets. We trained LR, RF, and XG models on these three feature sets and on the significant feature set. Finally, we identified seven features that were consistently highly ranked across all three feature sets and models. Our findings support the use of these routine biochemical parameters for breast cancer detection. As this is the first study of its kind conducted in an Indian population, it must be emphasized that these results remain to be validated in an independent cohort and across the country.
Nonetheless, our study investigated the breast cancer screening problems faced by India by evaluating the effectiveness of routine biochemical parameters within an Indian cohort. We believe that this investigation will inform further studies and may lead to the development of a cost-effective, time-efficient, and pragmatic nationwide breast cancer screening program, thereby contributing to improved prognosis and reduced mortality rates in the Indian population.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;ET00352
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Breast cancer	en_US
dc.subject	Machine learning	en_US
dc.subject	Breast cancer detection	en_US
dc.subject	Early detection	en_US
dc.subject.classification	Research Subject Categories::INTERDISCIPLINARY RESEARCH AREAS	en_US
dc.title	Evaluating routine biochemical parameters for breast cancer detection in Indian women: A machine learning approach	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
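The per-parameter comparison described in the abstract (Welch's t-test between cohorts, then univariate ROC analysis) can be illustrated with a short Python sketch that uses only the standard library. This is not the thesis code: the function names `welch_t` and `univariate_auc` and the toy values are hypothetical, and only the t statistic and Welch-Satterthwaite degrees of freedom are computed. In a real analysis the p-value would come from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`, and the AUC via scikit-learn's `roc_auc_score`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    samples whose variances are not assumed to be equal."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def univariate_auc(cases, controls):
    """ROC-AUC of one parameter used on its own as a score: the fraction of
    (case, control) pairs in which the case has the higher value, counting
    ties as 1/2. This equals the Mann-Whitney U statistic divided by n1 * n0."""
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical values of a single parameter; not data from the thesis.
cases = [150, 162, 171, 140, 158]
controls = [120, 135, 142, 128, 150, 118]
t, df = welch_t(cases, controls)
auc = univariate_auc(cases, controls)
print(f"t = {t:.2f}, df = {df:.1f}, AUC = {auc:.3f}")
```

The pairwise definition of AUC is used here because it makes the meaning of a univariate AUC explicit: an AUC of 0.86 means a randomly chosen case exceeds a randomly chosen control in 86% of pairs.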
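The model-evaluation steps in the abstract (repeated stratified k-fold cross-validation scored by balanced accuracy, followed by permutation scoring) can be sketched in the same way. This is an assumption-laden illustration rather than the thesis pipeline: a hypothetical one-feature nearest-class-mean rule stands in for LR, RF, SVM, and XGBoost, the data are invented, and all function names are hypothetical. The permutation p-value uses the (C + 1) / (n_permutations + 1) convention that scikit-learn's `permutation_test_score` also uses, and scikit-learn's `RepeatedStratifiedKFold` would provide the fold splitting in a real analysis.

```python
import random
from collections import defaultdict
from statistics import mean


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs. In every repeat, each class's indices
    are shuffled and dealt round-robin into n_splits folds, so each test fold
    keeps roughly the overall case/control ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for members in by_class.values():
            members = members[:]  # copy so the shuffle does not modify by_class
            rng.shuffle(members)
            for j, i in enumerate(members):
                folds[(offset + j) % n_splits].append(i)
            offset += len(members)  # next class continues where this one stopped
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for f in folds[:k] + folds[k + 1:] for i in f)
            yield train, test


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (for 0/1 labels: mean of sensitivity and specificity)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def cv_balanced_accuracy(x, y, n_splits=5, n_repeats=3):
    """Balanced accuracy of a toy nearest-class-mean rule on a single feature,
    averaged over repeated stratified folds. The rule is a hypothetical
    stand-in for LR, RF, SVM, or XGBoost."""
    scores = []
    for train, test in repeated_stratified_kfold(y, n_splits, n_repeats):
        centre = {c: mean(x[i] for i in train if y[i] == c) for c in (0, 1)}
        pred = [min((0, 1), key=lambda c: abs(x[i] - centre[c])) for i in test]
        scores.append(balanced_accuracy([y[i] for i in test], pred))
    return mean(scores)


def permutation_p_value(score_fn, x, y, n_permutations=1000, seed=0):
    """Score the real labels, then re-score after shuffling them.
    p = (C + 1) / (n_permutations + 1), where C counts shuffled scores that
    reach the real score; a small p suggests the feature carries real
    information about the labels."""
    rng = random.Random(seed)
    true_score = score_fn(x, y)
    hits = 0
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)
        if score_fn(x, y_perm) >= true_score:
            hits += 1
    return true_score, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 controls (label 0) and 10 cases (label 1).
x = [float(v) for v in range(10)] + [float(v) for v in range(20, 30)]
y = [0] * 10 + [1] * 10
score, p = permutation_p_value(cv_balanced_accuracy, x, y, n_permutations=200)
print(score, p)
```

With these well-separated toy classes the real cross-validated score is perfect, while shuffled labels almost never reach it, so the p-value is small; an uninformative feature would typically give a much larger p-value.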
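The feature-selection step in the abstract can be read as simple set operations. Under one interpretation (an assumption, since the exact procedure is not spelled out here), each pair of models contributes the features shared by their top-30 rankings, and that overlap is merged with the Welch's t-test significant features, giving three feature sets from LR, RF, and XG. The function `build_feature_sets`, the rankings, and the feature names below are hypothetical placeholders, and `top_k` is reduced to 3 so the example stays small.

```python
from itertools import combinations


def build_feature_sets(ranked_by_model, significant, top_k=30):
    """For every pair of models, keep the features found in both models'
    top-k rankings, then add the significant (Welch's t-test) features."""
    top = {model: set(ranked[:top_k]) for model, ranked in ranked_by_model.items()}
    return {
        (m1, m2): (top[m1] & top[m2]) | set(significant)
        for m1, m2 in combinations(sorted(top), 2)
    }


# Hypothetical rankings (most important first) with placeholder feature names.
ranked = {
    "LR": ["a", "b", "c", "d"],
    "RF": ["b", "c", "e", "a"],
    "XG": ["c", "f", "a", "b"],
}
feature_sets = build_feature_sets(ranked, significant=["g"], top_k=3)
for pair, features in feature_sets.items():
    print(pair, sorted(features))
# ('LR', 'RF') ['b', 'c', 'g']
# ('LR', 'XG') ['a', 'c', 'g']
# ('RF', 'XG') ['c', 'g']
```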

Files in this item

Name:: Thesis_final_NR .pdf
Size:: 4.701Mb
Format:: PDF
Description:: Thesis full text

This item appears in the following Collection(s)

Interdisciplinary mathematical sciences (IMS)
