dc.description.abstract | Breast cancer is the leading cancer type among Indian women, with a mortality rate that surpasses that of nations such as the USA, Australia, and the UK. The higher mortality rate is largely explained by a lack of awareness, delayed diagnosis, and poor prognosis. Moreover, the age of onset of breast cancer in India is a decade earlier than in the Caucasian population. Developed nations have successfully used technologies such as mammography in nationwide mandatory screening programs to identify early-stage breast cancer. However, the unique demographic landscape of India, characterized by a very large population, poses a significant challenge to the nationwide implementation of such methods, largely because of their cost. Liquid biopsy-based biomarkers (e.g., CTCs, ctDNA, miRNA) are promising, yet their reliance on sophisticated isolation techniques and high-throughput sequencing technologies renders them expensive and currently inapplicable to the Indian population. Hence, there is an unmet need for methods that are cost-effective, fast, and scalable for large-scale screening of breast cancer in India. In this thesis, we propose a shift in focus towards routinely measured blood- and urine-based biomarkers and evaluate their utility in breast cancer detection. We screened 106 blood and urine biochemical parameters, along with medical history, for 54 breast cancer and 97 non-cancer subjects (controlled for age and BMI), including individuals with and without type 2 diabetes mellitus, from two different institutions.
We performed a comprehensive comparison of the 106 biochemical parameters between the breast cancer and non-cancer cohorts and observed that multiple parameters from the lipid, liver, and immune cell profiles differed significantly in breast cancer subjects. These results agree with existing reports from Western populations of elevated LDL, triglycerides, and other parameters, which we also found to be perturbed in the Indian cohort. Univariate ROC analysis showed AUC values between 0.7 and 0.86 for these significant parameters, suggesting high diagnostic potential. For multivariate analysis, we trained machine learning models, including logistic regression (LR), random forest (RF), support vector machines (SVM), and XGBoost (XG). All models were trained and evaluated using repeated stratified 5- and 10-fold cross-validation. These models achieved balanced accuracy scores between 0.8 and 0.84 and ROC-AUC values ranging from 0.91 to 0.94, indicating the potential diagnostic value of these parameters. To further validate these results, we performed permutation scoring by training and testing on 1000 randomly shuffled datasets, which indicated that meaningful dependencies were present between the features and the target variable and that the models successfully used them for prediction. We performed feature selection by identifying the pairwise common features among the top 30 features of each of three models (LR, RF, and XG). These shared features were then merged with the list of significant features (determined through Welch’s t-test) to create three different feature sets. We trained LR, RF, and XG models on these three feature sets and on the significant feature set. Finally, we identified seven features that were consistently highly ranked across all three feature sets and models.
Our findings support the use of these routine biochemical parameters for breast cancer detection. As this is the first study of its kind conducted in an Indian population, it must be emphasized that these results remain to be validated in an independent cohort and across the country. Nonetheless, our study addresses the breast cancer screening challenges faced by India by evaluating the effectiveness of routine biochemical parameters within an Indian cohort. We believe that this investigation will inform further studies and may lead to the development of a cost-effective, time-efficient, and pragmatic nationwide breast cancer screening program, thereby contributing to improved prognosis and reduced mortality rates in the Indian population. | en_US |
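The univariate ROC analysis and the permutation scoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the data are synthetic (one made-up feature for 54 "cancer" and 97 "non-cancer" subjects), a single-feature AUC stands in for the trained models as the score, and the function names `roc_auc` and `permutation_p_value` are hypothetical. The idea shown is only the general one: compute a score on the real labels, recompute it on many label-shuffled copies, and report how often chance does at least as well.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Compare the observed AUC with AUCs obtained after shuffling labels.

    Returns (observed_auc, p_value). The +1 terms count the observed
    labelling as one of the permutations, so p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = roc_auc(scores, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any feature-label dependency
        if roc_auc(scores, shuffled) >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic illustration only: group sizes mirror the abstract, values do not.
data_rng = random.Random(42)
labels = [1] * 54 + [0] * 97
scores = [data_rng.gauss(1.0, 1.0) if y == 1 else data_rng.gauss(0.0, 1.0)
          for y in labels]

auc, p = permutation_p_value(scores, labels)
print(f"observed AUC = {auc:.2f}, permutation p-value = {p:.4f}")
```

A small p-value here means the shuffled labels rarely reproduce the observed AUC, which is the sense in which permutation scoring indicates a real dependency between features and target. With full models, the same loop would wrap the cross-validated score instead of a single-feature AUC.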