Machine Learning for Endometrial Cancer Prediction and Prognostication
Predictive Modelling of Endometrial Cancer Risk Factors Using Machine Learning
Introduction
Endometrial cancer, often referred to as uterine cancer, is a malignancy that originates in the endometrium, the inner lining of the uterus. It is the most common type of cancer affecting the female reproductive system. Endometrial cancer typically develops when the cells of the endometrium undergo abnormal growth and division, forming a malignant tumour. The exact cause of this cancer is not fully understood, but certain risk factors such as obesity, hormonal imbalances, a history of estrogen-dependent conditions, and genetic predisposition may contribute to its development. Clinical manifestations often include abnormal uterine bleeding, pelvic pain, and a feeling of fullness in the lower abdomen. The diagnosis of endometrial cancer involves a thorough evaluation, including pelvic examination, imaging studies, and tissue biopsy. Treatment depends on the stage and extent of the disease but typically involves a combination of surgery, radiation therapy, and chemotherapy. Early detection and timely intervention play a crucial role in improving prognosis and overall survival for patients with endometrial cancer.
Understanding the types of endometrial cancer is crucial for accurate diagnosis, treatment planning, and prognosis. The two main types of endometrial cancer (EC) are Type I and Type II.[1]
Type I endometrial cancer is the most common subtype, accounting for approximately 80% of cases. It has an overall 5‑year survival rate of 81.3%, and the chance of recurrence is usually below 20%.
By contrast, type II EC represents approximately 20% of cases and is less influenced by estrogen. Grade‑3 EC is classified as type II: it contains a solid component of more than 50%, is poorly differentiated, does not resemble normal endometrial tissue, is aggressive, and is associated with a poor prognosis.
Machine learning in endometrial cancer
Machine learning (ML) has emerged as a powerful tool in disease diagnosis and prognostication. By leveraging large datasets and advanced algorithms, ML models can analyze complex patterns and relationships within biomedical data to enhance disease detection and prediction. In the context of disease diagnosis, ML algorithms can effectively classify patient data, such as medical images or molecular profiles, to accurately identify disease presence or subtype. Additionally, ML models can leverage diverse clinical and molecular data to predict disease outcomes and prognosis, aiding in personalized treatment strategies. The integration of ML in disease diagnosis and prognostication has the potential to improve patient care, optimize treatment decisions, and facilitate the development of precision medicine approaches.
The successful application of ML to disease diagnosis and prognostication depends on addressing specific challenges, one of which is class imbalance. Class imbalance refers to a situation where the distribution of samples across different classes is significantly unequal.
Most algorithms are designed to perform well when classes are balanced, assuming that each class has a similar number of instances for effective learning. However, when dealing with imbalanced data, the algorithms tend to be biased towards the majority class, resulting in poor predictive performance for the minority class (e.g., the positive cancer cases).
To address this, multiple techniques have been proposed, including resampling strategies (e.g., random oversampling or random undersampling), ensemble learning, boosting, cost-sensitive learning, one-class learning, and active learning, together with the use of appropriate evaluation metrics. The selection of the most effective strategy depends on the specific characteristics of the imbalanced dataset.
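The effect of class imbalance, and two of the remedies listed above, can be illustrated with a minimal sketch. The cohort below is hypothetical (90 controls, 10 cancer cases), the helper names are invented for illustration, and the class-weight formula is the common "balanced" heuristic n_samples / (n_classes * class_count), which is assumed here as one possible form of cost-sensitive weighting rather than a method taken from the studies discussed.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical toy cohort: 90 controls (label 0) and 10 cancer cases (label 1).
labels = [0] * 90 + [1] * 10
samples = list(range(100))  # stand-ins for real feature vectors

# A model that always predicts the majority class looks accurate but finds no cases.
always_negative = [0] * len(labels)
print(accuracy(labels, always_negative))  # 0.9
print(recall(labels, always_negative))    # 0.0

# Remedy 1: random oversampling equalises the class counts.
_, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))

# Remedy 2: class weights make errors on the minority class cost more.
print(balanced_class_weights(labels))
```

The first two printed values show why accuracy alone is a poor metric here: 90% accuracy coexists with zero recall for the cancer class. Oversampling should be applied to the training split only, so that duplicated cases do not leak into the evaluation data.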
ML in Diagnosis and Risk prediction of EC
ML has shown promise in predicting the risk of endometrial cancer development. ML models can integrate multiple risk factors, including age, obesity, hormonal factors, and genetic predisposition, to create personalized risk prediction models. These models can aid in identifying individuals at higher risk, enabling targeted preventive strategies and early interventions. Beyond risk prediction, ML algorithms can assist in the accurate and timely diagnosis of endometrial cancer. By analyzing imaging data, such as ultrasound and magnetic resonance imaging (MRI), ML models can help differentiate between benign and malignant lesions, providing valuable information to guide treatment decisions and reduce unnecessary interventions.
Significant advancements have been made in utilizing pattern recognition and image processing techniques for the detection, classification, and identification of endometrial cancer (EC).[2]
- Hodneland et al. applied a 3D convolutional neural network called UNet3D to automatically segment tumors in EC patients using preoperative pelvic MR images.[3] The study demonstrated that ML algorithms, such as UNet3D, can achieve tumor segmentation accuracy comparable to human experts, providing valuable information on tumor volume, texture features, and borders. This approach holds promise for near-real-time radiomic tumor profiling, enabling risk stratification and the development of personalized treatment strategies.
- Dong et al. developed a deep learning model to predict deep myometrial invasion from MR images of EC patients.[4] The model achieved an accuracy of 75%, and the difference from radiologists’ readings was not statistically significant.
- Xu et al. developed a prediction model for lymph node metastasis (LNM), a strong predictive factor for EC outcomes, using MR images and CA125 values from EC patients, achieving an accuracy of approximately 85%.[5]
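Agreement between an automated segmentation and an expert's manual outline, as in the UNet3D study above, is commonly quantified with an overlap score. The sketch below uses the Dice similarity coefficient on two small binary masks. The choice of Dice is an assumption (the text above does not state which metric was used), and the 4x4 masks are hypothetical 2D stand-ins for what would be 3D volumes in practice.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal size."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(1 for a in flat_a if a) + sum(1 for b in flat_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2 * overlap / total


# Hypothetical masks for one image slice: 1 = tumour pixel, 0 = background.
model_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
expert_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# 4 shared pixels, 4 + 5 labelled pixels in total: Dice = 8 / 9.
print(round(dice_coefficient(model_mask, expert_mask), 3))  # 0.889
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all.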
ML in Prognosis and Risk prediction of EC
ML algorithms can analyse large volumes of patient data, including clinical, pathological, and genomic information, to develop predictive models for risk assessment and prognosis. These algorithms can identify patterns and relationships that may not be readily apparent to humans. ML models can help stratify patients into different risk groups and provide valuable insights into the disease’s progression.
- In a study conducted by Praiss et al., an unsupervised machine learning (ML) algorithm called Ensemble Algorithm for Clustering Cancer Data (EACCD) was adapted and utilized to classify endometrial cancer (EC) patients based on TNM staging, grade, and age. EACCD, a combination of clustering methods, incorporates dissimilarity estimation and hierarchical clustering to identify distinct patient clusters. The innovative application of this ML method demonstrated improved prognostic prediction for EC.[6],[7] While the majority of women with early-stage EC have a favourable prognosis, a significant subset, approximately 15% of patients with stage I and II EC, experience recurrence.[8]
- Akazawa et al. applied five ML algorithms, namely random forest (RF)[9], logistic regression (LR)[10], decision tree (DT)[11], support vector machine (SVM)[12], and boosted tree[13], to predict recurrence in EC patients from various clinical parameters. Predictive performance was assessed using accuracy and area under the curve (AUC). The SVM exhibited the highest accuracy, followed by LR, while boosted trees showed the lowest accuracy. Regarding AUC, LR achieved the highest value, while RF had the lowest. Consequently, LR was determined to be the most effective predictive model for the study. This investigation demonstrated the feasibility of using ML algorithms to predict recurrence in early-stage EC, thus offering potential improvements in efficiency and accuracy for recurrence and treatment response prediction.[14]
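The recurrence study above reports that the model with the highest accuracy was not the one with the highest AUC. A minimal sketch shows how this can happen: accuracy depends on a single decision threshold, whereas AUC measures how well positives are ranked above negatives across all thresholds. The labels and scores below are hypothetical and are not taken from the study; `model_a` and `model_b` are illustrative names only.

```python
def accuracy_at(y_true, scores, threshold=0.5):
    """Accuracy after converting scores to labels at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(1 for y, p in zip(y_true, preds) if y == p) / len(y_true)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: six patients without recurrence (0), four with recurrence (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Model A separates well at the 0.5 threshold but gives one negative a very high score.
model_a = [0.10, 0.20, 0.30, 0.40, 0.45, 0.90, 0.55, 0.60, 0.70, 0.80]
# Model B ranks every positive above every negative, but most scores fall below 0.5.
model_b = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]

print(accuracy_at(y_true, model_a), round(auc(y_true, model_a), 3))  # 0.9 0.833
print(accuracy_at(y_true, model_b), round(auc(y_true, model_b), 3))  # 0.7 1.0
```

Model A wins on accuracy and model B wins on AUC, so the choice of "best" model depends on which metric matters for the clinical use case.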
Lymph node involvement (LNI) serves as a significant prognostic factor in various cancers, including endometrial cancer (EC). However, there is currently no validated method to accurately predict LNI in EC. A recent study by Günakan et al. explored the application of the Naïve Bayes (NB) algorithm for predicting LNI in EC patients. This study utilized multiple histopathological factors, including final histology, lymphovascular space invasion (LVSI), grade, tumor diameter, depth of myometrial invasion, cervical glandular stromal invasion, tubal or ovarian involvement, and pelvic LNI.[15] The results demonstrated that the NB algorithm achieved high accuracy in predicting LNI based on these histopathological factors, suggesting that machine learning (ML) could play a role in decision-making for managing EC.
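To make the Naïve Bayes idea concrete, the following is a minimal sketch of a categorical NB classifier with add-one (Laplace) smoothing. It is not the model used by Günakan et al.: the three features (LVSI, grade, depth of invasion), the eight patient records, and the class name `CategoricalNaiveBayes` are all hypothetical and chosen only to show how per-feature evidence is combined under the independence assumption.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[(feature_index, class_label)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # distinct values observed for each feature, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[(j, y)][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Unnormalised log posterior: log P(class) + sum of log P(feature | class)."""
        scores = {}
        for y, class_count in self.class_counts.items():
            log_p = math.log(class_count / self.n)
            for j, v in enumerate(row):
                numerator = self.value_counts[(j, y)][v] + 1
                denominator = class_count + len(self.values[j])
                log_p += math.log(numerator / denominator)
            scores[y] = log_p
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical records: (LVSI present?, grade, depth of myometrial invasion).
rows = [
    ("yes", "3", "deep"),
    ("yes", "2", "deep"),
    ("yes", "3", "shallow"),
    ("no", "3", "deep"),
    ("no", "1", "shallow"),
    ("no", "2", "shallow"),
    ("no", "1", "deep"),
    ("yes", "1", "shallow"),
]
# Hypothetical outcome: lymph node involvement found at surgery or not.
labels = ["positive"] * 4 + ["negative"] * 4

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("yes", "3", "deep")))    # positive
print(model.predict(("no", "1", "shallow")))  # negative
```

Smoothing keeps a feature value that never occurred with a class (here, grade 3 among the negative cases) from forcing that class's probability to zero.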
Endometrioid endometrial adenocarcinoma (EEA) represents the most prevalent subtype of endometrial cancer (EC). Unfavourable outcomes associated with disease dissemination have been linked to high tumor grade, advanced surgical stage, and lymphovascular space invasion (LVSI).[16] These factors collectively suggest that conventional clinical features alone are insufficient to predict EEA prognosis accurately. Yin et al. conducted a study aimed at developing a prognostic model for EEA by integrating gene expression data and traditional features using the random forest (RF) algorithm. Three models were constructed, utilizing (a) 11 genes exclusively, (b) stage and grade parameters, and (c) both the 11 genes and the stage and grade factors. The findings indicated that the RF model combining the 11 genes with the clinical parameters outperformed the RF models based solely on the genes or solely on the clinical parameters, highlighting the enhanced predictive ability of combining gene expression and clinical features for EEA prognosis. Combining the RF model with clinical criteria may therefore serve better for stratifying patients in the clinic.[17],[18],[19],[20]
Limitations of ML in EC
While machine learning (ML) approaches offer significant potential in endometrial cancer research and clinical practice, they also have limitations that need to be acknowledged. Key limitations of ML approaches, both in general and in the context of endometrial cancer, include:
- Data quality and representativeness: ML models heavily rely on the quality and representativeness of the input data. Biases or errors in the data, such as incomplete or missing data, can negatively impact the performance and generalizability of ML models. Additionally, if the training dataset is not representative of the target population or lacks diversity, the model may not effectively capture the full spectrum of endometrial cancer characteristics.
- Interpretability and explainability: Many ML algorithms, such as deep learning models, operate as black boxes, making it challenging to interpret and explain the underlying decision-making process. The lack of transparency in model predictions can limit their adoption in clinical settings, where interpretability and explainability are crucial for gaining trust and acceptance from healthcare professionals.
- Overfitting and generalizability: ML models can be prone to overfitting, where the model learns specific patterns or noise present in the training data but fails to generalize well to new, unseen data. This can lead to overly optimistic performance during training but poor performance on real-world data. Ensuring robust generalization of ML models in endometrial cancer requires careful validation and testing on independent datasets.
- Data availability and integration: Access to high-quality, comprehensive, and diverse datasets is essential for building accurate and reliable ML models. However, in the field of endometrial cancer, obtaining large-scale, well-curated datasets with longitudinal information and multi-modal data (e.g., genomics, imaging, clinical data) can be challenging. The integration of different data types and overcoming data heterogeneity pose additional difficulties in leveraging the full potential of ML approaches.
- Ethical and legal considerations: The use of ML models in healthcare raises important ethical and legal considerations. Patient privacy, data security, and the potential for biased decision-making are critical concerns. ML algorithms must be developed and deployed with appropriate safeguards to protect patient rights and ensure fair and equitable outcomes.
Understanding and addressing these limitations is crucial for the responsible application and adoption of ML approaches in endometrial cancer research and clinical settings. Collaborations between researchers, clinicians, and data scientists can help navigate these challenges and advance the effective integration of ML in improving endometrial cancer care.
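As one practical illustration of the validation point raised under overfitting and generalizability, the sketch below splits a small imbalanced cohort into stratified folds so that every held-out fold keeps roughly the same proportion of cancer cases. The function name `stratified_k_fold` and the cohort (40 controls, 10 cases) are hypothetical. Cross-validation of this kind estimates performance within one dataset and does not replace testing on an independent dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical cohort: 40 controls (0) and 10 cancer cases (1).
labels = [0] * 40 + [1] * 10

for train_idx, test_idx in stratified_k_fold(labels, k=5):
    n_cases = sum(labels[i] for i in test_idx)
    # Every fold: 40 training samples, 10 test samples, 2 of them cancer cases.
    print(len(train_idx), len(test_idx), n_cases)
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`. A large gap between training and held-out scores is a typical sign of overfitting.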
REFERENCES
[1] Bokhman JV. Two Pathogenetic Types of Endometrial Carcinoma. Gynecol Oncol (1983) 15(1):10–7. doi: 10.1016/0090-8258(83)90111-7.
[2] Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning Dense Volumetric Segmentation From Sparse Annotation. In: S Ourselin, L Joskowicz, MR Sabuncu, G Unal and W Wells, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Cham: Springer International Publishing (2016). p. 424–32. Lecture Notes in Computer Science.
[3] Hodneland E, Dybvik JA, Wagner-Larsen KS, Šoltészová V, Munthe-Kaas AZ, Fasmer KE, et al. Automated Segmentation of Endometrial Cancer on MR Images Using Deep Learning. Sci Rep (2021) 11(1):179. doi: 10.1038/s41598-020-80068-9.
[4] Hodneland E, Dybvik JA, Wagner-Larsen KS, Šoltészová V, Munthe-Kaas AZ, Fasmer KE, et al. Automated Segmentation of Endometrial Cancer on MR Images Using Deep Learning. Sci Rep (2021) 11(1):179. doi: 10.1038/s41598-020-80068-9.
[5] Xu X, Li H, Wang S, Fang M, Zhong L, Fan W, et al. Multiplanar MRI-Based Predictive Model for Preoperative Assessment of Lymph Node Metastasis in Endometrial Cancer. Front Oncol (2019) 9:1007. doi: 10.3389/fonc.2019.01007.
[6] Praiss AM, Huang Y, St Clair CM, Tergas AI, Melamed A, Khoury-Collado F, et al. Using Machine Learning to Create Prognostic Systems for Endometrial Cancer. Gynecol Oncol (2020) 159(3):744–50. doi: 10.1016/j.ygyno.2020.09.047.
[7] Praiss AM, Huang Y, St Clair CM, Tergas AI, Melamed A, Khoury-Collado F, et al. Using Machine Learning to Create Prognostic Systems for Endometrial Cancer. Gynecol Oncol (2020) 159(3):744–50. doi: 10.1016/j.ygyno.2020.09.047.
[8] Bristow RE, Purinton SC, Santillan A, Diaz-Montes TP, Gardner GJ, Giuntoli RL. Cost-Effectiveness of Routine Vaginal Cytology for Endometrial Cancer Surveillance. Gynecol Oncol (2006) 103(2):709–13. doi: 10.1016/j.ygyno.2006.05.013.
[9] Sarica A, Cerasa A, Quattrone A. Random Forest Algorithm for the Classification of Neuroimaging Data in Alzheimer’s Disease: A Systematic Review. Front Aging Neurosci (2017) 9:329. doi: 10.3389/fnagi.2017.00329.
[10] Mount DW, Putnam CW, Centouri SM, Manziello AM, Pandey R, Garland LL, et al. Using Logistic Regression to Improve the Prognostic Value of Microarray Gene Expression Data Sets: Application to Early-Stage Squamous Cell Carcinoma of the Lung and Triple Negative Breast Carcinoma. BMC Med Genomics (2014) 7:33. doi: 10.1186/1755-8794-7-33.
[11] Kingsford C, Salzberg SL. What are Decision Trees? Nat Biotechnol (2008) 26(9):1011–3. doi: 10.1038/nbt0908-1011.
[12] Huang S, Cai N, Pacheco PP, Narrandes S, Wang Y, Xu W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genomics Proteomics (2018) 15(1):41–51. doi: 10.21873/cgp.20063.
[13] Zhang Z, Zhao Y, Canes A, Steinberg D, Lyashevska O, written on behalf of AME Big-Data Clinical Trial Collaborative Group. Predictive Analytics With Gradient Boosting in Clinical Medicine. Ann Transl Med (2019) 7(7):152. doi: 10.21037/atm.2019.03.29.
[14] Akazawa M, Hashimoto K, Noda K, Yoshida K. The Application of Machine Learning for Predicting Recurrence in Patients With Early-Stage Endometrial Cancer: A Pilot Study. Obstet Gynecol Sci (2021) 64(3):266–73. doi: 10.5468/ogs.20248.
[15] Langarizadeh M, Moghbeli F. Applying Naive Bayesian Networks to Disease Prediction: A Systematic Review. Acta Inform Med (2016) 24(5):364–9. doi: 10.5455/aim.2016.24.364-369.
[16] Srikantia N BR, Rajeev AG, Kalyan SN. Endometrioid Endometrial Adenocarcinoma in a Premenopausal Woman With Multiple Organ Metastases. Indian J Med Paediatr Oncol (2009) 30(2):80–3. doi: 10.4103/0971-5851.60053.
[17] Srikantia N BR, Rajeev AG, Kalyan SN. Endometrioid Endometrial Adenocarcinoma in a Premenopausal Woman With Multiple Organ Metastases. Indian J Med Paediatr Oncol (2009) 30(2):80–3. doi: 10.4103/0971-5851.60053.
[18] Bendifallah S, Ouldamer L, Lavoue V, Canlorbe G, Raimond E, Coutant C, et al. Patterns of Recurrence and Outcomes in Surgically Treated Women With Endometrial Cancer According to ESMO-ESGO-ESTRO Consensus Conference Risk Groups: Results From the FRANCOGYN Study Group. Gynecol Oncol (2017) 144(1):107–12. doi: 10.1016/j.ygyno.2016.10.025.
[19] Cancer Genome Atlas Research Network, Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, et al. Integrated Genomic Characterization of Endometrial Carcinoma. Nature (2013) 497(7447):67–73. doi: 10.1038/nature12325.
[20] Burki TK. Predicting Lung Cancer Prognosis Using Machine Learning. Lancet Oncol (2016) 17(10):e421. doi: 10.1016/S1470-2045(16)30436-3.
With over 11 years of experience in data science, SAS, and research and development, Ankur has a strong background in statistical analysis, big data, and algorithm design. He has edited books on artificial intelligence, machine learning, and big data for healthcare applications.