Published paper

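Machine Learning-Based Prognostic Prediction Models for Non-Metastatic Colon Cancer: An Analysis Based on the Surveillance, Epidemiology, and End Results Database and a Chinese Cohort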

 

Authors Tang M, Gao L, He B, Yang Y

Received 23 September 2021

Accepted for publication 1 December 2021

Published 4 January 2022 Volume 2022:14 Pages 25–35

DOI https://doi.org/10.2147/CMAR.S340739

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Chien-Feng Li

Purpose: The present study aimed to develop machine learning (ML)-based prognostic prediction models for non-metastatic colon cancer (CRC) that can provide a precise quantitative risk assessment and assist in developing treatment strategies. It also investigated whether nonlinear methods can improve prediction accuracy over linear methods.
Patients and Methods: A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. The model performance was evaluated using the area under the receiver operating characteristic curve (AUC) values.
Results: The XGBoost approach showed the highest AUC values of 0.86 (0.84–0.88), 0.82 (0.81–0.83), and 0.81 (0.79–0.82) for one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with AUC values of 0.71 (0.64–0.79), 0.79 (0.74–0.86), and 0.89 (0.82–0.95) for one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance for the CSS and R&M models were different, and higher model accuracy was associated with more prognostic predictors.
Conclusion: Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that the model performance can be improved when more prognostic predictors are considered.
Keywords: colon cancer, machine learning, extreme gradient boosting, prognostic prediction models
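
The evaluation protocol described under Patients and Methods (a 70%/30% split, 100 repeated experiments, and an average AUC with a 95% confidence interval) can be illustrated with a minimal sketch. It is not the authors' code. The toy data, the function names (`auc`, `fit_single_feature`, `repeated_auc`) and the single-feature scorer that stands in for logistic regression, XGBoost, or random forest are all hypothetical. The abstract does not say how the interval was computed, so the normal approximation on the mean over runs is an assumption.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_single_feature(xs, ys):
    """Hypothetical stand-in for a real model: scores a case by one feature,
    oriented so that higher scores point towards the positive class."""
    mean_pos = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    mean_neg = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    direction = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: direction * x


def repeated_auc(xs, ys, fit, n_runs=100, train_fraction=0.7, seed=0):
    """Repeat a random train/validation split n_runs times and summarise
    the validation AUCs as a mean and a 95% confidence interval."""
    rng = random.Random(seed)
    indices = list(range(len(ys)))
    cut = int(train_fraction * len(ys))
    aucs = []
    for _ in range(n_runs):
        rng.shuffle(indices)
        train, valid = indices[:cut], indices[cut:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        valid_scores = [model(xs[i]) for i in valid]
        aucs.append(auc([ys[i] for i in valid], valid_scores))
    mean = statistics.mean(aucs)
    # Assumption: normal approximation for the CI of the mean over runs.
    half_width = 1.96 * statistics.stdev(aucs) / len(aucs) ** 0.5
    return mean, (mean - half_width, mean + half_width)


# Toy data (not from the study): one noisy feature shifted by outcome.
data_rng = random.Random(42)
toy_labels = [i % 2 for i in range(200)]
toy_features = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in toy_labels]

mean_auc, ci = repeated_auc(toy_features, toy_labels, fit_single_feature)
print(f"AUC {mean_auc:.2f} ({ci[0]:.2f}-{ci[1]:.2f})")
```

With a real model, `fit_single_feature` would be replaced by a function that trains the chosen algorithm and returns a callable producing a risk score per patient. A percentile interval over the 100 AUC values is a reasonable alternative to the normal approximation used here.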