Published Article

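Machine Learning-Based Integrated Analysis of SULF1, CXCL8, and PBLD Expression as Discriminative Biomarkers for Early Detection and Prognosis of Colorectal Cancer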

 

Authors Li Y, Shi J, Mei C, Zhou F, Zhao H, Zhang L 

Received 14 July 2025

Accepted for publication 22 October 2025

Published 4 December 2025 Volume 2025:18 Pages 7285-7308

DOI https://doi.org/10.2147/IJGM.S553709

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Ching-Hsien Chen

Yang Li,1,2,* JianFeng Shi,3,4,* Chao Mei,5 FangYuan Zhou,1 HaoSen Zhao,6 Li Zhang1 

1Department of Laboratory Medicine, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, Liaoning, 121001, People’s Republic of China; 2China Medical University, Shenyang, Liaoning, 110000, People’s Republic of China; 3Department of Cardiology, The Fourth Affiliated Hospital of China Medical University, Shenyang, Liaoning, 110000, People’s Republic of China; 4Department of Cardiology, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, Liaoning, 121001, People’s Republic of China; 5Jinzhou Medical University, Jinzhou, Liaoning, 121001, People’s Republic of China; 6The Third Affiliated Hospital of Jinzhou Medical University, Jinzhou, Liaoning, 121001, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Li Zhang, Department of Laboratory Medicine, The First Affiliated Hospital of Jinzhou Medical University, No. 2, Section 5, Renmin Street, Guta District, Jinzhou, Liaoning, 121001, People’s Republic of China, Email zl252981340@163.com; HaoSen Zhao, The Third Affiliated Hospital of Jinzhou Medical University, No. 2, Section 5, Heping Road, Linghe District, Jinzhou, Liaoning, 121001, People’s Republic of China, Email zhs_drzhao@outlook.com

Background: Colorectal cancer (CRC) is one of the major cancers threatening human health. Although CRC screening has become increasingly widespread, the disease is difficult to detect early because it lacks obvious early-stage symptoms, and its rapid progression and strong metastatic potential after onset result in a high incidence of CRC. This study therefore aims to identify more powerful molecular targets and biomarkers for the diagnosis, treatment, and clinical research of CRC.
Methods: The limma package was used to analyze datasets GSE4107, GSE110223, and GSE110224 from the Gene Expression Omnibus (GEO) to identify differentially expressed genes (DEGs) in CRC. Functional enrichment analysis of DEGs was performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). To further screen for key genes, the DEGs were submitted to the STRING database to construct a protein-protein interaction (PPI) network. Clinical data from The Cancer Genome Atlas (TCGA) database were used to analyze the role of key genes in CRC. Key DEGs were validated using immunohistochemistry, Western blot, and quantitative real-time polymerase chain reaction (RT-qPCR). Survival analysis of key DEGs was performed using the GEPIA database, and survival curves were plotted. The expression levels of DEGs were quantitatively analyzed in samples from 80 CRC patients and 80 healthy controls. Machine learning algorithms were applied to analyze key DEGs and construct a diagnostic model for CRC. A receiver operating characteristic (ROC) curve was plotted to evaluate the performance of the diagnostic model.
Results: A total of 981 (GSE4107), 155 (GSE110223), and 280 (GSE110224) DEGs were identified from the GEO datasets, of which 152 DEGs were shared by at least two datasets. GO and KEGG enrichment analyses revealed that these DEGs were widely involved in biological processes such as muscle system process and extracellular matrix organization. Downregulated genes were involved in pathways including bile secretion and retinol metabolism. PPI network analysis identified 20 overlapping genes, among which CXCL8 and SULF1 were upregulated hub genes, while PBLD and 17 others were downregulated hub genes. mRNA-Seq data and RT-qPCR validation showed that CXCL8 and SULF1 were significantly upregulated in CRC samples, whereas PBLD expression was higher in normal tissues than in CRC tissues. Kaplan-Meier curve analysis indicated that high mRNA expression of SULF1 was significantly associated with poorer overall survival in CRC patients, while high mRNA expression of LRRC19 was associated with better overall survival. In contrast, the mRNA expression of CXCL8 and PBLD showed no significant association with overall survival. SULF1 expression was significantly correlated with disease-free survival, whereas the expression of LRRC19, CXCL8, and PBLD showed no significant correlation with disease-free survival. Immunohistochemical analysis further validated the expression levels of SULF1, CXCL8, and PBLD. The machine learning models demonstrated high efficacy in assisting CRC diagnosis, with AUC values exceeding 0.8, and the most effective model achieved an AUC greater than 0.9. Decision curve and calibration curve analyses further confirmed its significant clinical net benefit and good consistency.
Conclusion: The four identified DEGs (SULF1, CXCL8, LRRC19, and PBLD) may serve as new therapeutic targets for CRC and provide valuable biomarkers for cancer metastasis research. Combined with machine learning, these four DEGs were used to construct a CRC diagnostic model with high clinical application value.

Keywords: colorectal cancer, biomarker, sulfatase-1, bioinformatic analysis, R language, machine learning
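
The abstract evaluates the diagnostic model by its ROC curve and AUC. As a minimal illustration of what that number measures, the sketch below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (the rank-based formulation, with ties counted as one half). It is not the authors' pipeline: the function name `roc_auc`, the labels, and the scores are hypothetical toy values, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the area under the ROC curve.

    Computed as the fraction of (case, control) pairs in which the case
    receives the higher score; tied pairs count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = CRC patient, 0 = healthy control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model scores (e.g. predicted probability of CRC).
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the reported values above 0.8 and 0.9 should be read.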