Published paper

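Identification and Validation of Clinical Hyper-Inflammatory and Hypo-Inflammatory Prognostic Phenotypes of COVID-19 Using Machine Learning Methods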

 

Authors Ji X, Guo Y, Tang L, Gao C

Received 22 November 2024

Accepted for publication 18 February 2025

Published 27 February 2025 Volume 2025:18 Pages 3009–3024

DOI https://doi.org/10.2147/JIR.S504028

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Professor Ning Quan

Xiaojing Ji, Yiran Guo, Lujia Tang, Chengjin Gao

Department of Emergency, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, People’s Republic of China

Correspondence: Lujia Tang; Chengjin Gao, Email tanglujia@xinhuamed.com.cn; gaochengjin@xinhuamed.com.cn

Background: COVID-19 exhibits complex pathophysiological manifestations and is characterized by significant clinical and biological heterogeneity. Identifying phenotypes may improve our understanding of the disease's diverse trajectories, which may benefit both clinical practice and the design of clinical trials.
Methods: This study included adult patients with COVID-19 from Xinhua Hospital, affiliated with Shanghai Jiao Tong University School of Medicine, between December 15, 2022, and February 15, 2023. K-prototypes clustering was applied to 50 clinical variables to identify phenotypes. Machine learning algorithms were then used to select key classifier variables for phenotype recognition.
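The abstract gives no implementation details for the clustering step. As a rough illustration of how k-prototypes handles mixed numeric and categorical clinical variables, the sketch below follows Huang's formulation: the dissimilarity between a patient and a cluster prototype is the squared Euclidean distance over the numeric variables plus a weight gamma times the number of mismatched categorical variables; numeric prototype values are cluster means and categorical prototype values are cluster modes. This is a simplified batch variant written for illustration, not the authors' code. The function names, the value of gamma, the random initialization, and any toy data are assumptions, and numeric variables would normally be standardized before clustering.

```python
import random
from collections import Counter


def dissimilarity(x_num, x_cat, p_num, p_cat, gamma):
    """Huang-style mixed dissimilarity: squared Euclidean distance on the
    numeric attributes plus gamma times the count of categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    mismatches = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric + gamma * mismatches


def k_prototypes(num_data, cat_data, k, gamma=1.0, max_iter=100, seed=0):
    """Simplified batch k-prototypes (illustrative sketch, not the study's code).

    num_data: list of numeric feature lists (assumed already standardized)
    cat_data: list of categorical feature lists, in the same row order
    Returns (labels, numeric prototypes, categorical prototypes).
    """
    rng = random.Random(seed)
    n = len(num_data)
    n_num, n_cat = len(num_data[0]), len(cat_data[0])
    # Initialize prototypes from k distinct, randomly chosen records.
    init = rng.sample(range(n), k)
    protos_num = [list(num_data[i]) for i in init]
    protos_cat = [list(cat_data[i]) for i in init]
    labels = [-1] * n

    for _ in range(max_iter):
        # Assignment step: each record joins its least-dissimilar prototype.
        new_labels = []
        for i in range(n):
            costs = [dissimilarity(num_data[i], cat_data[i],
                                   protos_num[j], protos_cat[j], gamma)
                     for j in range(k)]
            new_labels.append(costs.index(min(costs)))
        if new_labels == labels:
            break  # no record changed cluster: converged
        labels = new_labels

        # Update step: means for numeric attributes, modes for categorical ones.
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                continue  # keep the previous prototype if a cluster empties
            protos_num[j] = [sum(num_data[i][d] for i in members) / len(members)
                             for d in range(n_num)]
            protos_cat[j] = [Counter(cat_data[i][d] for i in members).most_common(1)[0][0]
                             for d in range(n_cat)]
    return labels, protos_num, protos_cat
```

For example, with six hypothetical records holding two standardized numeric variables and one categorical variable (sex), `k_prototypes(num, cat, k=2)` separates the three records near 0 from the three near 5. The weight gamma sets how much a categorical mismatch counts relative to numeric distance; it is often chosen relative to the spread (standard deviation) of the numeric variables.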
Results: A total of 1376 patients met the inclusion criteria. K-prototypes clustering revealed two distinct subphenotypes: a Hypo-inflammatory subphenotype (824 [59.9%]) and a Hyper-inflammatory subphenotype (552 [40.1%]). Patients in the Hypo-inflammatory subphenotype were younger and predominantly female, and had low mortality and shorter hospital stays. In contrast, patients in the Hyper-inflammatory subphenotype were older and predominantly male, and exhibited a hyperinflammatory state with higher mortality and higher rates of organ dysfunction. The AdaBoost model performed best for subphenotype prediction (Accuracy: 0.975, Precision: 0.968, Recall: 0.976, F1: 0.972, AUROC: 0.975). "CRP", "IL-2R", "D-dimer", "ST2", "BUN", "NT-proBNP", "neutrophil percentage", and "lymphocyte count" were identified as the top-ranked variables in the AdaBoost model.
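The reported metrics are standard binary-classification scores. The sketch below shows how they can be computed from hard predictions (and, for AUROC, from predicted scores), assuming the Hyper-inflammatory subphenotype is coded as the positive class; the abstract does not state which class was treated as positive, and the helper names are hypothetical. As a consistency check, F1 is the harmonic mean of precision P and recall R, 2PR/(P + R), and the reported precision (0.968) and recall (0.976) give an F1 of about 0.972, matching the reported value.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from hard class predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


def auroc(y_true, scores, positive=1):
    """AUROC as the probability that a random positive case outscores a random
    negative case (ties count one half); equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Consistency check against the reported precision and recall.
print(round(f1_from_precision_recall(0.968, 0.976), 3))  # 0.972
```

The pairwise AUROC formula is quadratic in the number of patients, which is fine for a cohort of this size; a rank-based version gives the same value in O(n log n).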
Conclusion: This analysis identified two phenotypes based on COVID-19 symptoms and comorbidities. These phenotypes can be accurately recognized using machine learning models, with the AdaBoost model performing best for predicting in-hospital mortality. The variables "CRP", "IL-2R", "D-dimer", "ST2", "BUN", "NT-proBNP", "neutrophil percentage", and "lymphocyte count" play a significant role in predicting subphenotype membership. The identified subphenotypes could be used for risk stratification in clinical practice: patients with the Hyper-inflammatory subphenotype can be closely monitored, and preventive measures such as early admission to the intensive care unit or prophylactic anticoagulation can be taken.
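The abstract does not describe the AdaBoost configuration (base learner, number of boosting rounds) or how the variable ranking was derived. For orientation only, the sketch below implements textbook discrete AdaBoost with one-level decision stumps and labels coded as -1/+1, and ranks variables by the normalized sum of the weights (alpha) of the stumps that split on each variable. That importance measure, the function names, and the toy data are assumptions made for illustration; library implementations such as scikit-learn's `AdaBoostClassifier` derive importance from their base estimators and may differ in detail.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Decision stump: +polarity above the threshold, -polarity otherwise."""
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, thr, pol) != yi)
                if best is None or err < best[3]:
                    best = (f, thr, pol, err)
    return best


def adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with stumps; y must be coded as -1/+1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        f, thr, pol, err = fit_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(err, 1e-10)  # guard the log for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        if err <= 1e-10:
            break  # training data already separated
        # Re-weight: misclassified records gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


def feature_importance(model, n_features):
    """Hypothetical ranking: normalized sum of alpha per feature used."""
    imp = [0.0] * n_features
    for alpha, f, _, _ in model:
        imp[f] += alpha
    total = sum(imp)
    return [v / total for v in imp] if total else imp
```

On a toy dataset where the first variable alone separates the two classes, the first stump has zero weighted error, boosting stops after one round, and all of the importance falls on that variable; with real clinical data, several variables would typically share the importance.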

Keywords: COVID-19, subphenotypes, K-prototypes clustering, machine learning, mortality prediction