Published paper

Application of SHAP Value-Based Cluster Analysis in Hemodialysis Patients with Arteriovenous Fistula

Authors Shu P, Huang L, Wang X, Wen Z, Luo Y, Xu F

Received 6 May 2025

Accepted for publication 5 September 2025

Published 13 September 2025 Volume 2025:18 Pages 5475–5489

DOI https://doi.org/10.2147/IJGM.S533419

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Franco Musio

Peng Shu,* Ling Huang,* Xia Wang, Zhuping Wen, Yiqi Luo, Fang Xu

The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Peng Shu; Fang Xu, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, No. 26, Shengli Street, Jiang’an District, Wuhan, Hubei Province, People’s Republic of China, Tel +8615802716692, Email 312855784@qq.com; 453328433@qq.com

Background: The prognosis of hemodialysis patients with an arteriovenous fistula is highly heterogeneous and is influenced by many factors, including vascular condition and underlying disease. This study aims to characterize patient subgroups and identify the key factors that distinguish them, using cluster analysis interpreted through SHAP values.
Methods: A cohort of 974 hemodialysis patients with arteriovenous fistulae was analyzed, with 55 clinical characteristics extracted per patient. After multiple imputation, standardization, and dimensionality reduction by principal component analysis, the K-Means, DBSCAN, and hierarchical clustering algorithms were compared using metrics including the silhouette coefficient and the Calinski-Harabasz index. K-Means with K = 3 was selected, and its cluster assignments were used as a pseudo-target variable. This variable was then combined with an XGBoost model, and SHAP value analysis was used to quantify each feature's contribution.
Results: K-Means performed best among the three algorithms, with a silhouette coefficient of 0.05, and partitioned patients into three clusters. The hemoglobin values reported here span negative numbers and therefore appear to be on the standardized scale. In Cluster 1, hemoglobin concentration ranged from −2 to 5, with a median of 1 and the highest variability of the three clusters. In Cluster 2, hemoglobin concentration fell predominantly between −3 and 2, with a median of 0. Cluster 3 showed a hemoglobin distribution similar to that of Cluster 2, with slightly greater variability in the tails. SHAP analysis identified hemoglobin concentration as the most important feature, with a SHAP value of 550, indicating that differences in its distribution were the primary driver of the clustering. Age, BMI, total cholesterol, and other features also contributed to the clustering through complex nonlinear interactions.
Conclusion: Cluster analysis combined with SHAP values preliminarily identified heterogeneous subgroups among hemodialysis patients with arteriovenous fistulae, with hemoglobin concentration as a potential key driver. This approach may support personalized treatment, but its generalizability requires multicenter validation.
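
The clustering evaluation step described in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the twelve rows of data are invented, the two features (hemoglobin and age) are chosen only for readability, imputation and PCA are omitted, and a deterministic farthest-first initialization is swapped in for the random initialization K-Means normally uses. The sketch standardizes the features, runs K-Means with K = 3, and computes the mean silhouette coefficient, the metric for which the abstract reports 0.05.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column (mean 0, population standard deviation 1)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b); requires at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {c: [] for c in set(labels)}
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:  # a singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance within the point's own cluster
        b = min(mean(d) for c, d in dists.items() if c != labels[i] and d)
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical rows: [hemoglobin, age]. Invented for illustration only.
raw = [
    [8.0, 70], [8.4, 72], [7.6, 68], [8.2, 71],
    [11.0, 50], [11.4, 52], [10.6, 48], [11.2, 51],
    [14.0, 30], [14.4, 32], [13.6, 28], [14.2, 31],
]
X = standardize(raw)
labels = kmeans(X, k=3)
score = silhouette(X, labels)
print(labels, round(score, 3))
```

The silhouette coefficient ranges from −1 to 1. The toy data above is deliberately well separated and scores high, whereas a value near 0, such as the 0.05 reported here, indicates clusters that overlap substantially.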

Keywords: hemodialysis, cluster analysis, SHAP values, unsupervised learning, personalized medicine
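
To make the SHAP step concrete, the following sketch computes exact Shapley values by brute force for a toy model and ranks features by mean absolute contribution. Everything in it is an assumption made for illustration: the function `toy_model`, its coefficients, the three feature names, the zero baseline, and the three patient rows are hypothetical. The study paired SHAP analysis with an XGBoost model trained on cluster labels, whereas this sketch enumerates feature subsets directly, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

FEATURES = ["hemoglobin", "age", "bmi"]  # hypothetical feature names


def toy_model(x):
    """Hypothetical score with one interaction term (not a fitted model)."""
    hb, age, bmi = x
    return 3.0 * hb + 1.0 * age + 0.5 * hb * bmi


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, the rest the baseline.
        return f([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical standardized patient rows and an all-zero baseline.
patients = [[2.0, 1.0, -1.0], [-1.0, 0.5, 0.0], [0.5, -2.0, 1.0]]
baseline = [0.0, 0.0, 0.0]

phi = [shapley_values(toy_model, x, baseline) for x in patients]

# Global importance: mean absolute Shapley value per feature.
importance = {
    name: sum(abs(row[i]) for row in phi) / len(phi)
    for i, name in enumerate(FEATURES)
}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking, importance)
```

For each patient, the Shapley values sum to the difference between the model output for that patient and the output at the baseline, which is the additivity property that lets per-feature contributions be compared on one scale.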