Chinese General Practice ›› 2026, Vol. 29 ›› Issue (23): 3294-3306. DOI: 10.12114/j.issn.1007-9572.2025.0358

• Original Article •

Interpretable Machine Learning Models for Predicting the Risk of Incident Cardiometabolic Multimorbidity in Chinese Postmenopausal Women

YUAN Hangtao1, HONG Yan2, YUAN Peihong3, LIN Bo1, CUI Xiaoyun4, LI Weiwei5,*

    1. Second College of Clinical Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
    2. Affiliated Guangdong Hospital of Integrated Traditional Chinese and Western Medicine of Guangzhou University of Chinese Medicine, Foshan 528200, China
    3. Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
    4. Department of Cardiology, Dongfang Hospital Affiliated to Beijing University of Chinese Medicine, Beijing 100078, China
    5. Shenzhen Hospital of Southern Medical University, Shenzhen 518000, China
  • Received: 2025-08-12 Revised: 2025-12-30 Published: 2026-08-15 Online: 2026-07-03
  • Contact: LI Weiwei

  • Author contributions:

    YUAN Hangtao conceived the study, analyzed the data, prepared the figures and tables, and drafted the manuscript; HONG Yan organized the data and assisted with drafting; YUAN Peihong contributed to drafting and language polishing; LIN Bo performed the statistical calculations and suggested revisions; CUI Xiaoyun provided conceptual guidance and revised the manuscript; LI Weiwei provided conceptual guidance, carried out quality control and review, supervised the work, and takes overall responsibility for the manuscript.

  • Funding:
    General Program of the National Natural Science Foundation of China (81774044); Clinical Research Operating Funds for Central High-Level Traditional Chinese Medicine Hospitals (DFGZRB-2024GJRCO17)

Abstract:

Background

Cardiovascular diseases are prevalent in China, with cardiometabolic multimorbidity (CMM) being a common comorbidity pattern. Postmenopausal women represent a high-risk group for cardiovascular diseases, yet there is a lack of predictive models for CMM risk specifically in this population.

Objective

To develop an interpretable machine learning (ML) model to predict the risk of CMM among Chinese postmenopausal women, based on data from the China Health and Retirement Longitudinal Study (CHARLS).

Methods

The study included postmenopausal women aged ≥45 years who took part in the 2011 wave of the CHARLS cohort and were free of CMM at baseline. Data on demographic characteristics, household characteristics, health status, and laboratory indicators were collected, and participants were followed up in 2013, 2015, 2018, and 2020 to record incident CMM. Features were selected with the least absolute shrinkage and selection operator (LASSO), and risk prediction models were built with seven ML algorithms. For the best-performing model, a combined strategy of "class_weight='balanced' dynamic weighting + optimal threshold selection" was applied on the test set, and the model was interpreted visually with Shapley Additive Explanations (SHAP). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, and F1-score.
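The reweighting and threshold step described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code. The weight formula is the one scikit-learn documents for class_weight='balanced'; the use of Youden's J as the "optimal threshold" criterion is an assumption, since the abstract does not name the criterion; the function names and the ten toy predictions are hypothetical. The class counts passed to the weight function are the whole-cohort counts reported in the Results and are used for illustration only (in practice the weights would be derived from the training split).

```python
def balanced_class_weights(class_counts):
    """Return n_samples / (n_classes * count) per class, the formula
    scikit-learn documents for class_weight='balanced'."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}


def classification_metrics(y_true, y_prob, threshold):
    """Sensitivity, specificity, accuracy and F1 at a probability cut-off."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


def youden_threshold(y_true, y_prob):
    """ASSUMED criterion: the cut-off maximising sensitivity + specificity - 1."""
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(y_prob)):
        m = classification_metrics(y_true, y_prob, candidate)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold


# Whole-cohort counts from the abstract (0 = non-CMM, 1 = CMM), illustration only.
weights = balanced_class_weights({0: 4363, 1: 1212})

# Hypothetical toy labels and predicted probabilities, not study data.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80]

best_threshold = youden_threshold(y_true, y_prob)
default = classification_metrics(y_true, y_prob, 0.5)
tuned = classification_metrics(y_true, y_prob, best_threshold)

print(weights)
print(best_threshold, default, tuned)
```

On the toy data, lowering the cut-off from the default 0.5 to the selected value trades a little specificity for higher sensitivity, which is the usual motivation for threshold selection when the positive class is the minority.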

Results

A total of 5 575 participants completed all four follow-up rounds and were included, of whom 4 363 did not develop CMM and 1 212 did. Over a median follow-up of 9 years, the cumulative incidence of CMM was 21.74%. LASSO regression identified 22 key features as important predictors of CMM: self-rated health, mental disorders, arthritis, dyslipidemia, kidney disease, retirement status, systolic blood pressure (SBP), diastolic blood pressure (DBP), mean pulse rate, waist circumference, BMI, headache, lower back pain, serum creatinine (Scr), triglycerides (TG), C-reactive protein (CRP), glycated hemoglobin (HbA1c), uric acid (UA), age, Center for Epidemiologic Studies Depression Scale (CES-D) score, smoking status, and geographic region. Among the models, logistic regression (LR) showed the best predictive performance (test set AUC = 0.758, accuracy = 79.2%). The bar plot of mean SHAP values identified the core predictors as SBP, HbA1c, geographic region, waist circumference, CES-D score, BMI, DBP, and age. The SHAP summary plot indicated that higher values of SBP, HbA1c, waist circumference, and other features were associated with higher predicted CMM risk.
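For readers unfamiliar with how a mean SHAP bar plot ranks features in a logistic regression model, the following standard-library Python sketch may help. It relies on the known result that, for a linear model with features treated as independent, the SHAP value of feature j for one sample is coef_j * (x_j - mean(x_j)), expressed in log-odds units. The coefficients, the three toy participants, and the function names are all hypothetical; they are not the fitted model or data from this study, and the resulting ranking is a property of the invented numbers only.

```python
from statistics import mean


def linear_shap_values(coefs, rows):
    """Per-sample SHAP values for a linear model under the independence
    assumption: phi_j = coef_j * (x_j - mean of x_j over the rows)."""
    names = list(coefs)
    means = {name: mean(row[name] for row in rows) for name in names}
    return [{name: coefs[name] * (row[name] - means[name]) for name in names}
            for row in rows]


def mean_abs_shap(shap_rows):
    """Mean absolute SHAP value per feature, the quantity shown in a SHAP bar plot."""
    names = list(shap_rows[0])
    return {name: mean(abs(row[name]) for row in shap_rows) for name in names}


# HYPOTHETICAL log-odds coefficients and toy participants, not study results.
coefs = {"sbp": 0.02, "hba1c": 0.5, "waist": 0.01}
rows = [
    {"sbp": 120, "hba1c": 5.0, "waist": 80},
    {"sbp": 140, "hba1c": 6.0, "waist": 90},
    {"sbp": 160, "hba1c": 5.5, "waist": 100},
]

shap_rows = linear_shap_values(coefs, rows)
importance = mean_abs_shap(shap_rows)

# Features ordered from largest to smallest mean |SHAP|.
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking, importance)
```

Each sample's SHAP values sum to the difference between that sample's linear predictor and the average linear predictor, which is why the summary plot can be read as "higher feature value, higher predicted risk" when the coefficient is positive.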

Conclusion

This study developed a clinically interpretable prediction model for CMM risk in Chinese postmenopausal women, in which the LR algorithm performed well. SBP, HbA1c, and waist circumference were among the key risk factors. The model may provide an evidence base for screening high-risk individuals and for individualized intervention.

Key words: Cardiometabolic multimorbidity, Postmenopausal women, Middle-aged, The elderly, CHARLS, Prediction model