Construction of a carotid plaque risk prediction model based on machine learning among high-risk cardiovascular populations

doi:10.12114/j.issn.1007-9572.2025.0312

Abstract

Background: Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. Carotid plaque (CP), an important marker of atherosclerosis, can be used to predict cardiovascular events, but conventional ultrasound screening is difficult to apply to large populations. Machine learning prediction models may support early identification of CP and the prevention and control of CVD.

Objective: To construct machine learning models that predict CP risk in a population at high risk of CVD.

Methods: Data were obtained from the Nanjing arm of the "Jiangsu Province Early Screening and Comprehensive Intervention Program for High-Risk Cardiovascular Populations", collected from September 2023 to August 2025. Participants were randomly split 7:3 into training and test sets. With CP as the outcome, logistic regression (LR), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), elastic net (EN), and support vector machine (SVM) models were constructed. Model performance was compared on the test set using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Shapley additive explanations (SHAP) were applied to the best-performing model for interpretability analysis.

Results: Of the 5,666 participants, 3,639 (64.2%) were in the plaque group. Least absolute shrinkage and selection operator (LASSO) regression selected 21 key variables with non-zero coefficients: sex, region, electrocardiogram impression, age, serum uric acid, estimated glomerular filtration rate, glycated hemoglobin (HbA1c), homocysteine (Hcy), non-high-density lipoprotein cholesterol, apolipoprotein B (ApoB), creatine kinase-MB (CK-MB), diastolic blood pressure, pulse pressure, urine pH, urine specific gravity, educational level, current occupation, smoking status, alcohol consumption, physical activity, and sleep quality. In the training set, the XGBoost model showed the best overall predictive performance (AUC=0.747, accuracy=0.693, sensitivity=0.732, specificity=0.623, F1 score=0.754). Ranked by SHAP importance, the leading predictors were, in order, age, Hcy, CK-MB, region, HbA1c, ApoB, and sex.

Conclusions: The XGBoost-based CP risk prediction model for populations at high risk of CVD performed best among the models compared. It may give primary healthcare institutions a simple, low-cost tool for early identification of individuals with CP, support intervention in the early stage of atherosclerosis, and thereby help reduce the risk of adverse CVD outcomes.

Key words: Cardiovascular disease, Carotid plaque, Machine learning, Prediction model
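The reported XGBoost metrics can be checked for internal consistency with a few lines of arithmetic. The sketch below derives accuracy and F1 score from the reported sensitivity and specificity. It assumes that the proportion of plaque cases in the evaluated subset equals the overall cohort proportion (3,639 of 5,666), which the abstract does not state; the function name `metrics_from_rates` is hypothetical and is not taken from the study.

```python
def metrics_from_rates(sensitivity, specificity, prevalence):
    """Derive accuracy, precision and F1 from sensitivity, specificity and prevalence.

    All confusion-matrix cells are expressed as fractions of the whole sample.
    """
    tp = sensitivity * prevalence                # true positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives

    accuracy = tp + tn
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Assumption: evaluated-set prevalence equals the cohort prevalence (3,639 / 5,666).
derived = metrics_from_rates(sensitivity=0.732, specificity=0.623,
                             prevalence=3639 / 5666)
print({name: round(value, 3) for name, value in derived.items()})
```

Under that assumption, the derived accuracy and F1 score round to the reported 0.693 and 0.754, so the five reported figures are mutually consistent. AUC cannot be recovered this way because it depends on the full ranking of predicted probabilities, not on a single threshold.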

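Xia Ying, Yu Jiayang, Guo Shen'ao, et al. Construction of a carotid plaque risk prediction model based on machine learning among high-risk cardiovascular populations [J]. Chinese General Practice (中国全科医学). DOI: 10.12114/j.issn.1007-9572.2025.0312.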

