Chinese General Practice


Construction of a carotid plaque risk prediction model based on machine learning among high-risk cardiovascular populations

XIA Ying1,2, YU Jiayang1,2, GUO Shen'ao1,2, HONG Xin1*

  1. Department of Non-communicable Chronic Disease Control and Prevention, Nanjing Municipal Center for Disease Control and Prevention Affiliated to Nanjing Medical University, Nanjing 210003, Jiangsu Province, China
  2. School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu Province, China
  • Received: 2025-09-09 Accepted: 2025-09-29
  • Corresponding author: HONG Xin
  • Supported by:
    Nanjing Municipal Special Fund for Health Science and Technology Development (ZKX21054)

Abstract: Background Cardiovascular disease (CVD) is the leading cause of morbidity and mortality worldwide. Carotid plaque (CP), an important marker of atherosclerosis, can be used to predict cardiovascular events. However, conventional ultrasound screening is difficult to scale to large populations. Machine learning-based prediction models may facilitate the early identification of CP and the prevention and control of CVD. Objective To construct a machine learning prediction model for CP risk among populations at high risk of CVD. Methods Data were obtained from the Nanjing arm of the “Jiangsu Province Early Screening and Comprehensive Intervention Program for High-Risk Cardiovascular Populations” from September 2023 to August 2025. Participants were randomly split 7:3 into training and test sets. With CP as the outcome, logistic regression (LR), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), elastic net (EN), and support vector machine (SVM) models were constructed. Model performance on the test set was evaluated using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Shapley additive explanations (SHAP) were applied to the best-performing model for interpretability. Results Among the 5,666 participants, 3,639 (64.2%) were classified into the plaque group. Least absolute shrinkage and selection operator (LASSO) regression identified 21 key variables with non-zero coefficients: sex, region, electrocardiogram findings, age, serum uric acid, estimated glomerular filtration rate, glycated hemoglobin (HbA1c), homocysteine (Hcy), non-high-density lipoprotein cholesterol, apolipoprotein B (ApoB), creatine kinase-MB (CK-MB), diastolic blood pressure, pulse pressure, urine pH, urine specific gravity, educational level, current occupation, smoking status, alcohol consumption, physical activity, and sleep quality.
In the test set, the XGBoost model demonstrated the best overall predictive performance (AUC=0.747, accuracy=0.693, sensitivity=0.732, specificity=0.623, F1 score=0.754). According to the SHAP importance ranking, the top predictors were, in order, age, Hcy, CK-MB, region, HbA1c, ApoB, and sex. Conclusions Among the five models compared, the XGBoost-based CP risk prediction model for populations at high risk of CVD showed the best performance. This model may provide a simple and low-cost tool for primary healthcare institutions to facilitate early identification of individuals with CP, support timely intervention at the early stage of atherosclerosis, and help reduce the risk of adverse CVD outcomes.

Key words: Cardiovascular disease, Carotid plaque, Machine learning, Prediction model
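
The performance metrics reported in the abstract are tied together by the class balance of the evaluated sample, so they can be cross-checked against each other. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `metrics_from_rates` is hypothetical, and the plaque prevalence in the evaluated sample is assumed to equal the overall 3,639/5,666, which a random 7:3 split should roughly preserve.

```python
def metrics_from_rates(sensitivity, specificity, prevalence):
    """Derive accuracy, precision, and F1 from sensitivity, specificity, and prevalence.

    All confusion-matrix cells are expressed as fractions of the whole sample.
    """
    tp = sensitivity * prevalence                # plaque cases correctly flagged
    tn = specificity * (1 - prevalence)          # non-plaque cases correctly cleared
    fp = (1 - specificity) * (1 - prevalence)    # non-plaque cases wrongly flagged
    accuracy = tp + tn
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


# Reported XGBoost sensitivity and specificity from the abstract.
# ASSUMPTION: prevalence in the evaluated sample equals the overall 3,639 / 5,666.
prevalence = 3639 / 5666
derived = metrics_from_rates(sensitivity=0.732, specificity=0.623, prevalence=prevalence)

for name, value in derived.items():
    print(f"{name}: {value:.3f}")
```

Under this assumption the derived accuracy and F1 score round to the reported 0.693 and 0.754, so the reported figures are mutually consistent. Because positives make up about 64% of the sample, the F1 score is higher than the accuracy even though specificity is modest.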