Chinese General Practice ›› 2022, Vol. 25 ›› Issue (11): 1334-1339. DOI: 10.12114/j.issn.1007-9572.2022.0125

• Original Article •

Comparison of Three Risk Prediction Models for Carotid Atherosclerosis in Steelworkers

WANG Jiaojiao, CHEN Yuanyu, ZHENG Ziwei, YANG Yongzhong, CHEN Zhe, LI Chao, WANG Haidong, WU Jianhui, WANG Guoli*

  1. Department of Epidemiology and Health Statistics, School of Public Health, North China University of Science and Technology/Hebei Provincial Key Laboratory of Coal Mine Health and Safety, Tangshan 063210, China

    *Corresponding author: WANG Guoli, Professor; E-mail: 15383055966@163.com

  • Received: 2022-01-14 Revised: 2022-02-21 Published: 2022-04-15 Online: 2022-03-28
  • Supported by:
    Key Research and Development Project of the Ministry of Science and Technology of China (2016YFC0900605); Basic Scientific Research Funds for Higher Education Institutions of Hebei Province (JYG2019002)

Abstract: Background

Carotid atherosclerosis (CAS) not only reduces the work efficiency of steelworkers but is also the most important risk factor for ischemic cerebrovascular disease. In recent years, a growing number of researchers have used machine learning with readily available factors to predict disease risk, yet studies of risk prediction models for CAS remain scarce.

Objective

To build risk prediction models for CAS in steelworkers using support vector machine (SVM), back propagation neural network (BPNN), and random forest (RF) methods, and to compare their predictive performance.

Methods

A total of 4 568 steelworkers who underwent physical examination and health monitoring at Tangshan Hongci Hospital from March to June 2017 were enrolled. They were surveyed with the Health Assessment Checklist developed by our team, which covers demographic characteristics (sex, age, body mass index, education level, marital status), personal behavior and lifestyle (smoking, drinking), medical history (hypertension, diabetes, family history of CAS), and occupational history (shift work, work at high temperature, work with noise exposure). Laboratory indicators, including serum cholesterol, triglyceride, homocysteine, and uric acid, were also collected. The variables used to build the SVM-, BPNN-, and RF-based models were determined from the results of unconditional multivariate logistic regression analysis together with a review of the relevant literature, and the three models were then compared.
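The variable-screening step can be illustrated with a minimal sketch, assuming a Wald-type criterion in which a variable is retained when the 95% confidence interval of its odds ratio excludes 1. The abstract does not state the exact criterion the authors applied, and the coefficients, standard errors, and function names below are hypothetical, not values or code from the study.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic-regression coefficient.

    beta is the fitted coefficient and se its standard error;
    z=1.96 gives an approximate 95% confidence interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def keep_variable(beta, se, z=1.96):
    """Assumed rule: keep a variable when the CI of its OR excludes 1."""
    _, lower, upper = odds_ratio_with_ci(beta, se, z)
    return lower > 1.0 or upper < 1.0

# Hypothetical (coefficient, standard error) pairs, NOT results from the study.
candidates = {
    "age": (0.60, 0.10),
    "smoking": (0.35, 0.12),
    "marital_status": (0.05, 0.20),
}

selected = [name for name, (b, se) in candidates.items() if keep_variable(b, se)]
print(selected)
```

In practice such a statistical filter would be combined with the literature review mentioned above, so a variable with established clinical relevance could still be kept even if its interval includes 1.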

Results

In the training set, the accuracy, sensitivity, and specificity were 83.81%, 80.10%, and 87.32%, respectively, for the SVM-based model; 79.27%, 66.19%, and 91.62% for the BPNN-based model; and 86.60%, 73.62%, and 98.90% for the RF-based model. The corresponding areas under the receiver operating characteristic curve (AUC) were 0.84, 0.79, and 0.86. The SVM-based model had the highest sensitivity, while the RF-based model had the highest accuracy, specificity, and AUC; the differences were statistically significant (P<0.05). In the test set, the accuracy, sensitivity, and specificity were 85.70%, 81.63%, and 90.29%, respectively, for the SVM-based model; 75.46%, 64.65%, and 87.66% for the BPNN-based model; and 73.37%, 60.00%, and 88.45% for the RF-based model. The corresponding AUCs were 0.86, 0.76, and 0.74. The SVM-based model had the highest accuracy, sensitivity, and AUC, and these differed significantly from those of the BPNN- and RF-based models (P<0.05).
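For readers less familiar with the four metrics reported above, the following minimal sketch shows how each is commonly computed. The confusion-matrix counts and scores are hypothetical toy values, not data from the study, and the function names are illustrative; the abstract does not describe the software the authors used.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,    # all correct predictions
        "sensitivity": tp / (tp + fn),    # true-positive rate among cases
        "specificity": tn / (tn + fp),    # true-negative rate among non-cases
    }

def auc_from_scores(labels, scores):
    """AUC as the probability that a random case scores higher than a
    random non-case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts: 100 workers with CAS, 100 without.
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
print(metrics)

# Hypothetical predicted risks for three cases (1) and three non-cases (0).
auc = auc_from_scores([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
print(round(auc, 3))
```

Sensitivity and specificity trade off against each other, which is why a model can lead on one metric (as the SVM-based model does on sensitivity in the training set) while another leads on specificity.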

Conclusion

The SVM-based model may perform better than the BPNN- and RF-based models in predicting the risk of CAS in steelworkers.

Key words: Carotid artery diseases, Atherosclerosis, Metal workers, Support vector machine, Back propagation neural network, Random forest, Forecasting
