Using Machine Learning to Build an Early Warning Model for the Risk of Severe Airflow Limitation in Patients with Chronic Obstructive Pulmonary Disease

Chinese General Practice, 2022, Vol. 25, Issue (02): 217-226. DOI: 10.12114/j.issn.1007-9572.2021.01.313

Topic collection: Respiratory Diseases

ZHOU Lijuan^1,3, WEN Xianxiu^2,*, LYU Qin^1, JIANG Rong^2, WU Xingwei^4,5, ZHOU Huangyuan^3, XIANG Chao^1

1. Department of Respiratory and Critical Care Medicine, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
2. Department of Nursing, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
3. School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, China
4. Department of Pharmacy, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
5. Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, China
*Corresponding author: WEN Xianxiu, Professor of nursing; E-mail: 392083173@qq.com

Received: 2021-06-09 Revised: 2021-11-04 Published: 2022-01-15 Online: 2021-12-29
Funding: National Natural Science Foundation of China (72004020); Cadre Health Care Research Project, Chuanganyan (2021-219)

Abstract

Background

The degree of airflow limitation is a key indicator of disease progression in patients with chronic obstructive pulmonary disease (COPD). However, because of contraindications to testing and poor compliance, some patients cannot undergo the relevant examinations, so the severity of their disease cannot be assessed.

Objective

To develop and evaluate a machine learning algorithm-based early warning model for the risk of severe airflow limitation in COPD patients.

Methods

A cross-sectional design was used to survey COPD inpatients at a tertiary hospital in Sichuan Province from January 2019 to June 2020. General clinical indicators and pulmonary function test data were collected. The data were randomly split into a training set and a test set at a ratio of 8:2. In the training set, 216 risk warning models were built by combining four missing-value handling methods, three feature selection methods, and 18 algorithms (17 machine learning algorithms and one ensemble learning algorithm). Predictive performance was evaluated with the area under the ROC curve (AUC), accuracy, precision, recall, and F1 score. Ten-fold cross-validation was used for internal validation and bootstrapping for external validation. The test set was used for model testing and selection, and a post hoc method was used to verify the sample size.
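
The count of 216 models follows from crossing every missing-value handling method with every feature selection method and every algorithm. A minimal Python sketch of that enumeration is given here. The method labels are copied from the result tables on this page; the variable names and the idea of representing each model as a tuple are illustrative assumptions, not the authors' code.

```python
from itertools import product

# Labels as they appear in the result tables; "Not" means the step was skipped.
IMPUTATION = ["Not", "Simple", "Random Forest", "Random Forest Improve"]
FEATURE_SELECTION = ["Not", "Lasso", "Boruta"]
BASE_ALGORITHMS = [
    "AdaBoost", "Bagging", "Bernoulli Naive Bayes", "Decision Tree",
    "Extra Tree", "Gaussian Naive Bayes", "Gradient Boosting", "KNN",
    "LDA", "Logistic Regression", "Multinomial Naive Bayes",
    "Passive Aggressive", "QDA", "Random Forest", "SGD", "SVM", "XGBoost",
]
# The ensemble learner is counted as the 18th algorithm.
ALGORITHMS = BASE_ALGORITHMS + ["Ensemble Learning"]

# One processed dataset per (imputation, feature selection) pair.
datasets = list(product(IMPUTATION, FEATURE_SELECTION))
# One model per (dataset, algorithm) pair.
model_grid = list(product(datasets, ALGORITHMS))

print(len(BASE_ALGORITHMS), len(datasets), len(model_grid))
```

The three printed counts correspond to the 17 base algorithms, the 12 processed datasets, and the 216 models mentioned in the abstract.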

Results

A total of 418 patients were included, of whom 212 (50.7%) were at risk of severe or worse airflow limitation. Applying the four missing-value handling methods and three feature selection methods yielded 12 processed datasets and 12 corresponding importance rankings of the factors affecting airflow limitation. Modified Medical Research Council dyspnea scale (mMRC) grade, age, body mass index (BMI), smoking history (yes/no), chronic obstructive pulmonary disease assessment test (CAT) score, and dyspnea (yes/no) ranked near the top and were key indicators for constructing the model. With no imputation and Lasso selection, mMRC grade, smoking history (yes/no), and dyspnea (yes/no) were the top three predictors, with mMRC grade accounting for 54.15% of feature importance. With no imputation and Boruta selection, CAT score, age, and mMRC grade were the top three predictors, with CAT score accounting for 26.64% of feature importance. Modeling each of the 12 datasets with the 17 machine learning algorithms and one ensemble learning algorithm produced 216 prediction models. In ten-fold cross-validation of the 17 machine learning algorithms, predictive performance differed significantly between algorithms (P<0.05), and the stochastic gradient descent algorithm had the highest mean AUC (0.738±0.089). In external validation on the test set using bootstrapping, predictive performance again differed significantly between algorithms (P<0.05), and the ensemble learning algorithm had the highest mean AUC (0.757±0.057). Bootstrapping evaluation of the four missing-value handling methods and three feature selection methods showed that no imputation and Lasso selection improved model performance, with statistically significant differences (P<0.05). When the 216 models were tested on the test set, the best model had an AUC of 0.7909, accuracy of 75.90%, precision of 75.00%, recall of 78.57%, and F1 score of 0.7674. Sample size verification suggested that the study sample size was sufficient for modeling.

Conclusion

This study developed and evaluated a risk warning model for severe airflow limitation in COPD patients. mMRC grade, age, BMI, CAT score, smoking history, and dyspnea were the key indicators affecting airflow limitation. The model showed good predictive performance and has potential for clinical application.

Key words: Pulmonary disease, chronic obstructive; Machine learning; Degree of airflow limitation; Lung function; Respiratory function tests; Prediction model

Chinese Library Classification (CLC) number: R563.9

Cite this article: ZHOU Lijuan, WEN Xianxiu, LYU Qin, JIANG Rong, WU Xingwei, ZHOU Huangyuan, XIANG Chao. Using Machine Learning to Build an Early Warning Model for the Risk of Severe Airflow Limitation in Patients with Chronic Obstructive Pulmonary Disease [J]. Chinese General Practice, 2022, 25(02): 217-226. DOI: 10.12114/j.issn.1007-9572.2021.01.313.

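Variable		Value	Variable		Value
Age (mean ± SD, years)		63.7±10.9	Days since last outpatient visit for acute exacerbation^a (mean ± SD, d)		0.6±1.8
Sex [n (%)]			Systemic corticosteroid use [n (%)]
	Female	46 (11.0)		No	403 (96.4)
	Male	372 (89.0)		Yes	15 (3.6)
Disease stage [n (%)]			Comorbid cor pulmonale [n (%)]
	Stable	304 (72.7)		No	407 (97.4)
	Acute exacerbation	114 (27.3)		Yes	11 (2.6)
BMI (mean ± SD, kg/m²)		23.1±3.6	Nutritional or metabolic abnormality [n (%)]
Education level^a [n (%)]				No	416 (99.5)
	Illiterate	25 (6.0)		Yes	2 (0.5)
	Primary school	150 (36.0)	Cardiovascular disease [n (%)]
	Junior high school	145 (34.8)		No	408 (97.6)
	Senior high school/technical secondary school	55 (13.2)		Yes	10 (2.4)
	Junior college or above	42 (10.0)	History of other diseases [n (%)]
Asthma symptoms [n (%)]				No	300 (71.8)
	No	79 (18.9)		Yes	118 (28.2)
	Yes	339 (81.1)	Family history of COPD [n (%)]
Wheezing [n (%)]				No	260 (62.2)
	No	82 (19.6)		Yes	158 (37.8)
	Yes	336 (80.4)	Smoking history [n (%)]
Dyspnea [n (%)]				No	91 (21.8)
	No	62 (14.8)		Yes	327 (78.2)
	Yes	356 (85.2)	Oxygen therapy [n (%)]
mMRC grade^a [n (%)]				No	389 (93.1)
	Grade 0	25 (6.0)		Yes	29 (6.9)
	Grade 1	145 (34.8)	Use of a transcutaneous oxygen saturation monitor [n (%)]
	Grade 2	178 (42.7)		No	413 (98.8)
	Grade 3	68 (16.3)		Yes	5 (1.2)
	Grade 4	1 (0.2)	Exercise [n (%)]
Loss of appetite [n (%)]				No	109 (26.1)
	No	358 (85.6)		Yes	309 (73.9)
	Yes	60 (14.4)	Pursed-lip and abdominal breathing [n (%)]
Cough [n (%)]				No	256 (61.2)
	No	71 (17.0)		Yes	162 (38.8)
	Yes	347 (83.0)	CAT score (mean ± SD, points)		12.8±5.6
Number of acute exacerbations (mean ± SD)		1.4±1.5	Inhaler use [n (%)]
Days since last acute exacerbation (mean ± SD, d)		1.4±31.7		No	47 (11.2)
Triggering factor^b [n (%)]				Yes	371 (88.8)
	Unknown	139 (33.4)	Long-term use of inhaled medication [n (%)]
	Common cold	244 (58.7)		No	56 (13.4)
	Cold air	8 (1.9)		Yes	361 (86.6)
	Other	8 (1.9)	FEV₁% on pulmonary function testing [n (%)]
	Physical activity	6 (1.4)		≥50%	206 (49.3)
	Irritant gases	11 (2.7)		<50%	212 (50.7)
Number of hospitalizations for acute exacerbation (mean ± SD)		0.6±1.1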

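Variable	Reason for exclusion at initial screening	Variable	Reason for exclusion at initial screening
Oxygen therapy (yes/no)	②	Nutritional status	③
Daily duration of oxygen therapy	②	Oxygen saturation value	③
Oxygen flow rate	②	Use of noninvasive ventilation (yes/no)	②
Oxygen delivery method	②	Daily duration of noninvasive ventilation	③
Noninvasive ventilation mode	②	Mask wearing (yes/no)	②
Knowledge of how to disinfect the noninvasive ventilator humidifier chamber and ventilator tubing	②	Use of a transcutaneous oxygen saturation monitor (yes/no)	②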

Machine learning algorithm	AUC		Accuracy		Precision		Recall		F1 score
	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI
AdaBoost	0.706±0.098	（0.689，0.724）	0.683±0.076	（0.670，0.697）	0.672±0.073	（0.659，0.685）	0.740±0.106	（0.721，0.759）	0.701±0.079	（0.687，0.715）
Bagging	0.665±0.100	（0.647，0.683）	0.626±0.087	（0.611，0.642）	0.642±0.096	（0.624，0.659）	0.614±0.121	（0.592，0.636）	0.622±0.091	（0.605，0.638）
Bernoulli Naive Bayes	0.715±0.074	（0.702，0.729）	0.664±0.061	（0.653，0.675）	0.659±0.067	（0.647，0.672）	0.709±0.093	（0.693，0.726）	0.680±0.063	（0.668，0.691）
Decision Tree	0.694±0.074	（0.680，0.707）	0.685±0.071	（0.672，0.697）	0.667±0.065	（0.656，0.679）	0.758±0.115	（0.737，0.779）	0.706±0.076	（0.692，0.720）
Extra Tree	0.678±0.080	（0.663，0.692）	0.664±0.071	（0.651，0.676）	0.664±0.074	（0.651，0.678）	0.694±0.110	（0.674，0.714）	0.674±0.074	（0.661，0.687）
Gaussian Naive Bayes	0.702±0.084	（0.687，0.717）	0.639±0.067	（0.627，0.651）	0.616±0.058	（0.605，0.626）	0.777±0.079	（0.763，0.791）	0.685±0.058	（0.675，0.696）
Gradient Boosting	0.700±0.097	（0.682，0.717）	0.664±0.082	（0.649，0.678）	0.662±0.079	（0.647，0.676）	0.695±0.131	（0.671，0.719）	0.673±0.091	（0.656，0.689）
KNN	0.697±0.088	（0.681，0.713）	0.637±0.082	（0.622，0.652）	0.618±0.076	（0.605，0.632）	0.781±0.120	（0.760，0.803）	0.684±0.072	（0.671，0.697）
LDA	0.729±0.091	（0.712，0.745）	0.677±0.072	（0.664，0.690）	0.676±0.075	（0.662，0.689）	0.704±0.111	（0.684，0.724）	0.685±0.077	（0.671，0.699）
Logistic Regression	0.728±0.094	（0.711，0.745）	0.682±0.074	（0.669，0.696）	0.683±0.077	（0.669，0.697）	0.701±0.117	（0.680，0.722）	0.687±0.084	（0.672，0.703）
Multinomial Naive Bayes	0.640±0.100	（0.622，0.659）	0.596±0.089	（0.580，0.612）	0.590±0.087	（0.574，0.606）	0.697±0.141	（0.672，0.723）	0.632±0.089	（0.616，0.648）
Passive Aggressive	0.649±0.113	（0.628，0.669）	0.601±0.090	（0.584，0.617）	0.603±0.102	（0.585，0.622）	0.639±0.184	（0.606，0.672）	0.607±0.124	（0.584，0.629）
QDA	0.719±0.089	（0.703，0.735）	0.661±0.076	（0.647，0.674）	0.650±0.074	（0.637，0.664）	0.723±0.114	（0.703，0.744）	0.681±0.078	（0.667，0.695）
Random Forest	0.664±0.110	（0.644，0.684）	0.625±0.099	（0.607，0.643）	0.636±0.107	（0.616，0.655）	0.620±0.131	（0.597，0.644）	0.623±0.108	（0.603，0.642）
SGD	0.738±0.089	（0.722，0.755）	0.685±0.075	（0.672，0.699）	0.684±0.077	（0.670，0.698）	0.716±0.110	（0.696，0.736）	0.695±0.077	（0.681，0.709）
SVM	0.720±0.101	（0.701，0.738）	0.666±0.087	（0.651，0.682）	0.678±0.098	（0.660，0.695）	0.666±0.112	（0.645，0.686）	0.667±0.090	（0.651，0.683）
XGBoost	0.677±0.099	（0.659，0.695）	0.637±0.079	（0.622，0.651）	0.642±0.078	（0.628，0.656）	0.642±0.124	（0.620，0.665）	0.637±0.090	（0.621，0.654）
P value	<0.0001		<0.0001		<0.0001		<0.0001		<0.0001
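
The narrow confidence intervals in this table look like normal-approximation intervals for a mean, that is, mean ± 1.96·s/√n. The sketch below recomputes the AdaBoost AUC interval under the assumption that each algorithm was evaluated n = 120 times (12 datasets × 10 folds). That value of n is inferred from the study design and is not stated on this page, and the function name is hypothetical.

```python
import math

def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation 95% CI for a mean: mean +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# AdaBoost AUC row: 0.706 +/- 0.098, reported 95% CI (0.689, 0.724).
# n = 12 datasets * 10 folds is an assumption, not a value given in the source.
low, high = normal_ci(0.706, 0.098, n=12 * 10)
print(round(low, 3), round(high, 3))
```

Under this assumption the recomputed bounds agree with the reported interval to within the rounding of the published mean and standard deviation, which supports, but does not prove, the reading that the intervals describe the mean over datasets and folds.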

Missing-value handling method	AUC		Accuracy		Precision		Recall		F1 score
	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI
Not	0.724±0.070	（0.723，0.725）	0.689±0.070	（0.688，0.691）	0.676±0.083	（0.674，0.677）	0.751±0.098	（0.749，0.753）	0.707±0.073	（0.705，0.708）
Random Forest	0.682±0.068	（0.681，0.684）	0.640±0.063	（0.638，0.641）	0.630±0.074	（0.628，0.631）	0.722±0.101	（0.720，0.724）	0.668±0.072	（0.667，0.669）
Random Forest Improve	0.681±0.069	（0.680，0.683）	0.642±0.063	（0.640，0.643）	0.632±0.076	（0.631，0.634）	0.720±0.101	（0.718，0.722）	0.669±0.073	（0.667，0.670）
Simple	0.679±0.068	（0.677，0.680）	0.642±0.064	（0.641，0.644）	0.634±0.079	（0.633，0.636）	0.720±0.104	（0.718，0.722）	0.669±0.073	（0.668，0.671）
P value	<0.0001		<0.0001		<0.0001		<0.0001		<0.0001

Feature selection method	AUC		Accuracy		Precision		Recall		F1 score
	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI
Boruta	0.681±0.072	（0.680，0.682）	0.652±0.068	（0.650，0.653）	0.643±0.081	（0.641，0.644）	0.722±0.100	（0.721，0.724）	0.676±0.073	（0.674，0.677）
Lasso	0.703±0.069	（0.701，0.704）	0.651±0.069	（0.649，0.652）	0.643±0.082	（0.642，0.644）	0.717±0.110	（0.715，0.719）	0.672±0.079	（0.671，0.674）
Not	0.691±0.071	（0.690，0.692）	0.658±0.068	（0.656，0.659）	0.643±0.078	（0.642，0.645）	0.745±0.094	（0.743，0.746）	0.687±0.071	（0.686，0.688）
P value	<0.0001		<0.0001		0.5344		<0.0001		<0.0001

Airflow limitation risk warning model	Model type	Missing-value handling method	Feature selection method	Number of variables	AUC	Accuracy	Precision	Recall	F1 score
model 1	Ensemble learning	Not	Not	23	0.7909	0.7590	0.7500	0.7857	0.7674
model 2	Ensemble learning	Not	Boruta	16	0.7875	0.7590	0.7391	0.8095	0.7727
model 3	Logistic regression	Not	Not	23	0.7764	0.7470	0.7234	0.8095	0.7640
model 4	AdaBoost	Not	Lasso	4	0.7738	0.6988	0.6809	0.7619	0.7191
model 5	Ensemble learning	Not	Lasso	4	0.7738	0.6988	0.6809	0.7619	0.7191
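
Because F1 is the harmonic mean of precision and recall, the F1 column can be recomputed from the two columns before it. The sketch below does this for the five models and also shows one confusion matrix that reproduces the model 1 figures. The test-set size of 83 (about 20% of 418) and the individual cell counts are inferred for illustration only and are not reported in the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for model 1 to model 5, copied from the table.
rows = [
    (0.7500, 0.7857, 0.7674),
    (0.7391, 0.8095, 0.7727),
    (0.7234, 0.8095, 0.7640),
    (0.6809, 0.7619, 0.7191),
    (0.6809, 0.7619, 0.7191),
]
for p, r, reported in rows:
    print(f"recomputed {f1_score(p, r):.4f} vs reported {reported:.4f}")

# Hypothetical confusion matrix for model 1 (assumed counts, 83 test patients).
tp, fp, fn, tn = 33, 11, 9, 30
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

If the assumed counts are right, model 1 correctly flagged 33 of 42 patients with severe airflow limitation and raised 11 false alarms among 41 patients without it.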

Machine learning algorithm	AUC		Accuracy		Precision		Recall		F1 score
	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI	Mean ± SD	95% CI
AdaBoost	0.716±0.068	（0.713，0.718）	0.678±0.061	（0.676，0.680）	0.659±0.073	（0.656，0.662）	0.765±0.070	（0.763，0.768）	0.706±0.060	（0.704，0.708）
Bagging	0.657±0.074	（0.654，0.660）	0.627±0.059	（0.624，0.629）	0.623±0.076	（0.620，0.626）	0.670±0.086	（0.667，0.673）	0.643±0.067	（0.640，0.645）
Bernoulli Naive Bayes	0.697±0.062	（0.694，0.699）	0.648±0.057	（0.646，0.650）	0.639±0.073	（0.636，0.642）	0.721±0.069	（0.718，0.724）	0.675±0.058	（0.673，0.677）
Decision Tree	0.681±0.065	（0.678，0.684）	0.683±0.061	（0.681，0.686）	0.658±0.074	（0.655，0.661）	0.799±0.069	（0.797，0.802）	0.719±0.058	（0.717，0.721）
Ensemble Learning	0.757±0.057	（0.755，0.760）	0.708±0.056	（0.706，0.711）	0.695±0.074	（0.692，0.698）	0.771±0.074	（0.768，0.774）	0.728±0.057	（0.725，0.730）
Extra Tree	0.666±0.065	（0.664，0.669）	0.658±0.062	（0.655，0.660）	0.646±0.077	（0.643，0.649）	0.733±0.089	（0.729，0.737）	0.683±0.064	（0.680，0.685）
Gaussian Naive Bayes	0.654±0.066	（0.651，0.656）	0.610±0.057	（0.608，0.612）	0.597±0.070	（0.595，0.600）	0.728±0.074	（0.725，0.731）	0.654±0.060	（0.651，0.656）
Gradient Boosting	0.707±0.064	（0.705，0.710）	0.655±0.065	（0.653，0.658）	0.645±0.079	（0.642，0.648）	0.726±0.074	（0.723，0.729）	0.680±0.065	（0.678，0.683）
KNN	0.663±0.071	（0.660，0.666）	0.633±0.066	（0.630，0.636）	0.609±0.080	（0.606，0.612）	0.809±0.087	（0.806，0.813）	0.690±0.060	（0.688，0.693）
LDA	0.714±0.060	（0.712，0.716）	0.678±0.053	（0.676，0.680）	0.665±0.070	（0.662，0.667）	0.743±0.070	（0.740，0.746）	0.699±0.056	（0.697，0.701）
Logistic Regression	0.721±0.062	（0.718，0.723）	0.689±0.056	（0.687，0.692）	0.678±0.072	（0.675，0.681）	0.748±0.069	（0.746，0.751）	0.709±0.058	（0.707，0.711）
Multinomial Naive Bayes	0.651±0.064	（0.648，0.654）	0.602±0.068	（0.600，0.605）	0.602±0.081	（0.598，0.605）	0.668±0.122	（0.663，0.673）	0.627±0.080	（0.624，0.630）
Passive Aggressive	0.686±0.075	（0.683，0.689）	0.624±0.082	（0.621，0.628）	0.636±0.095	（0.632，0.639）	0.626±0.200	（0.618，0.634）	0.613±0.126	（0.608，0.619）
QDA	0.686±0.067	（0.683，0.688）	0.646±0.061	（0.643，0.648）	0.630±0.074	（0.627，0.633）	0.753±0.075	（0.750，0.756）	0.683±0.061	（0.681，0.686）
Random Forest	0.687±0.066	（0.685，0.690）	0.659±0.063	（0.657，0.662）	0.657±0.076	（0.654，0.660）	0.692±0.088	（0.689，0.696）	0.671±0.069	（0.668，0.674）
SGD	0.718±0.064	（0.715，0.720）	0.672±0.054	（0.670，0.674）	0.657±0.071	（0.655，0.660）	0.747±0.075	（0.744，0.750）	0.697±0.058	（0.694，0.699）
SVM	0.708±0.061	（0.705，0.710）	0.648±0.072	（0.645，0.650）	0.641±0.083	（0.637，0.644）	0.709±0.082	（0.706，0.712）	0.671±0.072	（0.668，0.674）
XGBoost	0.680±0.069	（0.677，0.683）	0.639±0.066	（0.637，0.642）	0.636±0.082	（0.632，0.639）	0.697±0.081	（0.694，0.700）	0.662±0.067	（0.659，0.664）
P value	<0.0001		<0.0001		<0.0001		<0.0001		<0.0001
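
This table includes the ensemble learner, whose mean AUC of 0.757±0.057 matches the bootstrapping result quoted in the abstract, so the mean ± SD values here appear to summarise each metric over bootstrap resamples of the test set. The following is a minimal sketch of that idea for AUC using only the Python standard library. The labels and scores are synthetic, the number of resamples (1000) and the helper names are assumptions, and the authors' actual implementation is not described on this page.

```python
import random
import statistics

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_resamples=1000, seed=0):
    """Resample cases with replacement and return the mean and SD of AUC."""
    rng = random.Random(seed)
    indices = range(len(labels))
    values = []
    while len(values) < n_resamples:
        sample = [rng.choice(indices) for _ in indices]
        y = [labels[i] for i in sample]
        if len(set(y)) < 2:  # AUC is undefined without both classes
            continue
        values.append(auc(y, [scores[i] for i in sample]))
    return statistics.mean(values), statistics.stdev(values)

# Synthetic example: 8 hypothetical patients, label 1 = severe airflow limitation.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(auc(labels, scores))
print(bootstrap_auc(labels, scores))
```

Resamples that happen to contain only one class are skipped, since AUC cannot be computed for them; with a test set of realistic size this case is rare.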
