Predictive Value of Convolutional Neural Network for Chronic Kidney Disease Progression Based on Chronic Kidney Disease Dataset

doi:10.12114/j.issn.1007-9572.2024.0604

Abstract

Background

Early and accurate prediction of the risk of end-stage renal disease (ESRD) is essential for medical decision-making. In chronic kidney disease (CKD), previous clinical studies have reported how various risk factors, including the percentage decline in estimated glomerular filtration rate (eGFR) over the preceding 2 years, affect progression to ESRD. Traditional risk assessment methods usually rely on expert experience, simple statistical analyses, and a limited set of biomarkers, and they have clear limitations when handling complex, multidimensional health data. Machine learning algorithms such as artificial neural networks can substantially improve the accuracy, sensitivity, and specificity of risk prediction.

Objective

Using multiple machine learning algorithms, this study explored the value of the 2-year mean levels of clinical parameters and the 2-year rate of change in eGFR for predicting progression from CKD to ESRD.
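The two derived predictors named in the objective can be illustrated with a short sketch. It is not the authors' code and rests on assumptions: each patient's measurements are assumed to be stored as (months since baseline, value) pairs, the 2-year mean is taken as the unweighted mean of the values inside the window, and the percentage decline in eGFR is computed from the first and last eGFR values inside the window. The function names `two_year_mean` and `egfr_percent_decline` are hypothetical.

```python
from statistics import mean

# Hypothetical record layout: one (months_since_baseline, value) pair per measurement.
Measurement = tuple[float, float]


def two_year_mean(measurements: list[Measurement], window_months: float = 24.0) -> float:
    """Unweighted mean of all values recorded within the first 2 years (assumed averaging rule)."""
    in_window = [value for month, value in measurements if 0 <= month <= window_months]
    if not in_window:
        raise ValueError("no measurements inside the 2-year window")
    return mean(in_window)


def egfr_percent_decline(egfr: list[Measurement], window_months: float = 24.0) -> float:
    """Percentage decline from the first to the last eGFR value inside the window.

    Positive results mean eGFR fell; negative results mean it rose.
    """
    in_window = sorted((m, v) for m, v in egfr if 0 <= m <= window_months)
    if len(in_window) < 2:
        raise ValueError("need at least two eGFR values inside the window")
    first, last = in_window[0][1], in_window[-1][1]
    return (first - last) * 100.0 / first


# Hypothetical eGFR series (mL/min/1.73 m^2); the 30-month value falls outside the window.
egfr_series = [(0, 50.0), (6, 48.0), (12, 44.0), (24, 35.0), (30, 30.0)]
print(two_year_mean(egfr_series))         # 44.25
print(egfr_percent_decline(egfr_series))  # 30.0
```

The same averaging function would apply to every continuous clinical parameter when building a time-averaged dataset; a time-weighted mean is an alternative if visits are unevenly spaced.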

Methods

The dataset was drawn from a retrospective cohort of Japanese patients with CKD at Teikyo University Hospital, Japan, between 2008 and 2014; 700 patients were enrolled in the study cohort. Two datasets were derived from this cohort: a baseline dataset and a 2-year time-averaged dataset. Logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), extreme gradient boosting (XGBoost), and two-dimensional convolutional neural network (CNN) algorithms were used to predict whether a patient would progress to ESRD several years later and to estimate the corresponding probability. Class imbalance in the dataset was addressed at both the data level and the algorithm level, and comparison experiments were used to demonstrate clinical significance.
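The abstract does not say how tabular clinical features are arranged as input to a two-dimensional CNN. One common approach, assumed here purely for illustration, is to zero-pad the feature vector, reshape it into a square grid, and slide 2D kernels over that grid. The sketch below shows the reshaping and a single 3×3 convolution channel with ReLU in plain Python; the grid size, kernel values, and helper names (`to_grid`, `conv2d_valid`) are hypothetical, and a real model would learn many kernels inside a deep learning framework.

```python
import math


def to_grid(features: list[float]) -> list[list[float]]:
    """Zero-pad a feature vector to the next square length and reshape it row by row."""
    side = math.ceil(math.sqrt(len(features)))
    padded = features + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def conv2d_valid(grid: list[list[float]], kernel: list[list[float]], bias: float = 0.0) -> list[list[float]]:
    """One square convolution channel: stride 1, no padding, followed by ReLU.

    Like most CNN libraries, this computes cross-correlation (the kernel is not flipped).
    """
    k = len(kernel)
    out_size = len(grid) - k + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            s = bias
            for di in range(k):
                for dj in range(k):
                    s += grid[i + di][j + dj] * kernel[di][dj]
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out


# Hypothetical input: 32 already-normalized clinical features -> 6 x 6 grid with 4 padding cells.
features = [1.0] * 32
grid = to_grid(features)
feature_map = conv2d_valid(grid, [[1.0] * 3 for _ in range(3)])
print(len(grid), len(feature_map))  # 6 4
```

Because neighbouring cells in such a grid depend only on the order in which features are listed, the arrangement of features is itself a modelling choice when a CNN is applied to tabular data.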

Results

With LR, MLP, SVM, and XGBoost as baseline models, the comparison experiments showed that the CNN model performed best, with an accuracy of 94.8%, precision of 80.3%, recall of 78.2%, and F1-score of 78.4%. All five models achieved markedly higher evaluation metrics on the 2-year time-averaged dataset than on the baseline dataset, particularly for recall. In addition, models that included the 2-year eGFR decline rate as a variable outperformed models that did not. Recall improved considerably after the class imbalance in the dataset was addressed.
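The abstract states that class imbalance was handled at both the data level and the algorithm level but does not name the techniques. As a hedged illustration only, the sketch below pairs random oversampling of the minority class (a data-level method; SMOTE is a common alternative that synthesizes new minority samples instead of duplicating existing ones) with inverse-frequency class weights that a weighted loss function could use (an algorithm-level method). The function names and the weighting formula are assumptions, not the authors' settings.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random rows of each smaller class until all classes match the largest one.

    Apply to the training split only, so duplicated patients do not leak into the test set.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def inverse_frequency_weights(y):
    """Class weight = n_samples / (n_classes * n_in_class); rarer classes get larger weights."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Hypothetical toy data: 8 patients without ESRD (0) and 2 with ESRD (1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))                # Counter({0: 8, 1: 8})
print(inverse_frequency_weights(y))  # {0: 0.625, 1: 2.5}
```

Either mechanism makes errors on the rare ESRD class cost more during training, which is consistent with recall being the metric most affected by rebalancing.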

Conclusion

This study demonstrates that a two-dimensional CNN model built on the CKD dataset can help healthcare professionals make better clinical treatment decisions; that the mean levels of clinical parameters in the first 2 years and the percentage decline in eGFR over 2 years are important for predicting dialysis events; and that comprehensive management during the first 2 years is essential for delaying the onset of ESRD.

Key words: Chronic kidney disease, End-stage renal disease, Prediction, Convolutional neural networks, Computer-aided diagnosis, Deep learning

CLC Number: R 692.5

SONG Xinyuan, CHANG Wenxiu, ZHANG Wenyu, et al. Predictive Value of Convolutional Neural Network for Chronic Kidney Disease Progression Based on Chronic Kidney Disease Dataset[J]. Chinese General Practice, 2025, 28(35): 4457-4463. DOI: 10.12114/j.issn.1007-9572.2024.0604.

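Clinical feature	Coding
Follow-up time	Measured value
Baseline eGFR	Measured value
Percentage decline in eGFR over 2 years	Measured value
Age	Measured value
Body mass index	Measured value
Systolic blood pressure	Measured value
Hemoglobin	Measured value
White blood cell count	Measured value
Platelet count	Measured value
Albuminuria	Measured value
Serum uric acid	Measured value
Sodium	Measured value
Potassium	Measured value
Chloride	Measured value
Albumin-corrected calcium	Measured value
Phosphate	Measured value
C-reactive protein	Measured value
Low-density lipoprotein cholesterol	Measured value
Spot urine proteinuria	Measured value
Sex	Female=0, male=1
Diabetic nephropathy	No=0, yes=1
Hypertension	No=0, yes=1
Chronic glomerulonephritis	No=0, yes=1
Polycystic kidney disease	No=0, yes=1
Solitary kidney	No=0, yes=1
Other diseases	No=0, yes=1
Renin-angiotensin system inhibitor	No=0, yes=1
Calcium channel blocker	No=0, yes=1
Diuretic	No=0, yes=1
Other medications	No=0, yes=1
Dialysis	No=0, yes=1
Hematuria (spot urine dipstick)	0=0, 1+=1, 2+=2, 3+=3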
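The value assignments in the table above map directly onto a small encoding step. The sketch below assumes raw records arrive as Python dictionaries with English string values; the key names and accepted strings (`"female"`, `"yes"`, `"2+"`, and so on) are hypothetical, and only a few of the binary features are listed.

```python
# Coding rules follow the table; key names and raw string values are hypothetical.
SEX = {"female": 0, "male": 1}
YES_NO = {"no": 0, "yes": 1}
HEMATURIA_DIPSTICK = {"0": 0, "1+": 1, "2+": 2, "3+": 3}  # ordinal codes, not one-hot

BINARY_FEATURES = ["diabetic_nephropathy", "hypertension", "ras_inhibitor", "diuretic"]
CONTINUOUS_FEATURES = ["age", "baseline_egfr", "egfr_decline_2y_pct"]


def encode_record(raw: dict) -> dict:
    """Map categorical fields to the integer codes in the table; keep measured values as floats."""
    encoded = {
        "sex": SEX[raw["sex"]],
        "hematuria": HEMATURIA_DIPSTICK[raw["hematuria"]],
    }
    for name in BINARY_FEATURES:
        encoded[name] = YES_NO[raw[name]]
    for name in CONTINUOUS_FEATURES:
        encoded[name] = float(raw[name])
    return encoded


record = {"sex": "male", "hematuria": "2+", "diabetic_nephropathy": "no",
          "hypertension": "yes", "ras_inhibitor": "yes", "diuretic": "no",
          "age": 67, "baseline_egfr": 41.5, "egfr_decline_2y_pct": 12.0}
print(encode_record(record))
```

Encoding the dipstick grade as 0 to 3 keeps its order, which matches the single-column coding in the table; an unknown string raises a `KeyError` rather than being silently miscoded.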

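Metric	Definition	Formula
Accuracy	Percentage of all predictions that are correct; it measures how accurately the classifier classifies, and a higher value means the classification results agree more closely with the confirmed diagnoses	(TP+TN)/(TP+TN+FP+FN)×100%
Precision	Proportion of samples predicted as positive that are actually positive	TP/(TP+FP)×100%
Recall	Proportion of actual positive samples that are correctly predicted as positive	TP/(TP+FN)×100%
F1-score	Harmonic mean of precision and recall, balancing the two; a higher F1-score indicates better model performance	2×Precision×Recall/(Precision+Recall)×100%
True positive (TP)	Number of actual positives that the classifier predicts as positive
True negative (TN)	Number of actual negatives that the classifier predicts as negative
False positive (FP)	Number of actual negatives that the classifier predicts as positive
False negative (FN)	Number of actual positives that the classifier predicts as negative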
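The formulas in the table above can be checked with a few lines of code. The counts in the example are made up for illustration and are not the study's results; all metrics are returned as percentages, matching the ×100% in the table, and the function name is hypothetical.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1-score (all in %) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (computed on fractions, then scaled).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total * 100,
        "precision": precision * 100,
        "recall": recall * 100,
        "f1": f1 * 100,
    }


# Hypothetical counts chosen to keep the arithmetic easy to follow.
print(classification_metrics(tp=40, tn=140, fp=10, fn=10))
```

When negatives greatly outnumber positives, accuracy is dominated by the majority class, which is why precision, recall, and F1-score are reported alongside it.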

Predicted class	Actual class: positive	Actual class: negative
Positive	TP	FP
Negative	FN	TN
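The cells of the confusion matrix above can be counted from paired labels as sketched below. The sketch assumes ESRD is coded as the positive class 1 and that predicted probabilities are thresholded at 0.5; both the coding and the default threshold are assumptions, since the abstract does not state them, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, FP, FN, TN); a probability >= threshold is predicted positive (ESRD = 1, assumed)."""
    tp = fp = fn = tn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical labels and model probabilities for six patients.
y_true = [1, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.3, 0.6, 0.2, 0.1, 0.7]
print(confusion_counts(y_true, y_prob))  # (2, 1, 1, 2)
```

Lowering the threshold moves patients from the FN cell to the TP cell (raising recall) at the risk of adding false positives, so the threshold is a lever for the recall-precision trade-off.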