Chinese General Practice


Predictive Value of Convolutional Neural Network for Chronic Kidney Disease Progression Based on Chronic Kidney Disease Dataset

  

  1. Department of Nephrology, Tianjin First Central Hospital, Tianjin 300192, China
  2. School of Computer Science, Nankai University, Tianjin 300000, China
  • Received: 2024-10-12  Revised: 2024-12-08  Accepted: 2024-12-25
  • Contact: CHANG Wenxiu, Chief physician; E-mail: changwx@sina.com

  • Funding: Tianjin Health Science and Technology Project, General Program (TJWJ2021MS012)

Abstract: Background Early and accurate prediction of the risk of developing end-stage renal disease (ESRD) is essential for medical decision-making. In the field of chronic kidney disease (CKD), many researchers are studying, from a medical perspective, how various factors and the percentage decline in estimated glomerular filtration rate (eGFR) over the preceding 2 years affect progression to ESRD. Traditional risk assessment methods usually rely on expert experience, simple statistical analyses, and a limited set of biomarkers, and they face obvious limitations when handling complex, multidimensional health data. Machine learning algorithms such as artificial neural networks can significantly improve the accuracy, sensitivity, and specificity of risk prediction.

Objective To explore, using multiple algorithms, the predictive value of the 2-year mean levels of clinical parameters and the rate of change of eGFR over 2 years for the progression of CKD to ESRD.

Methods The dataset was obtained from a retrospective cohort of Japanese patients with CKD at Teikyo University Hospital, Japan, from 2008 to 2014; 700 patients were enrolled in the study cohort. Two datasets were derived from this cohort: a baseline dataset and a 2-year time-averaged dataset. Logistic regression (LR), multilayer perceptron (MLP), support vector machine (SVM), extreme gradient boosting tree (XGBoost), and two-dimensional convolutional neural network (CNN) algorithms were used to predict whether a patient would reach ESRD after several years and to estimate the corresponding probability. Class imbalance in the dataset was addressed at both the data level and the algorithm level, and comparative experiments were used to demonstrate medical significance.

Results With LR, MLP, SVM, and XGBoost as the baseline models, the comparative experiments showed that the CNN model performed best, with an accuracy of 94.8%, precision of 80.3%, recall of 78.2%, and F1 score of 78.4%. The evaluation metrics of all five models were markedly higher on the 2-year time-averaged dataset than on the baseline dataset, especially recall. In addition, models that included the 2-year eGFR decline rate as a variable outperformed models that did not. Recall improved considerably after the class imbalance in the dataset was addressed.

Conclusion This study demonstrates that a two-dimensional CNN model based on the CKD dataset can guide healthcare professionals to make better clinical treatment decisions, that the mean levels of clinical parameters in the first 2 years and the percentage decline in eGFR over 2 years are significant in predicting dialysis events, and that comprehensive management in the first 2 years is essential to delay the onset of ESRD.
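
The four metrics reported in the Results can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `classification_metrics` and all patient counts are hypothetical, chosen only to show why accuracy can stay high on an imbalanced cohort while recall for the ESRD class is much lower.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fn count patients who reached ESRD (positive class);
    fp/tn count patients who did not (negative class).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a model predicts no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical cohort: 100 patients, 10 of whom reach ESRD.
model = classification_metrics(tp=7, fp=2, fn=3, tn=88)

# A trivial classifier that never predicts ESRD on the same cohort.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=90)

for name, m in [("model", model), ("always negative", always_negative)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up cohort the trivial classifier still reaches 90% accuracy with zero recall, which is why recall and F1 are the more informative metrics once the classes are imbalanced.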

Key words: Chronic kidney disease, End-stage renal disease, Prediction, Convolutional neural networks, Computer-aided diagnosis, Deep learning

