Chinese General Practice

Risk Prediction of Early Recurrence of Advanced Colorectal Neoplasm After Colorectal Adenoma Resection

  

  1. School of Nursing and Health, Zhengzhou University, Zhengzhou 450001, China; 2. Health Management Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450052, China; 3. Medical Affairs Department, Gaoping People's Hospital, Jincheng 048400, China
  • Contact: DING Suying, Chief Superintendent Nurse; Email: fccdingsy@zzu.edu.cn

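  • Funding:
    National Natural Science Foundation of China (72101236); China Postdoctoral Science Foundation (2022M722900); Henan Province Science and Technology Research Project (242102311099); Key Scientific Research Program of Higher Education Institutions of Henan Province (25A320073); Zhengzhou Collaborative Innovation Project (XTCX2023006); Nursing Team Project of the First Affiliated Hospital of Zhengzhou University (HLKY2023005)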

Abstract: Background Colorectal adenoma resection is an effective way to reduce the incidence of colorectal cancer. However, the recurrence rate of advanced colorectal neoplasm (ACRN) within one year after resection is high, and prediction models for early ACRN recurrence after adenoma resection are lacking. Objective To use machine learning to identify factors influencing early ACRN recurrence after colorectal adenoma resection and to develop a prediction model for it. Methods A total of 222 patients with colorectal adenomas who underwent surgical resection and had three or more colonoscopies at the First Affiliated Hospital of Zhengzhou University from January 2017 to August 2023 were retrospectively included. Patients were divided into an early recurrence group (n=68) and a non-early recurrence group (n=154) according to whether ACRN occurred within one year after surgery. General data and laboratory test indicators were collected and compared between the groups. The subjects were split 8:2 into a training set and a test set. Predictors were selected jointly by the Boruta and Lasso regression methods. Four machine learning models were built: Categorical Boosting (Catboost), Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). Model performance was evaluated using sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curve analysis (DCA). Feature importance and SHAP interpretability analysis were used to examine risk factors for early ACRN recurrence. Results The two groups differed significantly (P<0.05) in adenoma number, adenoma size, adenoma location, degree of adenoma dysplasia, bloating, number of clinical symptoms, drinking history, platelet count, and neutrophil-to-lymphocyte ratio (NLR). The Boruta and Lasso methods jointly selected seven predictors: adenoma size, platelet count, degree of adenoma dysplasia, number of clinical symptoms, triglyceride-glucose index (TyG), drinking history, and adenoma number.
Using these seven predictors, four models (Catboost, RF, LR, and SVM) were developed to predict early ACRN recurrence after colorectal adenoma resection. ROC curve analysis showed that the AUCs of Catboost, RF, LR, and SVM were 0.802, 0.836, 0.788, and 0.860, respectively, in the training set and 0.772, 0.749, 0.705, and 0.685, respectively, in the test set. The DeLong test showed no statistically significant difference in any pairwise comparison of AUCs among the four models (all P>0.05). Calibration curve analysis showed that the Brier scores of Catboost, RF, LR, and SVM were 0.178, 0.197, 0.169, and 0.153, respectively, in the training set and 0.188, 0.201, 0.191, and 0.198, respectively, in the test set. DCA showed that the Catboost, LR, and SVM models yielded relatively high clinical net benefit in the training set, and that the Catboost and SVM models yielded good clinical net benefit in the test set. SHAP interpretability analysis of the Catboost model identified the number of clinical symptoms, adenoma size, and adenoma number, in that order, as the three most important features for predicting early postoperative ACRN recurrence. The number of clinical symptoms, adenoma size, adenoma number, degree of adenoma dysplasia, TyG, and platelet count (SHAP values 0.043, 0.042, 0.025, 0.020, 0.012, and 0.005, respectively) were all positively associated with early postoperative ACRN recurrence, whereas drinking history (SHAP value 0.015) was negatively associated with it.
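The AUC, Brier score, and DCA net benefit reported above follow standard definitions, which the plain-Python sketch below illustrates. It is a minimal illustration, not the study's code: the function names are hypothetical, the eight labels and probabilities are invented toy values, and the 0.5 threshold is an arbitrary choice for the example.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt) at threshold pt."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data invented for illustration only (1 = early recurrence, 0 = none).
y = [1, 0, 1, 0, 0, 1, 0, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.2]

toy_auc = auc(y, p)
toy_brier = brier_score(y, p)
toy_nb = net_benefit(y, p, threshold=0.5)
```

A lower Brier score indicates better calibrated probabilities, and a decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all and treat-none strategies.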
Conclusion The risk prediction model built with Catboost shows good predictive performance and clinical applicability and can be used to predict early ACRN recurrence after colorectal adenoma resection.
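The SHAP analysis mentioned in the Results builds on Shapley values, which split the gap between a prediction and a baseline prediction into one contribution per feature. The sketch below computes them exactly by enumerating feature orderings for a two-feature toy score. Everything in it is an assumption made for illustration: the `toy_risk` function, its coefficients, the feature names, and the patient and reference values are invented, and brute-force enumeration stands in for whatever SHAP implementation the study used with Catboost.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from the reference input
        previous = model(current)
        for name in order:
            current[name] = x[name]    # switch one feature to the patient's value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(features: Dict[str, float]) -> float:
    """Hypothetical risk score with an interaction term; coefficients are invented."""
    size = features["adenoma_size_cm"]
    symptoms = features["n_symptoms"]
    return 0.10 + 0.04 * size + 0.03 * symptoms + 0.02 * size * symptoms


patient = {"adenoma_size_cm": 2.0, "n_symptoms": 1.0}
reference = {"adenoma_size_cm": 0.0, "n_symptoms": 0.0}
phi = shapley_values(toy_risk, patient, reference)
```

By construction the contributions in `phi` add up to `toy_risk(patient) - toy_risk(reference)`, which is the property that lets per-feature SHAP values be read as shares of a single prediction.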

Key words: Advanced colorectal neoplasia, Early recurrence, Influence factor, Prediction model, Interpretability analysis
