Chinese General Practice


Construction and Validation of a Clinical Prediction Model for Epilepsy after Ischemic Stroke: A Retrospective Case-control Study

  

  1. Department of Neurology, the People's Hospital of Liaoning Province, Shenyang 110000, China
  2. Department of Neurological Function, the People's Hospital of Liaoning Province, Shenyang 110000, China
  • Received: 2025-09-03 Accepted: 2025-10-09
  • Contact: LIN Muhui, Chief Physician, Email: lcllmh@126.com

  • Funding:
    Liaoning Provincial Science and Technology Plan Project (2023-BSBA-183)

Abstract: Background Post-stroke epilepsy is the main cause of acquired epilepsy in the elderly and can adversely affect stroke prognosis. Ischemic stroke has the higher incidence in clinical practice, so effectively identifying patients at high risk of epilepsy after ischemic stroke is of great significance for the preventive treatment of post-stroke epilepsy. Objective To analyze the risk factors for epilepsy after ischemic stroke, and to construct and validate a machine learning model that predicts post-stroke epilepsy within 5 years in patients with ischemic stroke. Methods Clinical data were retrospectively collected for 1 555 patients with acute ischemic stroke who were hospitalized in the Department of Neurology, the People's Hospital of Liaoning Province, from January 2015 to May 2020. Patients were followed up for the occurrence of post-stroke epilepsy within 5 years after stroke onset. The subjects were randomly divided into a training set (n=1 088) and a validation set (n=467) at a ratio of 7∶3. Least absolute shrinkage and selection operator (LASSO) regression was used to screen potential predictors. Six machine learning models were then constructed: logistic regression (LR), decision tree (DT), K-nearest neighbors (KNN), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and random forest (RF). The receiver operating characteristic (ROC) curve, calibration curve (CC), and decision curve analysis (DCA) were used to evaluate the discrimination, calibration, and clinical validity of the models. Shapley additive explanations (SHAP) analysis was used to interpret the model results and evaluate the importance of each feature. Results A total of 1 555 patients were included, of whom 1 361 did not develop epilepsy and 194 did.
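As a rough illustration of the 7∶3 random split described in the Methods, the sketch below shuffles patient indices with Python's standard library and checks that the resulting set sizes match those reported. The seed, the function name `split_indices`, and the use of simple (unstratified) shuffling are assumptions; the abstract does not state how the randomization was performed.

```python
import random

def split_indices(n_total, train_num=7, train_den=10, seed=0):
    """Randomly split range(n_total) into training and validation indices.

    Integer arithmetic keeps the training size at floor(n_total * 7 / 10).
    The seed is an arbitrary assumption, used only for reproducibility.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_train = n_total * train_num // train_den
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_indices(1555)
# Sizes reported in the abstract: 1 088 (training) and 467 (validation).
print(len(train_idx), len(valid_idx))
```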
In the training set, 961 patients did not develop epilepsy and 127 did; in the validation set, 400 did not and 67 did. There was no significant difference (P>0.05) between the training set and the validation set in the proportion of post-stroke epilepsy; sex; age; histories of hypertension, diabetes, coronary heart disease, atrial fibrillation, smoking, and drinking; fasting blood glucose; homocysteine; blood uric acid; triglycerides; total cholesterol; low-density lipoprotein; electrolyte imbalance; NIHSS score; single-lobe involvement; TOAST classification; cortical involvement; middle cerebral artery territory infarction; anterior circulation infarction; hemorrhagic transformation; or early-onset epilepsy. LASSO regression retained 7 predictors with non-zero coefficients (early-onset epilepsy, cortical involvement, hemorrhagic transformation, electrolyte imbalance, NIHSS score, anterior circulation infarction, and TOAST classification), which were entered into the 6 machine learning models. In the training and validation sets, the area under the ROC curve (AUC) of the XGBoost model for predicting epilepsy after ischemic stroke was 0.953 and 0.947, respectively. The accuracy, specificity, sensitivity, and F1 score of the XGBoost model were all higher than those of the LR and LightGBM models. The calibration curve of the XGBoost model was the closest to the ideal curve in both the training set and the validation set. DCA of each model in the training and validation sets showed that XGBoost and RF had a large net benefit when the threshold probability was between 0.1 and 0.8. Taking all indicators into consideration, the XGBoost model performed best among the 6 models. SHAP values were then calculated for each feature of the XGBoost model.
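The net benefit plotted in a decision curve is conventionally computed as TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below applies that standard formula to the "treat all" reference strategy in the validation set (67 patients with epilepsy, 400 without). It is not the authors' code, and the per-model true-positive and false-positive counts needed to draw the model curves are not given in the abstract.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit at a threshold probability: TP/n - FP/n * pt / (1 - pt)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - fp / n * threshold / (1 - threshold)

# "Treat all" reference in the validation set: every patient is flagged,
# so all 67 epilepsy cases are true positives and the other 400 are false positives.
n_valid, n_epilepsy = 467, 67
for pt in (0.1, 0.2, 0.4, 0.8):
    nb_all = net_benefit(n_epilepsy, n_valid - n_epilepsy, n_valid, pt)
    print(f"pt={pt:.1f}  treat-all net benefit={nb_all:.3f}")
```

A model adds clinical value at a given threshold when its net benefit exceeds both this treat-all line and the treat-none line at zero.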
Cortical involvement was the feature with the greatest influence on the model. The remaining features, in descending order of importance, were the large artery atherosclerosis subtype of the TOAST classification, the cardioembolic subtype, NIHSS score ≥15, electrolyte imbalance, anterior circulation infarction, NIHSS score 5-14, early-onset epilepsy, the small artery occlusion subtype, and hemorrhagic transformation. All features were positively correlated with epilepsy after ischemic stroke except the small artery occlusion subtype, which was negatively correlated. Conclusion Among the machine learning models for epilepsy after ischemic stroke established in this study, the XGBoost model performed best. The model contains seven clinically accessible factors (early-onset epilepsy, cortical involvement, hemorrhagic transformation, electrolyte imbalance, NIHSS score, anterior circulation infarction, and TOAST classification) and has good clinical applicability.
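SHAP attributions are based on Shapley values, which average each feature's marginal contribution over all orders in which the features can be added. The sketch below computes exact Shapley values by enumeration for a toy score over three of the reported predictors. The scoring function and its weights are invented purely for illustration and are not the fitted XGBoost model; in practice, tree models are usually explained with faster specialized algorithms rather than brute-force enumeration.

```python
from itertools import permutations

FEATURES = ("cortical_involvement", "hemorrhagic_transformation", "electrolyte_imbalance")

def toy_score(present):
    """Hypothetical risk score for a set of present features (weights are made up)."""
    score = 0.0
    if "cortical_involvement" in present:
        score += 2.0
    if "hemorrhagic_transformation" in present:
        score += 0.5
    if "electrolyte_imbalance" in present:
        score += 1.0
    # Illustrative interaction term, also invented.
    if {"cortical_involvement", "electrolyte_imbalance"} <= present:
        score += 0.4
    return score

def shapley_values(features, value_fn):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            totals[f] += value_fn(present) - before
    return {f: total / len(orderings) for f, total in totals.items()}

phi = shapley_values(FEATURES, toy_score)
for name, value in phi.items():
    print(f"{name}: {value:.2f}")
```

The attributions sum to the difference between the score with all features present and the score with none, which is the additivity property that SHAP relies on.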

Key words: Post-ischemic stroke epilepsy, Risk factors, Predictive model, Machine learning, XGBoost, Case-control studies

