Chinese General Practice


Identification of Factors Associated with Persistent Atrial Fibrillation and Development of a Classification Model

  

  1. School of Public Health, Xinjiang Medical University, Urumqi 830017, China; 2. Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830017, China; 3. Institute of Medical and Engineering Interdisciplinary Research, Xinjiang Medical University, Urumqi, Xinjiang 830017, China
  • Received: 2025-09-09 Revised: 2025-10-25 Accepted: 2025-10-30
  • Contact: WANG Kai, Professor; LIU Hui, Lecturer

持续性心房颤动的影响因素识别及判别模型构建研究

  

  1. 830017 新疆维吾尔自治区乌鲁木齐市,新疆医科大学公共卫生学院;2. 830017 新疆维吾尔自治区乌鲁木齐市,新疆医科大学医学工程技术学院;3. 830017 新疆维吾尔自治区乌鲁木齐市,新疆医科大学医工交叉研究所
  • 通讯作者: 王凯,教授;刘慧,讲师
  • 基金资助:
    新疆维吾尔自治区重点研发计划项目(2022B03023-2)

Abstract:

Background: Atrial fibrillation (AF) severely impairs patients’ quality of life, leads to substantial morbidity and mortality, and increases healthcare costs.

Objective: To investigate the factors associated with persistent AF and to develop a classification model based on these factors.

Methods: Patients diagnosed with paroxysmal or persistent AF at the First Affiliated Hospital of Xinjiang Medical University between April 2012 and September 2023 were enrolled. Clinical data, including demographic characteristics, biochemical parameters, renal function indices, and cardiac function-related metrics, were collected for analysis. First, univariate logistic regression was performed to screen for variables associated with the type of AF. Next, Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied for further feature selection to reduce model complexity and limit overfitting. A multivariate logistic regression model was then constructed to identify factors independently associated with persistent AF. Using bootstrap resampling, the significant variables were then incorporated into six machine learning algorithms to establish classification models: Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and eXtreme Gradient Boosting (XGB). The discriminative performance of these models was evaluated using Receiver Operating Characteristic (ROC) curves. Finally, the SHapley Additive exPlanations (SHAP) method was used to evaluate the contribution of each variable to the classification models.

Results: A total of 6,938 patients were enrolled, including 5,085 with paroxysmal AF and 1,853 with persistent AF. Univariate logistic regression identified 19 statistically significant variables. Following LASSO regression selection, 13 key variables were retained for multivariate logistic regression analysis. The results indicated that the following were independently associated with persistent AF (all P<0.05): male sex (OR=1.248, 95%CI=1.086-1.435), surgical history (OR=0.809, 95%CI=0.706-0.926), BMI (OR=1.028, 95%CI=1.012-1.045), mean platelet volume (MPV) (OR=1.121, 95%CI=1.059-1.186), serum magnesium (Mg²⁺) (OR=0.098, 95%CI=0.046-0.208), cardiac output (CO) (OR=1.115, 95%CI=1.009-1.233), left ventricular posterior wall thickness (LVPW) (OR=0.777, 95%CI=0.665-0.909), left atrial diameter (LAD) (OR=1.144, 95%CI=1.123-1.166), left ventricular ejection fraction (LVEF) (OR=0.955, 95%CI=0.938-0.972), right atrial diameter (RAD) (OR=1.031, 95%CI=1.005-1.057), triglycerides (TG) (OR=0.821, 95%CI=0.751-0.898), uric acid (UA) (OR=1.003, 95%CI=1.002-1.003), and left ventricular end-diastolic diameter (LVEDD) (OR=0.903, 95%CI=0.879-0.927). ROC curve analysis showed that the XGB model achieved the best performance (mean AUC=0.823), followed by the SVM model (mean AUC=0.820) and the RF model (mean AUC=0.814). SHAP analysis of the XGB model showed that LAD and RAD had the highest SHAP values, suggesting that atrial structural parameters exert the greatest influence on model classification.

Conclusion: Increased BMI, male sex, elevated MPV, higher UA, decreased TG, decreased Mg²⁺, reduced LVEF, decreased LVEDD, no surgical history, and enlarged RAD and LAD were all closely associated with persistent AF. These clinical parameters can be readily obtained through routine examinations, most of which are non-invasive, and may serve as important clinical indicators for identifying patients with persistent AF, thereby supporting early clinical risk stratification and the development of targeted intervention strategies.
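
The odds ratios above are per one unit of each variable, so their practical size depends on how far a measurement moves. The following is a minimal Python sketch of that arithmetic, not the authors' analysis code. It assumes the reported intervals are Wald intervals on the log-odds scale, and the function names are hypothetical. The abstract does not state measurement units, so the unit shifts used here are purely illustrative.

```python
import math

# Odds ratios and 95% CIs copied from the abstract (per one unit of each
# variable; the measurement units are not stated in the abstract).
REPORTED = {
    "LAD": (1.144, 1.123, 1.166),
    "LVEF": (0.955, 0.938, 0.972),
    "BMI": (1.028, 1.012, 1.045),
    "Mg2+": (0.098, 0.046, 0.208),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio."""
    return math.log(odds_ratio)


def approx_standard_error(lower, upper):
    """Approximate coefficient SE, ASSUMING a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the variable shifts by `delta` units."""
    return odds_ratio ** delta


if __name__ == "__main__":
    for name, (odds_ratio, lower, upper) in REPORTED.items():
        beta = log_odds_coefficient(odds_ratio)
        se = approx_standard_error(lower, upper)
        print(f"{name}: beta={beta:.3f}, approx SE={se:.4f}")

    # Illustrative shifts only: 5 units of LAD, 0.1 unit of Mg2+.
    print(f"LAD +5 units: odds x {odds_multiplier(1.144, 5):.2f}")
    print(f"Mg2+ +0.1 unit: odds x {odds_multiplier(0.098, 0.1):.2f}")
```

Read this way, a per-unit odds ratio close to 1 (such as LAD) can still correspond to a large change in odds across a realistic range of values, while an extreme per-unit odds ratio (such as Mg²⁺) may reflect a variable whose plausible range is much narrower than one unit.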

Key words: Atrial fibrillation, Paroxysmal AF, Persistent AF, LASSO regression, Logistic regression
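
The abstract reports a mean AUC per model under bootstrap resampling but does not describe the exact procedure. As a rough illustration of the idea only, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, then averages it over bootstrap resamples of a fixed set of scores. The labels and scores are made-up toy values, the function names are hypothetical, and the study may instead have refit each model on every resample.

```python
import random


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_mean_auc(labels, scores, n_rounds=200, seed=0):
    """Mean AUC over bootstrap resamples (sampling cases with replacement)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; AUC is undefined
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Toy data for illustration only: 1 = persistent AF, 0 = paroxysmal AF.
    toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
    toy_scores = [0.10, 0.20, 0.25, 0.30, 0.35, 0.50, 0.45, 0.60, 0.70, 0.90]
    print(f"AUC on toy data: {roc_auc(toy_labels, toy_scores):.3f}")
    print(f"Bootstrap mean AUC: {bootstrap_mean_auc(toy_labels, toy_scores):.3f}")
```

The pairwise definition is quadratic in sample size, which is fine for a sketch; for a cohort of several thousand patients a rank-based implementation from an established library would be the more practical choice.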

摘要:

背景:心房颤动(简称房颤)会严重影响患者的生活质量,导致大量致残率和死亡率,并增加医疗费用。

目的:探讨持续性房颤的影响因素,并基于此构建判别模型。

方法:纳入2012年4月—2023年9月在新疆医科大学第一附属医院就诊的阵发性与持续性房颤患者为观察对象。收集患者的一般资料、生化指标、肾功能指标、心脏功能相关指标等进行分析。首先采用单因素Logistic回归分析筛选与房颤类型相关的变量,随后使用最小绝对收缩与选择算子(LASSO)回归方法进一步筛选特征变量,以降低模型复杂性并避免过拟合。随后构建多因素Logistic回归模型以识别与持续性房颤独立相关的影响因素。最终,采用Bootstrap重抽样方法,将显著变量纳入随机森林(RF)、决策树(DT)、朴素贝叶斯(NB)、支持向量机(SVM)、K近邻(KNN)及极限梯度提升算法(XGB)共6种机器学习判别模型,并通过受试者工作特征(ROC)曲线评估模型的判别能力。最后基于SHAP方法评估各变量对判别模型的贡献。

结果:共收集到6 938例患者信息,其中包括5 085例阵发性房颤患者和1 853例持续性房颤患者。单因素Logistic回归分析共筛选出19个有统计学意义的自变量。经LASSO回归筛选,最终确定13个关键变量进入多因素Logistic回归分析,结果显示,性别(OR=1.248,95%CI=1.086~1.435)、个人手术史(OR=0.809,95%CI=0.706~0.926)、BMI(OR=1.028,95%CI=1.012~1.045)、平均血小板体积(MPV)(OR=1.121,95%CI=1.059~1.186)、血清镁(Mg²⁺)(OR=0.098,95%CI=0.046~0.208)、心输出量(CO)(OR=1.115,95%CI=1.009~1.233)、左心室后壁厚度(LVPW)(OR=0.777,95%CI=0.665~0.909)、左心房内径(LAD)(OR=1.144,95%CI=1.123~1.166)、左心室射血分数(LVEF)(OR=0.955,95%CI=0.938~0.972)、右心房内径(RAD)(OR=1.031,95%CI=1.005~1.057)、甘油三酯(TG)(OR=0.821,95%CI=0.751~0.898)、尿酸(UA)(OR=1.003,95%CI=1.002~1.003)、左心室舒张末期内径(LVEDD)(OR=0.903,95%CI=0.879~0.927)均为持续性房颤的独立影响因素(P<0.05)。ROC曲线分析结果显示,XGB模型表现最佳(平均AUC=0.823),其次为SVM模型(0.820)和RF模型(0.814)。评估各变量对XGB模型判别结果的相对贡献,结果显示,LAD与RAD的SHAP值最高,提示心房结构参数在判别模型中具有最显著的影响。

结论:BMI升高、男性、MPV增高、UA升高、TG降低、Mg²⁺水平降低、LVEF降低、LVEDD减小、无个人手术史以及RAD和LAD增大,均与持续性房颤的发生密切相关。这些临床指标均可以通过常规检查手段获得,其中大多数为无创检查,能够作为判别持续性房颤患者的重要参考因素,辅助临床早期风险分层与干预策略的制定。

关键词: 心房颤动, 阵发性房颤, 持续性房颤, LASSO 回归, Logistic 回归
