Chinese General Practice, 2026, Vol. 29, Issue (18): 2498-2503. DOI: 10.12114/j.issn.1007-9572.2025.0021

Accuracy of Artificial Intelligence in Remote Electrocardiography Diagnosis

  

  1. Department of Echocardiogram and Electrocardiogram, Huangshan City People's Hospital, Huangshan, Anhui 245000, China
  2. Huangshan Regional Remote Electrocardiography Diagnostic Center, Huangshan City People's Hospital, Huangshan, Anhui 245000, China
  • Received: 2025-01-13; Revised: 2025-06-02; Published: 2026-06-20; Online: 2026-05-21
  • Contact: HU Min

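    Author contributions:

    HU Min (胡敏) designed the study protocol, collected and organized the data, performed the statistical analysis, drafted the manuscript, and takes overall responsibility for the article; LYU Xiangdong (吕向东) conceived the research idea, revised the manuscript, and supervised the study.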

Abstract:

Background

Artificial intelligence (AI) technologies are vulnerable to adversarial attacks and prone to overfitting, so their performance in practice falls short of what experimental data suggest. Because of the fees and turnaround time of remote electrocardiography (ECG) consultation, primary care staff may adopt AI diagnostic results directly, which poses potential medical risks.

Objective

To analyze the accuracy of AI-based ECG diagnosis and the factors influencing it, using 3 years of consultation data from the Huangshan Regional Remote ECG Diagnostic Center and taking physician diagnoses as the reference standard.

Methods

A total of 18 164 ECGs submitted by primary care institutions to the Huangshan Regional Remote ECG Diagnostic Center between September 2020 and September 2023 were collected retrospectively. The AI and physician diagnostic conclusions were each classified into four categories: normal, positive, critical, and poor acquisition. Patient identity information was linked to the inpatient electronic medical record system of Huangshan City People's Hospital for the same period to extract the discharge diagnoses of these patients' tertiary hospital admissions, which were grouped by discharge diagnosis into cardiovascular disease (CVD) hospitalizations and non-CVD hospitalizations. A paired-design McNemar χ² test was used to compare the classifications of the AI and physician groups, and a Pearson χ² test was used to assess the association between each group's classifications and CVD hospitalization. After poor-acquisition cases were excluded, univariate logistic regression with the normal category as the reference was used to examine the association between each category and CVD hospitalization. Furthermore, 17 ECG indicators from the physician group's ECGs (excluding poor-acquisition cases) were converted into binary variables. With agreement between the AI and physician classifications as the dependent variable, receiver operating characteristic (ROC) curves were plotted in analyses stratified by the physician group's critical and positive categories to evaluate how each ECG indicator contributed to disagreement between the two groups.
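The paired comparison described above can be sketched in code. With four categories, the McNemar χ² test is commonly generalized to Bowker's test of symmetry, which sums (n_ij − n_ji)²/(n_ij + n_ji) over each pair of off-diagonal cells; for a 2×2 table it reduces to McNemar's statistic without continuity correction. The abstract does not state which variant was used, so Bowker's form is an assumption here, as are the function names (`chi2_sf`, `bowker_test`) and the illustrative table. The chi-square tail probability uses closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2), which exist when df is a positive integer.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: Q(m + 1/2, y) = erfc(sqrt(y))
    #   + sum_{i=1}^{m} y**(i - 1/2) * exp(-y) / Gamma(i + 1/2), computed in log space
    total = math.erfc(math.sqrt(y))
    for i in range(1, df // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def bowker_test(table: Sequence[Sequence[int]]) -> tuple[float, int, float]:
    """Bowker's symmetry test (McNemar generalized to k x k paired tables).

    table[i][j] counts ECGs placed in category i by one rater (e.g. physicians)
    and category j by the other (e.g. AI). Cell pairs with n_ij + n_ji == 0
    carry no information and are not counted as degrees of freedom.
    Returns (statistic, degrees of freedom, P value).
    """
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            n_ij, n_ji = table[i][j], table[j][i]
            if n_ij + n_ji > 0:
                stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
                df += 1
    if df == 0:
        return 0.0, 0, 1.0
    return stat, df, chi2_sf(stat, df)


# Hypothetical 3 x 3 table (rows: physician category, columns: AI category).
example = [[50, 3, 1],
           [9, 40, 2],
           [4, 6, 30]]
stat, df, p = bowker_test(example)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.3f}")
```

Only the off-diagonal cells (disagreements) enter the statistic, which is why the test detects a systematic shift between raters, such as AI labeling more ECGs as positive, rather than overall disagreement.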

Results

A total of 18 164 remote routine ECGs were included in the study. The median (P25, P75) patient age was 69 (65, 74) years, with 8 731 male and 9 433 female cases. The physician group classified 5 873 ECGs (32.3%) as normal, 11 678 (64.3%) as positive, 393 (2.2%) as critical, and 220 (1.2%) as poor acquisition; the corresponding figures for the AI group were 4 723 (26.0%), 12 861 (70.8%), 390 (2.1%), and 190 (1.0%). During the study period, these patients had 553 admissions to tertiary hospitals, 457 (82.6%) of which were for CVD. Univariate logistic regression showed that, with normal ECGs as the reference, the odds of CVD hospitalization for ECGs the physician group classified as positive and critical were 1.84 times (OR=1.84, 95%CI=1.11-3.04) and 2.80 times (OR=2.80, 95%CI=1.08-7.21) those of the normal category, respectively (P<0.05). The AI group's positive (OR=1.54, 95%CI=0.88-2.67) and critical (OR=2.46, 95%CI=0.92-6.55) categories showed no statistically significant association with CVD hospitalization (P>0.05). The AI and physician classifications agreed for 16 018 ECGs (88.2%) and disagreed for 2 146 (11.8%), and the difference between the two groups' classifications was statistically significant (χ²=680.931, P<0.001). Taking the physician group as the standard, the AI group had a misdiagnosis rate of 27.7% and a missed diagnosis rate of 3.9%. ROC analysis showed that among ECGs the physician group classified as critical, sinus rhythm, ST-segment abnormalities, and acute myocardial ischemia discriminated between agreement and disagreement of the two groups, with areas under the curve (AUC) of 0.74 (95%CI=0.65-0.82), 0.69 (95%CI=0.58-0.80), and 0.97 (95%CI=0.96-0.99), respectively. Among the physician group's positive ECGs, low voltage and T-wave abnormalities had discriminative value, with AUCs of 0.58 (95%CI=0.55-0.61) and 0.61 (95%CI=0.58-0.63), respectively. Among the physician group's normal ECGs, bradycardia had discriminative value, with an AUC of 0.58 (95%CI=0.56-0.60).
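Two kinds of figures in the Results can be reproduced from simple counts under stated assumptions. With a single categorical predictor, univariate logistic regression is saturated, so each category's OR against the normal reference equals the cross-product ratio of a 2×2 table, and its Wald 95% CI is exp(ln OR ± 1.96·SE) with SE = √(1/a + 1/b + 1/c + 1/d). When a binary ECG indicator is the only predictor, its ROC curve has a single interior point, and the AUC reduces to (sensitivity + specificity)/2. The sketch below assumes Wald intervals (the abstract does not say how the CIs were obtained); the function names and all counts are hypothetical, not study data.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96, for a two-sided 95% interval


@dataclass(frozen=True)
class OddsRatio:
    estimate: float
    lower: float
    upper: float


def odds_ratio_vs_reference(cases: int, non_cases: int,
                            ref_cases: int, ref_non_cases: int,
                            z: float = Z_975) -> OddsRatio:
    """Odds ratio of one category versus the reference category, with a Wald CI.

    Hypothetical coding: 'cases' are CVD admissions and 'non_cases' are non-CVD
    admissions among ECGs in the category of interest; 'ref_*' are the same
    counts for the reference (normal) category. With one categorical predictor
    this matches exp(beta) and its Wald interval from univariate logistic regression.
    """
    cells = (cases, non_cases, ref_cases, ref_non_cases)
    if min(cells) <= 0:
        raise ValueError("every cell must be positive for a finite OR and Wald CI")
    log_or = math.log(cases * ref_non_cases / (non_cases * ref_cases))
    se = math.sqrt(sum(1.0 / n for n in cells))
    return OddsRatio(
        estimate=math.exp(log_or),
        lower=math.exp(log_or - z * se),
        upper=math.exp(log_or + z * se),
    )


def binary_indicator_auc(tp: int, fn: int, tn: int, fp: int) -> float:
    """ROC AUC of a single yes/no indicator: (sensitivity + specificity) / 2.

    The ROC curve of a binary predictor has one interior point, so the
    trapezoidal area under it reduces to the mean of sensitivity and specificity.
    Here 'positive' would be an AI-physician disagreement (an assumed coding).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts only; these are not the study's data.
print(odds_ratio_vs_reference(cases=40, non_cases=10, ref_cases=20, ref_non_cases=10))
print(binary_indicator_auc(tp=30, fn=70, tn=900, fp=100))
```

If the AUCs reported above come from single binary indicators, as the Methods describe, an AUC of 0.58 corresponds to sensitivity + specificity = 1.16, only modestly above the 1.00 of an uninformative indicator.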

Conclusion

The diagnostic accuracy of the current AI algorithm for ECGs is lower than that of physicians, and AI results still need to be reviewed and confirmed by experienced physicians. These findings suggest that AI technologies applied in clinical practice should undergo extensive robustness validation.

Key words: Cardiovascular disease, Artificial intelligence, Remote electrocardiography, Diagnostic techniques
