Accuracy of Artificial Intelligence in Remote Electrocardiography Diagnosis

doi:10.12114/j.issn.1007-9572.2025.0021

Abstract

Background

Artificial intelligence (AI) technologies have shortcomings such as vulnerability to adversarial attacks and overfitting, so their performance in practical applications falls short of what experimental data suggest. Given the cost and turnaround time of remote electrocardiography (ECG) consultation, primary care staff tend to use AI diagnostic results directly, which may pose medical risks.

Objective

To analyze the accuracy of AI-based ECG diagnosis and the factors influencing it, using physician diagnosis as the reference standard, based on 3 years of consultation data from the Huangshan City Regional Remote ECG Diagnosis Center.

Methods

All 18 164 consultation ECGs submitted by primary care institutions to the Huangshan City Regional Remote ECG Diagnosis Center between September 2020 and September 2023 were collected retrospectively. The AI and physician ECG diagnostic conclusions were each categorized into four types: normal, positive, critical, and poor acquisition. Patient identity information was linked to the inpatient electronic medical record system of Huangshan City People's Hospital for the same period to extract the tertiary-hospital discharge diagnoses of these patients, who were then classified by discharge diagnosis into cardiovascular disease (CVD) hospitalization and non-CVD hospitalization. A paired-design McNemar χ² test was used to compare classifications between the AI and physician groups, and a Pearson χ² test was used to compare each group's classification results by CVD hospitalization status. After excluding poor-acquisition cases, univariate logistic regression was used to analyze the association between classification and CVD hospitalization, with the normal category as the reference. Furthermore, 17 ECG indicators from the physician group's ECGs (excluding poor-acquisition cases) were converted into binary variables. With agreement between the AI and physician classifications as the dependent variable, receiver operating characteristic (ROC) curves were plotted, stratified by the physician group's critical and positive categories, to evaluate the effect of each ECG indicator on disagreement between the two groups.
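For a paired comparison across more than two categories, McNemar's test is commonly generalized to Bowker's test of symmetry. The abstract does not state which variant or software was used, so the following is only a minimal sketch under that assumption. The function name `bowker_statistic` and the 3×3 counts are hypothetical illustrations, not study data.

```python
from typing import List, Tuple


def bowker_statistic(table: List[List[int]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for Bowker's symmetry test.

    table[i][j] is the number of ECGs placed in category i by one rater
    and in category j by the other rater (a square paired table).
    """
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            discordant = table[i][j] + table[j][i]
            if discordant == 0:
                continue  # this pair of cells carries no information
            stat += (table[i][j] - table[j][i]) ** 2 / discordant
            df += 1
    return stat, df


# Hypothetical paired counts (NOT taken from the study).
toy = [
    [10, 5, 0],
    [1, 20, 4],
    [0, 2, 30],
]
stat, df = bowker_statistic(toy)
print(f"chi-square = {stat:.3f}, df = {df}")
```

The statistic is referred to a χ² distribution with `df` degrees of freedom. For a 2×2 table it reduces to the classic uncorrected McNemar statistic (b − c)²/(b + c).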

Results

A total of 18 164 remote routine ECGs were included in the study. The median patient age was 69 (65, 74) years, with 8 731 males and 9 433 females. The physician group classified ECGs as normal in 5 873 cases (32.3%), positive in 11 678 (64.3%), critical in 393 (2.2%), and poor acquisition in 220 (1.2%). The corresponding figures for the AI group were 4 723 (26.0%), 12 861 (70.8%), 390 (2.1%), and 190 (1.0%), respectively. During the study period, there were 553 inpatient admissions of related patients to tertiary hospitals, of which 457 (82.6%) were for CVD. Univariate logistic regression analysis showed that, with normal ECG as the reference, the risk of CVD hospitalization for the physician group's positive and critical categories was 1.84 times (OR=1.84, 95%CI=1.11-3.04) and 2.80 times (OR=2.80, 95%CI=1.08-7.21) that of the normal category, respectively, with statistically significant differences (P<0.05). The AI group's positive (OR=1.54, 95%CI=0.88-2.67) and critical (OR=2.46, 95%CI=0.92-6.55) categories showed no statistically significant association with CVD hospitalization (P>0.05). The diagnostic classification was consistent between the AI and physician groups in 16 018 cases (88.2%) and inconsistent in 2 146 cases (11.8%), with a statistically significant difference between the two groups (χ²=680.931, P<0.001). Using the physician group as the standard, the AI group had a misdiagnosis rate of 27.7% and a missed diagnosis rate of 3.9%. ROC curve results indicated that for the physician group's critical ECGs, sinus rhythm, ST-segment abnormalities, and acute myocardial ischemia had discriminative value for the inconsistency between the two groups, with areas under the curve (AUC) of 0.74 (95%CI=0.65-0.82), 0.69 (95%CI=0.58-0.80), and 0.97 (95%CI=0.96-0.99), respectively. For the physician group's positive ECGs, low voltage and T-wave abnormalities had discriminative value, with AUC of 0.58 (95%CI=0.55-0.61) and 0.61 (95%CI=0.58-0.63), respectively. For the physician group's normal ECGs, bradycardia had discriminative value, with an AUC of 0.58 (95%CI=0.56-0.60).
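The link between the reported confidence intervals and the stated significance can be checked with a short calculation. The sketch below assumes the intervals are Wald intervals (symmetric on the log-odds scale), which the abstract does not state; the helper name `wald_p_from_ci` is hypothetical. It recovers an approximate standard error from each 95% CI and converts it to a two-sided P value.

```python
import math


def wald_p_from_ci(odds_ratio: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided Wald P value from an OR and its 95% CI.

    Assumes the CI is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail


# (OR, lower, upper) as reported in the Results.
reported = {
    "physician positive": (1.84, 1.11, 3.04),
    "physician critical": (2.80, 1.08, 7.21),
    "AI positive": (1.54, 0.88, 2.67),
    "AI critical": (2.46, 0.92, 6.55),
}
p_values = {name: wald_p_from_ci(*vals) for name, vals in reported.items()}
for name, p in p_values.items():
    print(f"{name}: approximate P = {p:.3f}")
```

Under this assumption both physician categories fall below 0.05 and both AI categories fall above it, in line with the Results; the values are approximations because the reported figures are rounded.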

Conclusion

The accuracy of the current AI algorithm in ECG diagnosis is lower than that of physicians, so its results still need to be reviewed and confirmed by experienced physicians. We suggest that AI technology applied in clinical practice should undergo extensive robustness verification.

Key words: Cardiovascular disease, Artificial intelligence, Remote electrocardiography, Diagnostic techniques

HU Min,LYU Xiangdong. Accuracy of Artificial Intelligence in Remote Electrocardiography Diagnosis[J]. Chinese General Practice, 2026, 29(18): 2498-2503. DOI: 10.12114/j.issn.1007-9572.2025.0021.

	Physician diagnosis, n (%)
AI diagnosis	Normal	Poor acquisition	Positive	Critical	Total
Normal	80 (14.4)	0	8 (1.5)	0	88 (15.9)
Poor acquisition	0	5 (0.9)	0	0	5 (0.9)
Positive	32 (5.8)	0	374 (67.6)	1 (0.2)	407 (73.6)
Critical	0	0	0	53 (9.6)	53 (9.6)
Total	112 (20.2)	5 (0.9)	382 (69.1)	54 (9.8)	553 (100.0)

	Physician diagnosis, n (%)
AI diagnosis	Normal	Poor acquisition	Positive	Critical	Total
Normal	4 247 (23.4)	7 (0.0)	469 (2.6)	0	4 723 (26.0)
Poor acquisition	0	190 (1.0)	0	0	190 (1.0)
Positive	1 624 (8.9)	11 (0.1)	11 207 (61.7)	19 (0.1)	12 861 (70.8)
Critical	2 (0.0)	12 (0.1)	2 (0.0)	374 (2.1)	390 (2.1)
Total	5 873 (32.3)	220 (1.2)	11 678 (64.3)	393 (2.2)	18 164 (100.0)
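The agreement, misdiagnosis, and missed diagnosis figures in the Results can be recomputed from the 18 164-ECG cross-tabulation above. The abstract does not define the two error rates, so the definitions used here are an assumption: misdiagnosis is taken as physician-normal ECGs that the AI labelled positive or critical, and missed diagnosis as physician-positive or physician-critical ECGs that the AI labelled normal. This reading happens to reproduce the reported percentages.

```python
# Rows: AI diagnosis; columns: physician diagnosis.
# Category order: normal, poor acquisition, positive, critical.
NORMAL, POOR, POSITIVE, CRITICAL = range(4)
counts = [
    [4247, 7, 469, 0],
    [0, 190, 0, 0],
    [1624, 11, 11207, 19],
    [2, 12, 2, 374],
]

total = sum(sum(row) for row in counts)
agree = sum(counts[i][i] for i in range(4))
agreement_pct = 100 * agree / total

# Assumed definition: AI positive/critical among physician-normal ECGs.
physician_normal = sum(row[NORMAL] for row in counts)
ai_abnormal = counts[POSITIVE][NORMAL] + counts[CRITICAL][NORMAL]
misdiagnosis_pct = 100 * ai_abnormal / physician_normal

# Assumed definition: AI normal among physician-positive/critical ECGs.
physician_abnormal = sum(row[POSITIVE] + row[CRITICAL] for row in counts)
ai_normal = counts[NORMAL][POSITIVE] + counts[NORMAL][CRITICAL]
missed_pct = 100 * ai_normal / physician_abnormal

print(f"agreement: {agree}/{total} = {agreement_pct:.1f}%")
print(f"misdiagnosis: {ai_abnormal}/{physician_normal} = {misdiagnosis_pct:.1f}%")
print(f"missed diagnosis: {ai_normal}/{physician_abnormal} = {missed_pct:.1f}%")
```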

ECG indicator	Critical category, AUC (95% CI)	Positive category, AUC (95% CI)
Sinus rhythm	0.74 (0.65-0.82)^a	0.52 (0.50-0.55)
Tachycardia	0.21 (0.15-0.27)^a	0.48 (0.46-0.51)
Bradycardia	0.55 (0.41-0.70)	0.38 (0.36-0.40)^a
Premature contraction	0.55 (0.41-0.68)	0.43 (0.40-0.45)^a
Atrial fibrillation	0.28 (0.19-0.37)^a	0.48 (0.46-0.51)
Atrioventricular conduction abnormality	0.53 (0.39-0.67)	0.45 (0.42-0.47)^a
Intraventricular conduction abnormality	0.48 (0.35-0.61)	0.45 (0.42-0.47)^a
QRS axis	0.56 (0.42-0.70)	0.49 (0.46-0.51)
Low voltage	0.46 (0.34-0.59)	0.58 (0.55-0.61)^a
Chamber hypertrophy	0.53 (0.39-0.66)	0.39 (0.37-0.41)^a
Abnormal Q wave	0.50 (0.37-0.63)	0.49 (0.46-0.51)
ST-segment abnormality	0.69 (0.58-0.80)^a	0.46 (0.44-0.49)^a
T-wave abnormality	0.61 (0.49-0.73)	0.61 (0.58-0.63)^a
QT abnormality	0.49 (0.36-0.63)	0.50 (0.47-0.52)
Acute myocardial ischemia	0.97 (0.96-0.99)^a	0.50 (0.47-0.53)
Early repolarization	0.50 (0.36-0.63)	0.49 (0.46-0.52)
Other	0.60 (0.45-0.75)	0.51 (0.48-0.54)
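Because each ECG indicator was converted to a binary variable, its ROC curve has a single operating point, and the AUC equals (sensitivity + specificity) / 2. An AUC below 0.5 therefore means the absence of the indicator is what tracks the outcome; for example, 0.21 corresponds to 0.79 once the coding is flipped. The sketch below illustrates this with hypothetical data; the function name `auc_pairwise` and all values are illustrative, and the coding direction used in the study is not stated in the abstract.

```python
from typing import Sequence


def auc_pairwise(scores: Sequence[int], labels: Sequence[int]) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: indicator present (1) or absent (0) on each ECG;
# label 1 = AI and physician classifications disagree, 0 = they agree.
indicator = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
disagree = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

auc = auc_pairwise(indicator, disagree)

# Sensitivity and specificity of the binary indicator for disagreement.
n_pos = sum(disagree)
n_neg = len(disagree) - n_pos
sens = sum(1 for s, y in zip(indicator, disagree) if s == 1 and y == 1) / n_pos
spec = sum(1 for s, y in zip(indicator, disagree) if s == 0 and y == 0) / n_neg

# Flipping the indicator coding mirrors the AUC around 0.5.
flipped_auc = auc_pairwise([1 - s for s in indicator], disagree)

print(f"AUC = {auc:.3f}, (sens + spec) / 2 = {(sens + spec) / 2:.3f}")
print(f"AUC with flipped coding = {flipped_auc:.3f}")
```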