Chinese General Practice, 2025, Vol. 28, Issue (29): 3638-3644. DOI: 10.12114/j.issn.1007-9572.2025.0238

• Special Topic Research: Dysphagia •

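Exploration of Technologies and Methods for Constructing a Swallowing Acoustic Database

LI Dan, LIU Tao, LUO Wei, SONG Hongdan, SHANG Shaomei*

  1. School of Nursing, Peking University, Beijing 100191, China
  • Received: 2025-07-10 Revised: 2025-08-01 Published: 2025-10-15 Online: 2025-08-26
  • Corresponding author: SHANG Shaomei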

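  • Author contributions:

    LI Dan proposed the main research objectives, was responsible for the conception and design of the study and its implementation, and wrote the manuscript. LIU Tao was responsible for on-site recruitment and for data collection and curation, assisted with audio signal processing, feature extraction, and annotation, and revised the manuscript. LUO Wei and SONG Hongdan were responsible for on-site recruitment and for data collection and curation. SHANG Shaomei was responsible for quality control and review of the article, provided supervision and management, and takes overall responsibility for the article.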

  • Funding:
    Peking University Health Science Center "Medicine + X" Project (BMU2024YXXLHGG005); National Key Research and Development Program of China (2020YFC2008800, 2020YFC2008801)


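Abstract: Dysphagia has a high incidence among older adults and, if not identified and managed in time, readily leads to serious complications such as aspiration, malnutrition, and pulmonary infection. In recent years, acoustic-feature-based assessment of swallowing function has attracted wide attention because it is non-invasive, easy to perform, and suitable for remote monitoring. However, existing studies commonly suffer from small sample sizes, a single type of audio, and inconsistent standards for data collection and processing, which limits the further application of acoustic techniques to dysphagia identification. This study recruited 650 participants from 13 elderly care institutions in Beijing and Shijiazhuang, included 635 eligible participants, and collected 7 922 valid audio recordings covering swallowing sounds, coughing sounds, and speech. Twenty-three acoustic features spanning four dimensions (time domain, frequency domain, energy, and nonlinear) were extracted from each recording, yielding 182 206 acoustic feature values in total (7 922 recordings × 23 features). Analysis of waveforms, time-frequency plots, and spectra provided preliminary evidence that the different audio event types differ significantly across multiple acoustic feature dimensions. Finally, the study developed a standardized workflow for swallowing acoustic data collection and processing and built a swallowing acoustic database covering multiple audio event types and multidimensional acoustic features. The database provides data support for subsequent identification of acoustic markers, construction of intelligent recognition models, and development of remote swallowing function assessment systems, and it holds considerable research value and broad application prospects.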

Key words: Deglutition disorders, Dysphagia, Aged, Swallowing sounds, Coughing sounds, Speech sounds, Acoustic features, Database
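The abstract says each recording is described by the same set of features drawn from four dimensions (time domain, frequency domain, energy, nonlinear). The Python sketch below illustrates what one feature from each dimension can look like for a single clip, and records the reported counts as constants. The abstract does not list the study's 23 features, so everything here is an assumption for illustration, not the authors' pipeline: the specific features (zero-crossing rate, RMS, spectral centroid, 85% roll-off, normalised spectral entropy, Katz fractal dimension), the hypothetical helper names `extract_features` and `katz_fd`, and the choice to compute whole-clip rather than short-time (framed) statistics.

```python
import numpy as np

# Counts reported in the abstract; every recording shares one feature layout.
N_RECORDINGS = 7922
N_FEATURES = 23
TOTAL_FEATURE_VALUES = N_RECORDINGS * N_FEATURES  # 182 206


def katz_fd(x: np.ndarray) -> float:
    """Katz fractal dimension, used here as one hypothetical nonlinear feature."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    n = len(steps)                           # number of steps between samples
    total_len = steps.sum()                  # curve length L
    if n < 2 or total_len == 0:
        return 0.0                           # too short or flat: report 0
    extent = np.max(np.abs(x[1:] - x[0]))    # max distance d from first sample
    denom = np.log10(n) + np.log10(extent / total_len)
    # A degenerate oscillation (d <= mean step) gives a non-positive denominator.
    return float(np.log10(n) / denom) if denom > 0 else float("nan")


def extract_features(x: np.ndarray, sr: int) -> dict:
    """Hypothetical whole-clip features grouped into the four dimensions
    named in the abstract. Not the study's actual 23-feature set."""
    x = np.asarray(x, dtype=float)
    if not np.any(x):
        raise ValueError("silent clip: spectral features are undefined")

    # Time domain: duration, zero-crossing rate, peak amplitude.
    signs = np.signbit(x)
    time_feats = {
        "duration_s": len(x) / sr,
        "zero_crossing_rate": float(np.mean(signs[1:] != signs[:-1])),
        "peak_amplitude": float(np.max(np.abs(x))),
    }

    # Energy: total energy and root-mean-square amplitude.
    energy_feats = {
        "total_energy": float(np.sum(x ** 2)),
        "rms": float(np.sqrt(np.mean(x ** 2))),
    }

    # Frequency domain: one magnitude spectrum over the whole clip.
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    mag_p = mag / mag.sum()                   # magnitude weights
    pow_p = mag ** 2 / np.sum(mag ** 2)       # normalised power distribution
    rolloff_idx = min(int(np.searchsorted(np.cumsum(mag_p), 0.85)), len(freqs) - 1)
    nz = pow_p[pow_p > 0]
    freq_feats = {
        "dominant_freq_hz": float(freqs[np.argmax(mag)]),
        "spectral_centroid_hz": float(np.sum(freqs * mag_p)),
        "rolloff_85_hz": float(freqs[rolloff_idx]),
        # Shannon entropy of the power spectrum, scaled to [0, 1].
        "spectral_entropy": float(-np.sum(nz * np.log2(nz)) / np.log2(len(pow_p))),
    }

    nonlinear_feats = {"katz_fd": katz_fd(x)}

    return {"time": time_feats, "energy": energy_feats,
            "frequency": freq_feats, "nonlinear": nonlinear_feats}


if __name__ == "__main__":
    sr = 8192                                  # 1 s synthetic clip, 1 Hz FFT bins
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)   # stand-in signal, not study data
    feats = extract_features(tone, sr)
    print(feats["frequency"]["dominant_freq_hz"])  # 440.0
    print(TOTAL_FEATURE_VALUES)                    # 182206
```

The pure tone is only a sanity check: its energy sits in one FFT bin, so the dominant frequency, centroid, and roll-off all land near 440 Hz and the spectral entropy is close to 0. Transient events such as a single swallow or cough would more typically be analysed frame by frame; whole-clip statistics are used here only to keep the sketch short.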
