BCD – Baby Cry Detection

by allenlu2007

One practical application of audio signal detection is BCD (baby cry detection). Baby cries are broadly similar all over the world, as if hardwired in the brain.

Reference:   Infant Cry Analysis and Detection.

Cry samples: http://sirkan.iit.bme.hu/~varallyay/crysamples.htm

The figure below shows the rough processing flow: audio signal => VAD => framing => MFCC => k-NN


Audio Source

The time-domain waveform and the short-time spectrogram are shown in the figure below. A few characteristics stand out:

* Time domain – Burst signals

* Frequency domain – Rich harmonics and frequency chirping


Feature Extraction

1. Pitch frequency – i.e. the fundamental frequency f0. The physics of infant vocalization gives prior information about f0, which can be used for classification.

2. Short-time energy (STE) – can be used for VAD (voice activity detection).


3. Mel-Frequency Cepstrum Coefficients (MFCC) – as discussed in an earlier post, these should be very useful for harmonic-rich audio signals (e.g. voice).


4. Harmonicity Factor (HF) – defined in the reference; the definition is confusing to me.


5. Harmonic to average power ratio (HAPR) – analogous to the time-domain peak-to-average power ratio.

The figure below shows features 1 (f0), 4 (HF), and 5 (HAPR). HF looks like a frequency? Very confusing to me.


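6. Burst frequency – baby cries are generally periodic, as the waveform and spectrogram show. In a noisy environment, however, power alone is not enough; the maximum of the frequency spectrum (DFT) still has to be used.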

7. Rise time and fall time of the short-time energy – how quickly the STE rises at the start of a burst and falls at its end.
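
Features 1 and 2 above are simple enough to sketch in a few lines. The following is a minimal, pure-Python illustration, not the method of the reference: the function names, the frame and hop sizes, the 200 to 800 Hz search range, and the choice of an autocorrelation peak picker for f0 are all assumptions made for this example.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (an incomplete tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame (feature 2, STE)."""
    return sum(x * x for x in frame) / len(frame)

def estimate_f0(frame, fs, f_min=200.0, f_max=800.0):
    """Estimate f0 (feature 1) from the autocorrelation peak.

    f_min and f_max are assumed search bounds, not values from the reference.
    Returns None when no positive autocorrelation peak is found.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag is not None else None

# Synthetic check: a 400 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 400 * n / fs) for n in range(1024)]
tone_frames = frames(tone, frame_len=256, hop=128)
f0 = estimate_f0(tone_frames[0], fs)
ste = short_time_energy(tone_frames[0])
```

Thresholding `short_time_energy` per frame gives a crude activity detector; a real cry signal would also need windowing and some guard against octave errors in the f0 estimate.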

BCD Algorithm

There are three main algorithms: (i) voice activity detection (VAD); (ii) classification with the k-NN algorithm into 'Cry' (1) and 'No cry' (0); (iii) post-processing to reduce false alarms.
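
Steps (ii) and (iii) can be sketched roughly as follows. This is only an illustration under assumptions: the feature vectors are toy 2-D points rather than MFCCs, `k`, `window` and `min_hits` are made-up parameters, and the "at least `min_hits` of the last `window` decisions" rule is one plausible smoothing scheme, not necessarily the post-processing used in the reference.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def smooth_decisions(decisions, window=5, min_hits=3):
    """Keep a 'cry' (1) at index i only if at least `min_hits` of the
    last `window` raw decisions (including i) are 1.

    This is an assumed false-alarm filter, chosen for illustration.
    """
    out = []
    for i in range(len(decisions)):
        recent = decisions[max(0, i - window + 1): i + 1]
        out.append(1 if sum(recent) >= min_hits else 0)
    return out

# Toy training set: label 0 = 'No cry', label 1 = 'Cry'.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

label_a = knn_predict(train_x, train_y, (5.2, 5.1))
label_b = knn_predict(train_x, train_y, (0.2, 0.1))

# An isolated 1 is suppressed; a sustained run of 1s survives.
smoothed = smooth_decisions([0, 1, 0, 0, 1, 1, 1, 0], window=3, min_hits=2)
```

The smoothing trades a little latency for fewer false alarms: a single misclassified frame no longer triggers a detection on its own.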

VAD: the audio signal is divided into consecutive, overlapping segments of 10 seconds each, with a step of 1 second.
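
The segmentation described here (10-second segments, 1-second step) can be expressed as sample index ranges. A minimal sketch, assuming a known sampling rate `fs`; the function name `segment_bounds` and the choice to drop an incomplete final segment are assumptions of this example.

```python
def segment_bounds(num_samples, fs, seg_sec=10.0, step_sec=1.0):
    """Return (start, end) sample indices of overlapping segments.

    Defaults follow the text: 10 s segments advanced in 1 s steps.
    A trailing segment shorter than seg_sec is dropped (an assumption).
    """
    seg_len = int(round(seg_sec * fs))
    step = int(round(step_sec * fs))
    return [(start, start + seg_len)
            for start in range(0, num_samples - seg_len + 1, step)]

# A 12-second recording at 8 kHz yields three 10-second segments.
bounds = segment_bounds(num_samples=12 * 8000, fs=8000)
```

With a 1-second step, consecutive segments share 9 seconds of audio, so each second of the recording is examined by up to ten segments.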