Signalrepresentasjoner for automatisk talegjenkjenning (Signal representations for automatic speech recognition)

FFI Report 2005

About the publication

ISBN

9788246409368

8246409360

Size

843.4 KB

Language

Norwegian

Speech recognition


Marius Gamborg, Frode Lillevold

In this report we give an overview of methods for front-end processing of speech signals for automatic speech recognition (ASR) that are described in the literature. The most common representation of speech in this context appears to be mel-frequency cepstral coefficients (MFCCs) with delta and double-delta coefficients, usually combined with cepstral mean normalization (CMN). Other representations include perceptual linear prediction (PLP) and linear prediction cepstral coefficients (LPCC).
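As a rough illustration of the post-processing named above, the sketch below applies cepstral mean normalization to a matrix of static cepstral coefficients and then appends delta and double-delta coefficients. It is not taken from the report: the function names, the regression window of two frames, and the edge handling (repeating the first and last frame) are assumptions chosen for the example, and plain Python lists are used only to keep it dependency-free. Computing the static MFCCs themselves is outside its scope.

```python
def cepstral_mean_normalization(frames):
    """Subtract the per-coefficient mean over all frames (assumed: one utterance)."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / num_frames for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def delta(frames, width=2):
    """Regression-based deltas over +/- `width` frames (width is an assumption)."""
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, width + 1))

    def frame_at(t):
        # Repeat the first/last frame when the window runs past the edges.
        return frames[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Return [normalized static | delta | double-delta] for each frame."""
    normalized = cepstral_mean_normalization(static)
    d1 = delta(normalized)
    d2 = delta(d1)  # double-delta: delta of the delta coefficients
    return [s + a + b for s, a, b in zip(normalized, d1, d2)]
```

With 13 static coefficients per frame, a common choice, this would yield 39-dimensional feature vectors; the sketch itself works for any dimension.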
