Title | : | Segmentation of Speech into Phones Using Signal Processing Cues and HMMs in Tandem |
Speaker | : | Aswin Shanmugam S (IITM) |
Details | : | Tue, 28 Apr, 2015 3:00 PM @ BSB 361 |
Abstract | : | The most popular method for automatic segmentation is embedded re-estimation of monophone hidden Markov models (HMMs) after flat-start initialization, followed by forced alignment. This method may not yield accurate boundaries. To address this issue, short-time energy (STE) and sub-band spectral flux are used as acoustic cues to correct syllable boundaries. The syllable is the fundamental unit of speech production, and time-domain and spectral cues available in the speech signal are exploited to obtain syllable boundaries. HMM-based embedded re-estimation is then restricted to the span between syllable boundaries rather than run over the entire utterance, which yields more accurate monophone HMMs. Forced alignment is performed within each syllable to obtain phone-level segmentation. In essence, signal processing for detecting syllable boundaries and HMMs for acoustic modeling of phones work in tandem to obtain accurate segmentation at both the phone and syllable levels. Taking phones and syllables as the basic units, HMM-based speech synthesis systems (HTS) are built for Tamil and Hindi using the proposed segmentation method. Listening tests indicate that the quality of synthesis improves significantly. |
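
The two acoustic cues named in the abstract can be illustrated with a minimal NumPy sketch. This is not the speaker's implementation: the frame length (25 ms), hop (10 ms), band edges (500 to 2000 Hz), the Euclidean form of the flux, and the helper names `frame_signal`, `short_time_energy`, and `subband_spectral_flux` are all assumptions chosen for illustration, and the input is a synthetic silence-then-tone signal rather than speech.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Short-time energy: sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def subband_spectral_flux(frames, sr, f_lo, f_hi):
    """Frame-to-frame change of the magnitude spectrum inside [f_lo, f_hi).

    Assumed definition: Euclidean norm of the difference between
    consecutive sub-band magnitude spectra. The first frame gets 0.
    """
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    band = (freqs >= f_lo) & (freqs < f_hi)
    diff = np.diff(mag[:, band], axis=0)
    flux = np.sqrt(np.sum(diff ** 2, axis=1))
    return np.concatenate([[0.0], flux])

# Synthetic example (assumed values): 0.5 s of silence, then 0.5 s of a 1 kHz tone.
sr = 16000
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 1000 * t)])

frames = frame_signal(signal, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
ste = short_time_energy(frames)
flux = subband_spectral_flux(frames, sr, f_lo=500.0, f_hi=2000.0)

# The flux peaks near the frame where the tone starts, which is the kind of
# abrupt change a boundary detector could look for.
boundary_frame = int(np.argmax(flux))
print("candidate boundary at about", boundary_frame * 160 / sr, "s")
```

In this toy signal the energy is zero in the silent half and the flux is concentrated in the few frames that straddle the onset at 0.5 s. Real speech would need smoothing and thresholding of both contours before any boundary decision, which this sketch does not attempt.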