Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

RA4: Automatic error prediction and dedicated signal modification

There is always a danger in concatenation-based speech synthesis that an artifact occurs at a concatenation point, even with the phonetically motivated optimizations proposed in RA3. This is caused by the limited size of the speech unit database relative to the natural variability of speech. Although it is widely accepted that unit selection achieves its best quality when no signal modification is carried out at all, we believe that selective signal modification, targeted at the specific component of unit selection that causes the artifact, can suppress it. Based on the analysis of artifacts in synthetic speech carried out in RA1, an error prediction module will be designed to predict potential artifacts (e.g. F0 discontinuity) in to-be-synthesized speech during unit-selection runtime (RA4a) [LU10], [VIT13], [LEG13]. Depending on the type of the predicted artifact, a dedicated signal modification (e.g. F0 smoothing) will be carried out (RA4b). Since a combination of unit selection and HMM-based speech synthesis has been reported to be helpful in the literature (e.g. [BLA07], [SIL10]), hybrid approaches will be examined as well (RA4c). The possibility of generating speech from HMMs when the unit-selection scheme would result in an artifact will also be researched. A compromise will be sought between using the selected (i.e. natural) speech segments, which can result in discontinuities and disruptive artifacts, and generated segments, produced either by a dedicated signal modification technique or by HMM-based synthesis. The compromise should balance the mixing of selected and smoothed/generated speech, possibly with a scheme configurable according to listeners' preference (RA4d).
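The interplay between RA4a and RA4b can be illustrated with a minimal sketch. It is not the project's actual design: it assumes each unit is given as a list of per-frame F0 values in Hz with all frames voiced (F0 > 0), and the function names, the 2-semitone threshold, and the 3-frame smoothing span are hypothetical choices made only for illustration. The join is checked for an F0 jump first, and smoothing is applied only when a discontinuity is predicted, so unaffected joins keep the natural signal untouched.

```python
import math


def f0_jump_semitones(left_f0_hz, right_f0_hz):
    """Absolute F0 difference between two frames, in semitones."""
    return abs(12.0 * math.log2(right_f0_hz / left_f0_hz))


def predict_f0_artifact(left_unit, right_unit, threshold_st=2.0):
    """Flag the join when the F0 jump across it exceeds the threshold.

    The threshold is an illustrative assumption, not a perceptual constant.
    """
    return f0_jump_semitones(left_unit[-1], right_unit[0]) > threshold_st


def smooth_join(left_unit, right_unit, span=3):
    """Spread the F0 mismatch over up to `span` frames on each side.

    Both boundary frames are moved to the geometric mean of their F0
    values (the midpoint on a log scale); the correction fades linearly
    with distance from the join.
    """
    left, right = list(left_unit), list(right_unit)
    target = math.sqrt(left[-1] * right[0])
    shift_left = math.log(target / left[-1])
    shift_right = math.log(target / right[0])

    n_left = min(span, len(left))
    for k in range(n_left):
        weight = (n_left - k) / n_left  # 1.0 at the join, smaller further away
        left[len(left) - 1 - k] *= math.exp(weight * shift_left)

    n_right = min(span, len(right))
    for k in range(n_right):
        weight = (n_right - k) / n_right
        right[k] *= math.exp(weight * shift_right)

    return left, right


def concatenate(left_unit, right_unit, threshold_st=2.0, span=3):
    """Concatenate two F0 contours, smoothing only when an artifact is predicted."""
    if predict_f0_artifact(left_unit, right_unit, threshold_st):
        left_unit, right_unit = smooth_join(left_unit, right_unit, span)
    return list(left_unit) + list(right_unit)


if __name__ == "__main__":
    contour = concatenate([120.0, 120.0, 120.0, 120.0], [150.0, 150.0, 150.0, 150.0])
    print([round(f0, 1) for f0 in contour])
```

In this sketch the threshold plays the role of the configurable trade-off: raising it keeps more joins unmodified, while lowering it smooths more aggressively.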

Activity | Objective | Workplace | 2016 2017 2018 | Dissemination
RA4a | Automatic error prediction | UWB | x x | Jimp: 1, D: 6
RA4b | Dedicated signal modification | UWB | x x |
RA4c | Hybrid approaches | UWB | x x |
RA4d | Compromise between selected and generated speech | UWB | x |