F0 difference in 1st and 2nd syllables of prosodic words

related to task #4252

Theoretical background

Rule: The mean F0 value in the 1st syllable of a prosodic word (other than the phrase-final one) should be lower than the mean F0 value in the 2nd syllable.

Corpora statistics

Large TTS voice corpora were used to verify the theoretical rule.
So far, only 'oznam' sentences have been taken into account (prosodic words with prosodeme -1 and 0).
Note: Prosodeme -1 is used for the first prosodic word in each phrase (i.e. at the sentence beginning and after a pause).

Condition used:
IF meanF0_2nd - meanF0_1st >= -0.03 * meanF0_1st THEN prosodic word is OK

(A prosodic word is considered OK if meanF0 in the 2nd syllable drops by no more than 3% relative to the 1st syllable.)
meanF0_1st - mean F0 value in the 1st vowel
meanF0_2nd - mean F0 value in the 2nd vowel
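
A minimal Python sketch of the condition above, for illustration only. The function name is_ok and the tolerance parameter are hypothetical and are not taken from the scripts used for the statistics.

```python
def is_ok(mean_f0_1st: float, mean_f0_2nd: float, tolerance: float = 0.03) -> bool:
    """Return True if the prosodic word satisfies the condition.

    The word is OK when the mean F0 of the 2nd vowel is not more than
    `tolerance` (3% by default) below the mean F0 of the 1st vowel.
    """
    return mean_f0_2nd - mean_f0_1st >= -tolerance * mean_f0_1st


if __name__ == "__main__":
    print(is_ok(100.0, 110.0))  # F0 rises
    print(is_ok(100.0, 98.0))   # drop of 2%, within the tolerance
    print(is_ok(100.0, 96.0))   # drop of 4%, outside the tolerance
```

Because the tolerance is relative to meanF0_1st, the same 3% threshold applies regardless of the speaker's pitch range.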

The following table shows the percentage of prosodic words that satisfy the condition above (i.e. are considered OK).

TTS voice   prosodeme   prosodic words OK   meanF0 difference   standard deviation
spkr_AJ     -1          81.62 %             13.33               19.03
spkr_AJ      0          79.93 %              7.24               14.92
spkr_KI     -1          82.51 %              9.64               18.26
spkr_KI      0          77.74 %              4.04               17.69
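
The three statistics per row could be computed roughly as in the sketch below. It rests on assumptions: the difference is taken as meanF0_2nd - meanF0_1st, the standard deviation is the sample standard deviation of those differences, and the function name summarize and its input layout are hypothetical.

```python
from statistics import mean, stdev


def summarize(pairs, tolerance=0.03):
    """Summarize one (voice, prosodeme) group.

    pairs: iterable of (meanF0_1st, meanF0_2nd), one tuple per prosodic word.
    Assumption: difference = meanF0_2nd - meanF0_1st.
    """
    pairs = list(pairs)
    diffs = [second - first for first, second in pairs]
    # A word is OK if the drop in the 2nd vowel does not exceed the tolerance.
    n_ok = sum(1 for (first, _), d in zip(pairs, diffs) if d >= -tolerance * first)
    return {
        "percent_ok": 100.0 * n_ok / len(pairs),
        "mean_diff": mean(diffs),
        "std_diff": stdev(diffs),  # sample standard deviation (assumption)
    }


if __name__ == "__main__":
    # Made-up values, not corpus data.
    print(summarize([(100.0, 110.0), (100.0, 90.0), (100.0, 105.0)]))
```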

The files outAJ_reaper_new.txt and outKI_reaper_new.txt list the cases that do not match the rule (the 1st syllable has a higher meanF0 than the 2nd). Each entry has the following structure:

sentence_name prosodeme_type prosodic_word first_vowel_beg_time second_vowel_beg_time mean_F0_diff
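
A sketch of how one entry of such a file might be read in Python. It assumes that the six fields are separated by whitespace, that the prosodic word contains no whitespace, and that the last three fields are numeric; the Mismatch type, the parse_line function, and the sample line are hypothetical.

```python
from typing import NamedTuple


class Mismatch(NamedTuple):
    sentence_name: str
    prosodeme_type: str  # kept as text, e.g. "-1" or "0"
    prosodic_word: str
    first_vowel_beg_time: float
    second_vowel_beg_time: float
    mean_f0_diff: float


def parse_line(line: str) -> Mismatch:
    """Parse one whitespace-separated entry into a Mismatch record."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, prosodeme, word, t1, t2, diff = fields
    return Mismatch(name, prosodeme, word, float(t1), float(t2), float(diff))


if __name__ == "__main__":
    # Made-up entry, not taken from the real files.
    print(parse_line("sent0001 -1 dobry 0.35 0.52 -12.4"))
```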

Note: Some "errors" could be caused by incorrect prosodic word boundaries.
Note 2: Some "strange" F0 values were found, so another method for computing F0 values (REAPER) was used for the statistics above.

In total, only 45% of 'oznam' sentences in the spkr_AJ corpus and 37% of 'oznam' sentences in the spkr_KI corpus contain no "error".
If we tolerate 1 "error" per sentence, 93% of sentences are correct for spkr_AJ and 84% for spkr_KI.
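
The sentence-level figures can be derived by counting "errors" per sentence, as in the following sketch. It assumes that the list of all sentence names and the list of sentence names of the mismatching prosodic words (one entry per "error") are available; the function name share_within_tolerance is hypothetical.

```python
from collections import Counter


def share_within_tolerance(all_sentences, mismatch_sentences, max_errors=0):
    """Return the fraction of sentences with at most `max_errors` "errors".

    all_sentences: names of all sentences in the corpus subset.
    mismatch_sentences: sentence name of every mismatching prosodic word,
        repeated once per "error".
    """
    errors = Counter(mismatch_sentences)
    n_ok = sum(1 for name in all_sentences if errors[name] <= max_errors)
    return n_ok / len(all_sentences)


if __name__ == "__main__":
    # Made-up names, not corpus data.
    sentences = ["s1", "s2", "s3", "s4"]
    mismatches = ["s1", "s1", "s2"]
    print(share_within_tolerance(sentences, mismatches, max_errors=0))
    print(share_within_tolerance(sentences, mismatches, max_errors=1))
```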

Experiment - new feature in target/concatenation cost