Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

Task #3972

Task #3677: RA3b - Phonetically justified parameters (spectral tilt, ...)

Task #3970: Formant-based join cost computation

Use formants instead of MFCCs in join cost computation

Added by Matoušek Jindřich almost 8 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Start date: 18.07.2016
Due date: 14.09.2016
% Done: 0%
Estimated time:

Description

Use formants from #3971 and replace MFCCs.

Compute both static and dynamic contours around a concatenation point (similarly as for F0).

Alternatively, formants estimated by ESPS tool (used e.g. in Wavesurfer) could be used. For the voice Jan (AJ), the formants are stored in ARTIC/Projects/cz/spkr_AJ/data/non-mastered/zkracene-pauzy/param/formants/formants_f10_o12_w25_i05. More details about the format of text files with formant values are given in ARTIC/Projects/cz/spkr_AJ/data/non-mastered/zkracene-pauzy/param/formants/README.txt


Related issues

Blocked by HQSYN16 - Task #3999: Compute the formats | Closed | Tihelka Dan | 09.08.2016 | 14.08.2016
Follows HQSYN16 - Task #3971: Praat script to compute formants | Closed | Bořil Tomáš | 04.07.2016 | 17.07.2016

Actions #1

Updated by Matoušek Jindřich almost 8 years ago

  • Follows Task #3971: Praat script to compute formants added
Actions #2

Updated by Matoušek Jindřich almost 8 years ago

Tomáš Bořil added Praat scripts to compute formants - see #3971

Actions #3

Updated by Tihelka Dan over 7 years ago

  • Blocked by Task #3999: Compute the formats added
Actions #4

Updated by Tihelka Dan over 7 years ago

I have hacked together the use of formants instead of MFCCs in the concatenation cost (in addition to F0 and energy, which are computed as before).

Distances

In the following, let F1{t}, F2{t}, F3{t} and F4{t} be the (z-score normalized) formant values at time t. When t = eL, it denotes the time nearest to the end of the left concatenated diphone, and t = bR denotes the time nearest to the beginning of the right concatenated diphone; i.e. we always examine the difference between the eL and bR features (both taken from a phone center). When the concatenated diphones were neighbors in the corpus, it is ensured that eL = bR.
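The z-score normalization mentioned above can be sketched as follows. This is a minimal illustration, not the code used in the system: the function name `zscore` and the sample values are hypothetical, and it is an assumption that the mean and standard deviation are taken over the values passed in (the comment does not say whether the statistics are computed per utterance, per speaker, or over the whole corpus).

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant track carries no variation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical F1 track in Hz, for illustration only.
f1_hz = [500.0, 520.0, 540.0, 560.0, 580.0]
f1_norm = zscore(f1_hz)
```

Normalizing each formant separately puts F1 to F4 on a comparable scale, so that the weights W1 to W4 express relative importance rather than compensating for different frequency ranges.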

Now it is possible to experiment with 3 computation schemes:
  • absolute difference of formants and their slopes:
     
    cost = (abs(F1{eL} - F1{bR}) * W1 + abs(F2{eL} - F2{bR}) * W2 + ... + abs(F4{eL} - F4{bR}) * W4 + abs(S1{eL} - S1{bR}) * W1 + abs(S2{eL} - S2{bR}) * W2 + ... + abs(S4{eL} - S4{bR}) * W4 + F0-cost + energy-cost) / (W1 + W2 + W3 + W4 + 1 + 1)
     
    where Sn is the slope of the n-th formant computed from the sequence of [ Fn{t-4}, Fn{t-3}, Fn{t-2}, Fn{t-1}, Fn{t}, Fn{t+1}, Fn{t+2}, Fn{t+3}, Fn{t+4}] formant values, with t = eL or t = bR.
  • Euclidean distance of the formant contour:
     
    cost = (euclid(C1{eL}, C1{bR}) * W1 + euclid(C2{eL}, C2{bR}) * W2 + ... + euclid(C4{eL}, C4{bR}) * W4 + F0-cost + energy-cost) / (W1 + W2 + W3 + W4 + 1 + 1)
     
    where Cn{t} = [ Fn{t-4}, Fn{t-3}, Fn{t-2}, Fn{t-1}, Fn{t}, Fn{t+1}, Fn{t+2}, Fn{t+3}, Fn{t+4}] is the sequence of formant values
  • Mean absolute difference of the formant contour, which is the same as the previous scheme, except that instead of the euclid(Cn{eL}, Cn{bR}) distance we use mean(abs(Cn{eL} - Cn{bR}))
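The per-formant building blocks of the three schemes might look like the sketch below. All function names are hypothetical. The comment does not say how the slope Sn is estimated from the 9 values, so the least-squares fit over frame indices used here is an assumption; handling of windows that run past the ends of a track is also not described, so this sketch simply raises an error.

```python
import math

HALF_WINDOW = 4  # frames on each side of t, giving the 9 values used in the text


def contour(track, t, half=HALF_WINDOW):
    """Return [Fn{t-4}, ..., Fn{t+4}] for one formant track (a list of floats)."""
    if t - half < 0 or t + half >= len(track):
        raise ValueError("window exceeds track bounds")
    return track[t - half : t + half + 1]


def slope(values):
    """Least-squares slope over frame indices 0..len-1 (assumed definition of Sn)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def euclid(a, b):
    """Euclidean distance between two contours of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_abs(a, b):
    """Mean absolute difference between two contours of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical normalized track rising by 0.1 per frame.
track = [0.1 * i for i in range(20)]
c_left = contour(track, 5)    # plays the role of Cn{eL}
c_right = contour(track, 10)  # plays the role of Cn{bR}
s_left = slope(c_left)
d_euclid = euclid(c_left, c_right)
d_mean_abs = mean_abs(c_left, c_right)
```

On this linear track every contour value differs by 0.5 between the two windows, so the mean absolute difference is 0.5 while the Euclidean distance grows with the window length (1.5 for 9 values). This scale difference matters when the same weights are reused across schemes.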

For all the experiments, the weights were set to: W1 = 0.8, W2 = 1.0, W3 = 0.7, and W4 = 0.4. Note that formant bandwidths are not considered yet.
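With these weights, the combination step shared by the formulas above can be sketched as below. The function name `join_cost` and the example inputs are hypothetical; the F0 and energy costs are treated as precomputed numbers because their computation is not described here. For the first scheme, each per-formant distance would be the sum of the absolute formant difference and the absolute slope difference, since both terms carry the same weight Wn.

```python
WEIGHTS = (0.8, 1.0, 0.7, 0.4)  # W1..W4 as stated in the text


def join_cost(formant_dists, f0_cost, energy_cost, weights=WEIGHTS):
    """Weighted formant distances plus F0 and energy costs,
    divided by (W1 + W2 + W3 + W4 + 1 + 1) as in the formulas above."""
    if len(formant_dists) != len(weights):
        raise ValueError("need one distance per formant weight")
    weighted = sum(d * w for d, w in zip(formant_dists, weights))
    return (weighted + f0_cost + energy_cost) / (sum(weights) + 2.0)


# Hypothetical per-formant distances and sub-costs, for illustration only.
cost = join_cost([1.0, 1.0, 1.0, 1.0], f0_cost=0.5, energy_cost=0.5)
```

In this example the numerator is 2.9 + 0.5 + 0.5 = 3.9 and the denominator is 2.9 + 2 = 4.9, giving a cost of about 0.796.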

Formants

There are 2 versions of formant estimates: ESPS and PRAAT (which, with the 3 distance schemes above, gives us 6 possible experiments).

What next

Now there are several questions to answer:

  1. how to design the experiment
  2. which text to use for the experiment
  3. which distance computation scheme to use
  4. which formants to use (I would vote for PRAAT)

Any thoughts?

Actions #5

Updated by Tihelka Dan over 7 years ago

  • Status changed from New to Feedback
Actions #6

Updated by Tihelka Dan over 7 years ago

  • Assignee changed from Tihelka Dan to Matoušek Jindřich
Actions #7

Updated by Tihelka Dan over 7 years ago

As the first experiment, I will use PRAAT-computed formants with all three distance computations (i.e. ABS, EUCL, SLP) to get a log of unit usage. That can be compared to the units used by the baseline system (without formants in the CC), and the most differing unit sequences can be analysed further. This is the same approach as in #3941.
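A comparison of unit-usage logs along these lines could be sketched as follows. The log format is not described here, so the representation (a dict mapping an utterance ID to the list of selected unit IDs), the function names, and the sample data are all hypothetical. The sketch also assumes that both systems synthesize the same diphone sequence for each utterance, so the two unit lists align position by position.

```python
def differing_fraction(baseline_units, test_units):
    """Fraction of positions at which the two systems selected different units."""
    if len(baseline_units) != len(test_units):
        raise ValueError("unit sequences must have the same length")
    if not baseline_units:
        return 0.0
    diffs = sum(1 for b, t in zip(baseline_units, test_units) if b != t)
    return diffs / len(baseline_units)


def rank_utterances(baseline_log, test_log):
    """Return (utterance, fraction) pairs, most differing utterance first."""
    scores = {
        utt: differing_fraction(baseline_log[utt], test_log[utt])
        for utt in baseline_log
        if utt in test_log
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical logs: utterance ID -> selected unit IDs.
baseline_log = {"utt1": [10, 11, 12, 13], "utt2": [20, 21, 22, 23]}
formant_log = {"utt1": [10, 11, 12, 13], "utt2": [20, 99, 98, 23]}
ranking = rank_utterances(baseline_log, formant_log)
```

The utterances at the top of the ranking are the candidates for closer listening and analysis.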

Note that the cost is computed over a window of 9 values, which covers about 60 ms of signal (for a frame length of 20 ms and a shift of 5 ms). Is that enough to capture the interesting formant properties, or should the region be extended?
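The 60 ms figure follows from the frame arithmetic: 9 frames are separated by 8 shifts of 5 ms, and the last frame adds its full 20 ms length. A small helper (the name `window_span_ms` is hypothetical) makes it easy to check what an extended region would cover:

```python
def window_span_ms(n_frames, frame_len_ms, shift_ms):
    """Signal span covered by n_frames overlapping analysis frames."""
    return (n_frames - 1) * shift_ms + frame_len_ms


span_9 = window_span_ms(9, frame_len_ms=20, shift_ms=5)    # current window
span_13 = window_span_ms(13, frame_len_ms=20, shift_ms=5)  # e.g. 6 frames per side
```

Here span_9 is 60 ms, and widening the window to 6 frames on each side (13 values) would cover 80 ms.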

Actions #8

Updated by Matoušek Jindřich almost 7 years ago

  • Status changed from Feedback to Closed

Replacing MFCCs with formant frequencies was not very successful, see #4176. Other (supplementary) measures, such as spectral slope, will be investigated.
