I have hacked up the use of formants instead of MFCC in the concatenation cost (in addition to F0 and energy, which are computed as before).
Distances
In what follows, let F1{t}, F2{t}, F3{t}, and F4{t} be the (z-score normalized) formant values at time t. The time t = eL denotes the time nearest to the end of the left concatenated diphone, and t = bR denotes the time nearest to the beginning of the right concatenated diphone; i.e., we always compare the features at eL with those at bR (both taken from a phone center). When the two concatenated diphones are adjacent in the corpus, eL = bR is guaranteed.
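The two preparation steps above (z-score normalization and picking the frame nearest to a boundary time) might be sketched as follows. This assumes each formant track is a NumPy array of shape (n_frames, 4) holding F1..F4, that the mean and standard deviation are computed per formant over the whole corpus, and that frames lie on a fixed 10 ms grid; the names `zscore_tracks` and `nearest_frame` and the frame shift are hypothetical.

```python
import numpy as np

def zscore_tracks(tracks):
    """Z-score normalize each formant separately over all utterances.

    tracks: list of arrays of shape (n_frames, 4) with F1..F4 in Hz
    (hypothetical layout). Returns normalized arrays in the same order.
    A formant with zero variance would cause a division by zero.
    """
    stacked = np.concatenate(tracks, axis=0)  # all frames of the corpus
    mean = stacked.mean(axis=0)               # per-formant mean
    std = stacked.std(axis=0)                 # per-formant standard deviation
    return [(t - mean) / std for t in tracks]

def nearest_frame(time_s, frame_shift_s=0.01):
    """Index of the frame nearest to time_s, assuming frame i lies at i * frame_shift_s."""
    return int(round(time_s / frame_shift_s))
```

With these, eL and bR become frame indices into the normalized tracks of the left and right diphones.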
It is now possible to experiment with three computation schemes:
- Absolute difference of formants and their slopes:
cost = (abs(F1{eL} - F1{bR}) * W1 + abs(F2{eL} - F2{bR}) * W2 + ... + abs(F4{eL} - F4{bR}) * W4 + abs(S1{eL} - S1{bR}) * W1 + abs(S2{eL} - S2{bR}) * W2 + ... + abs(S4{eL} - S4{bR}) * W4 + F0-cost + energy-cost) / (W1 + W2 + W3 + W4 + 1 + 1)
where Sn{t} is the slope of the n-th formant, computed from the sequence of formant values [ Fn{t-4}, Fn{t-3}, Fn{t-2}, Fn{t-1}, Fn{t}, Fn{t+1}, Fn{t+2}, Fn{t+3}, Fn{t+4}], with t = eL or t = bR.
- Euclidean distance of the formant contour:
cost = (euclid(C1{eL}, C1{bR}) * W1 + euclid(C2{eL}, C2{bR}) * W2 + ... + euclid(C4{eL}, C4{bR}) * W4 + F0-cost + energy-cost) / (W1 + W2 + W3 + W4 + 1 + 1)
where Cn{t} = [ Fn{t-4}, Fn{t-3}, Fn{t-2}, Fn{t-1}, Fn{t}, Fn{t+1}, Fn{t+2}, Fn{t+3}, Fn{t+4}] is the sequence of values of the n-th formant around time t.
- Mean absolute difference of the formant contour: the same as the previous scheme, except that instead of the euclid(Cn{eL}, Cn{bR}) distance we use mean(abs(Cn{eL} - Cn{bR})).
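The per-formant distance terms of the three schemes above could be sketched like this. The text does not say how the slope Sn is estimated from the 9-value window; this sketch assumes an ordinary least-squares line fit via `numpy.polyfit`, which is only one plausible choice. `track` stands for a hypothetical 1-D array of z-score normalized values of one formant, and windows running past the ends of a track are simply rejected.

```python
import numpy as np

HALF = 4  # window Fn{t-4} .. Fn{t+4}, i.e. 9 frames

def contour(track, t):
    """Cn{t}: the 9 values of one formant centred on frame t."""
    if t - HALF < 0 or t + HALF >= len(track):
        raise ValueError("window exceeds track boundaries")
    return track[t - HALF : t + HALF + 1]

def slope(track, t):
    """Sn{t}: least-squares slope (per frame) over the 9-value window (assumed method)."""
    x = np.arange(-HALF, HALF + 1)
    return float(np.polyfit(x, contour(track, t), 1)[0])

def dist_abs_with_slope(left, eL, right, bR):
    """Scheme 1: unweighted terms |Fn{eL} - Fn{bR}| and |Sn{eL} - Sn{bR}|."""
    return (float(abs(left[eL] - right[bR])),
            abs(slope(left, eL) - slope(right, bR)))

def dist_euclid(left, eL, right, bR):
    """Scheme 2: euclid(Cn{eL}, Cn{bR})."""
    return float(np.linalg.norm(contour(left, eL) - contour(right, bR)))

def dist_mean_abs(left, eL, right, bR):
    """Scheme 3: mean(abs(Cn{eL} - Cn{bR}))."""
    return float(np.mean(np.abs(contour(left, eL) - contour(right, bR))))
```

Note that the Euclidean distance grows with the window length while the mean absolute difference does not, so the two contour schemes put differently sized formant terms next to the F0 and energy costs.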
For all experiments, the weights were set to W1 = 0.8, W2 = 1.0, W3 = 0.7, and W4 = 0.4. Also, formant bandwidths are not considered at the moment!
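The weighting and normalization shared by the cost formulas, with these weights, might look like the sketch below. It takes already computed per-formant distance terms plus the F0 and energy costs (computed as before and not shown); the function name and argument layout are hypothetical. For the absolute-difference scheme each formant contributes two terms (value and slope), both multiplied by the same Wn, while the denominator stays W1 + W2 + W3 + W4 + 1 + 1 as written in the formulas.

```python
WEIGHTS = (0.8, 1.0, 0.7, 0.4)  # W1..W4 for F1..F4

def concat_cost(formant_terms, f0_cost, energy_cost, weights=WEIGHTS):
    """Weighted concatenation cost (hypothetical helper).

    formant_terms[n] is a sequence of distance terms for formant n+1:
    one term for the contour schemes, two (value and slope) for the
    absolute-difference scheme. Every term is multiplied by W(n+1).
    """
    if len(formant_terms) != len(weights):
        raise ValueError("need one entry per formant")
    numerator = sum(w * sum(terms) for w, terms in zip(weights, formant_terms))
    numerator += f0_cost + energy_cost
    # F0 and energy costs each carry an implicit weight of 1
    return numerator / (sum(weights) + 1 + 1)
```

For example, `concat_cost([[0.0]] * 4, 0.49, 0.0)` gives 0.49 / 4.9 = 0.1.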
Formants
There are two versions of formant estimates, ESPS and PRAAT, which together with the three distance schemes give us 6 possible experiments.
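The count of six is every combination of the two formant sources with the three distance schemes; the short scheme labels below are hypothetical.

```python
from itertools import product

FORMANT_SOURCES = ("ESPS", "PRAAT")
SCHEMES = ("abs+slope", "euclid", "mean-abs")  # hypothetical labels for the three schemes

# each (formant source, distance scheme) pair is one experiment: 2 * 3 = 6
EXPERIMENTS = list(product(FORMANT_SOURCES, SCHEMES))

for source, scheme in EXPERIMENTS:
    print(f"{source:5s}  {scheme}")
```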
What next
Now there are several questions to answer:
- how to design the experiment
- what text to use for the experiment
- which distance computation scheme to use
- which formants to use (I would vote for PRAAT)
Any thoughts?