Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

Task #3688: Separation of some phonemes into distinct phones

We have prepared the data in #3763. The spkr_KI.vystupy_z_TTS.zip file (voice Iva - spkr_KI) contains the synthesized prompts with the grouped phones split. The spkr_KI.vystupy_z_TTS_JSON.zip file stores the details about the selected units in JSON format (see the phrases.candselection.candidates.#.cand item for details).

The zip files contain data named:
  • r with prompts from #3747
  • l with prompts from #3748
  • n with prompts from #3749
  • rr with prompts from #3750
  • ch_G with prompts from #3751, [ch] replaced by [G]
  • ch_h with prompts from #3751, [ch] replaced by [h]
  • ch_x with prompts from #3751, [ch] replaced by [x]

For example, for the [r+P] grouping (phrases from #3747, stored in the waves/r subdirectory), [P] appears in the place where [r] should correctly be, and vice versa. All other units are the same as they would be in "classic" synthesis (i.e. with the phones grouped).
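To check which prompts an archive holds for each data name, the members can be grouped by the subdirectory under waves/. The following is only a minimal sketch: the helper names (group_by_data_name, list_archive) are hypothetical, and it assumes that every data name is stored the same way as the waves/r example above, which has not been verified for the other names.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Data names from the list above, mapped to the issue holding the prompts.
PROMPT_ISSUES = {
    "r": 3747, "l": 3748, "n": 3749, "rr": 3750,
    "ch_G": 3751, "ch_h": 3751, "ch_x": 3751,
}


def group_by_data_name(member_names):
    """Group archive member paths by the directory that follows 'waves'.

    Assumption: files live in waves/<data name>/..., as in waves/r.
    """
    groups = defaultdict(list)
    for name in member_names:
        if name.endswith("/"):
            continue  # skip directory entries
        parts = PurePosixPath(name).parts
        if "waves" in parts:
            idx = parts.index("waves")
            # Require a data-name directory and a file name after 'waves'.
            if len(parts) > idx + 2:
                groups[parts[idx + 1]].append(name)
    return dict(groups)


def list_archive(zip_path):
    """Return {data name: [member paths]} for one of the zip files."""
    with zipfile.ZipFile(zip_path) as archive:
        return group_by_data_name(archive.namelist())
```

Calling list_archive("spkr_KI.vystupy_z_TTS.zip") would then return one list of files per data name, and any key missing from PROMPT_ISSUES would point to an unexpected directory.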

There is one note to add:
when the JSON lists several names for a unit, that particular unit was used for all the diphones listed. This is because we must handle all cases (e.g. random text synthesis) without crashing. It may sometimes be misleading, since for example:

"unitName": ["Pa", "ra"]
appears when the [Pa] unit should be used. However, since there is no [Pa] unit in the corpus (there cannot be one, as the combination is nonsensical), the [ra] unit is always used instead. Therefore, because we are using diphones, there is no guarantee that all the units were synthesized as required. In the example above, even though [P] was required, the [pP] + [ra] diphones were actually concatenated in "kapr to na_pr_al do zdi", since there is no [Pa] unit.
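Units affected by this substitution can be found by searching the JSON for "unitName" lists with more than one entry. The sketch below walks the parsed JSON recursively, so it does not depend on the exact nesting. The function name find_shared_units is hypothetical, and the sample document is only a guess at the layout based on the phrases.candselection.candidates.#.cand path; the real files may be structured differently.

```python
import json


def find_shared_units(node, path="$"):
    """Yield (path, names) for every 'unitName' list with more than one entry."""
    if isinstance(node, dict):
        names = node.get("unitName")
        if isinstance(names, list) and len(names) > 1:
            yield path, names
        for key, value in node.items():
            yield from find_shared_units(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_shared_units(value, f"{path}[{index}]")


# Hypothetical sample; only the "unitName": ["Pa", "ra"] entry comes from the note.
sample = json.loads("""
{"phrases": [{"candselection": {"candidates": [
    {"cand": {"unitName": ["pP"]}},
    {"cand": {"unitName": ["Pa", "ra"]}}
]}}]}
""")

for where, names in find_shared_units(sample):
    print(where, names)
```

Each reported location is a place where the listed diphones share one unit, so the unit actually concatenated may differ from the one required and should be checked by hand.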

Feel free to ask in case of questions.

----
Data for #3922 are in the spkr_AJ.vystupy_z_TTS.zip file (voice Jan - spkr_AJ) and in spkr_AJ.vystupy_z_TTS_JSON.zip. The format is the same as described above.

----
Data for #4058 are in the spkr_JS.vystupy_z_TTS.zip file (voice Stanislav - spkr_JS). The format is the same as described above.
Data for #4059 are in the spkr_SK.vystupy_z_TTS.zip file (voice Katka - spkr_SK). The format is the same as described above.