Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

Task #3844

closed

Task #3669: RA1a - Analysis and cataloguing of artifacts

Task #3688: Separation of some phonemes into distinct phones

Analyze utterances with phonetically incorrect phoneme

Added by Tihelka Dan about 8 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Start date: 06.06.2016
Due date: 20.11.2016
% Done: 100%
Estimated time:
Description

We have added the data synthesized in #3763 to the wiki. Please check whether they are what you requested.

Continue with analysis with the aim to answer the questions:
  • must the phone variants be treated separately? I.e., must [P] speech units be used only where the phonetic transcription calls for them, or can they be used in [r] positions under certain circumstances?
  • what are those circumstances? For example, is the exchange possible in some contexts?

Files

allophonic variants.docx (17.8 KB) - details regarding individual items (in Czech) - Skarnitzl Radek, 20.05.2016 11:29
allophonic variants2.docx (18.6 KB) - Skarnitzl Radek, 21.06.2016 13:32
allophonic variantsJan.docx (17.7 KB) - Skarnitzl Radek, 07.07.2016 13:30
allophonic variantsKateřina.docx (19 KB) - Skarnitzl Radek, 05.12.2016 23:08
allophonic variantsStanislav.docx (18.4 KB) - Skarnitzl Radek, 05.12.2016 23:08

Related issues

Follows HQSYN16 - Task #3763: Synthesize utterances with phonetically incorrect phoneme realizations (Status: Closed, Assignee: Matura Martin, Start date: 25.02.2016, Due date: 03.06.2016)

Actions #1

Updated by Tihelka Dan about 8 years ago

  • Follows Task #3763: Synthesize utterances with phonetically incorrect phoneme realizations added
Actions #2

Updated by Skarnitzl Radek about 8 years ago

  • Due date changed from 09.03.2016 to 02.05.2016
  • Status changed from New to Assigned
Actions #3

Updated by Skarnitzl Radek almost 8 years ago

Allophonic variants

We performed a detailed auditory analysis of the synthesized phrases. Details are attached as a separate file, allophonic variants.docx, in Czech.

The results can be briefly summarized as follows:

1) [r] vs. [r=]
A normal [r] is very often realized as a flap (i.e., not a trill) in Czech. When such an [r] is inserted into words which should have [r=], quality is impaired - the target sound is very short and the absence of trilling is disturbing.
Sometimes there is an impression of an [r+vowel] sequence, rather than [r=].

2) [l] vs. [l=]
When a syllabic [l=] is used in contexts where an [l] would be followed by a vowel (e.g., potlach [potlax]), a glottal stop is usually inserted ([potl=?ax]), which is extremely disturbing.
Sometimes there is an impression of an [l+vowel] sequence, rather than [l=].

3) [n] vs. [N]
When alveolar [n] is used in pre-velar contexts, it is usually accompanied by an epenthetic schwa, e.g., [len@ka].
When velar [N] is used in non-velar contexts, it is often audible and disturbing.

4) [x] vs. [G]
Here the phonetic voicing of the segment often did not correspond to the definition; none of the sequences was disturbing.

5) voiced ř [P\] vs. voiceless ř [Q\]
(Partially) voiceless realizations in contexts which should have a voiced realization are usually not disturbing, as ř is frequently (partially) devoiced in ordinary speech.
However, when the voicing of ř also changes the voicing of a neighbouring obstruent, the result is disturbing and may even lead to a change of meaning (dřímá --> třímá).
When a voiced [P\] is used in contexts which should be voiceless, the result is disturbing.

Actions #4

Updated by Matoušek Jindřich almost 8 years ago

The auditory analysis will also be performed for the utterances that have not been synthesized yet (this was especially the case for [x] vs. [G]). The task therefore stays open until these utterances are re-synthesized (#3763).

A similar auditory analysis will be repeated for the voice Jan - see #3922.

Actions #5

Updated by Tihelka Dan almost 8 years ago

  • Status changed from Resolved to Assigned

I have uploaded new files with synthesized utterances to the wiki; all of them should now be synthesized correctly. Apart from the fixes of missing items and additional variants of the [ch] replacements, the files contain the same data as the previous version, so those do not have to be analysed again.

Actions #6

Updated by Skarnitzl Radek almost 8 years ago

The second round of auditory analyses did not reveal any new information. The conclusions and recommendations from the previous phase hold:

1) the two versions of /l/ - plain [l] and syllabic [L] - should be kept separate, as the l -> L interchange in prevocalic contexts almost always leads to a synthesized glottal stop (e.g., [vL!ak]).

2) the alveolar [n] and velar [N] should be kept separate; the N->n interchange usually yields an epenthetic schwa (e.g., [len@ka]).

3) the two versions of ř - voiced P\ and voiceless Q\ - should be kept separate

4) the allophones of /x/ - [x G h] - can be pooled together, without any marked intrusive effects.

Actions #7

Updated by Tihelka Dan almost 8 years ago

  • Status changed from Resolved to Assigned

The same as before, but now for the male voice (Jan, spkr_AJ); the data are attached on the wiki.

Actions #8

Updated by Skarnitzl Radek almost 8 years ago

Auditory analysis of the male speaker yielded more favourable results than that of the female speaker. Details are attached as a separate file, allophonic variantsJan.docx, in Czech; the main findings are summarized here:

1) based on the listening sample, the two versions of /r/ - plain [r] and syllabic [r=] - could be pooled

2) the two versions of /l/ - plain [l] and syllabic [l=] - could be pooled because the l -> l= interchange in prevocalic contexts never led to synthesizing a glottal stop (it almost always did with the female speaker).

3) the alveolar [n] and velar [N] could also be pooled; the N->n interchange led to an epenthetic schwa (e.g., [len@ka]) only once, and the effect was not very disturbing.

4) the two versions of ř - voiced P\ and voiceless Q\ - should be kept separate

5) the allophones of /x/ - [x G h] - can be pooled together, without any marked intrusive effects

In conclusion, it seems advantageous to specify the pooling or keeping apart of allophonic variants for every speaker in the database separately.
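
The per-speaker specification suggested above could be sketched as a small lookup table. This is only an illustration, not the project's actual configuration format: the table name `POOLED`, the helper functions and the key `female_voice_1` (the first female voice is not named in this task) are assumptions; `spkr_AJ` and the phone symbols are taken from the comments above.

```python
# Hypothetical per-speaker pooling table (illustrative only).
# Each set lists allophonic variants that may share speech units for that speaker.
POOLED = {
    # First female voice: only the allophones of /x/ were found safe to pool.
    "female_voice_1": [{"x", "G", "h"}],
    # Male voice Jan: everything except the two variants of ř could be pooled.
    "spkr_AJ": [{"r", "r="}, {"l", "l="}, {"n", "N"}, {"x", "G", "h"}],
}


def unit_class(speaker, phone):
    """Return the set of phones whose units may stand in for `phone`."""
    for group in POOLED.get(speaker, []):
        if phone in group:
            return frozenset(group)
    # Not pooled for this speaker: the phone forms its own class.
    return frozenset({phone})


def interchangeable(speaker, a, b):
    """True if units of phone `a` may be used in positions of phone `b`."""
    return unit_class(speaker, a) == unit_class(speaker, b)
```

With such a table, a unit-selection front end could consult `interchangeable(speaker, candidate, target)` before admitting a candidate unit, so the decision stays specific to each voice.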

Actions #9

Updated by Matoušek Jindřich over 7 years ago

  • Status changed from Resolved to Closed
Actions #10

Updated by Matoušek Jindřich over 7 years ago

  • Due date changed from 17.08.2016 to 20.11.2016
  • Status changed from Closed to Assigned

The same as before, but now for the male voice (Stanislav, spkr_JS) and the female voice (Kateřina, spkr_SK); the data are attached on the wiki.

Actions #11

Updated by Skarnitzl Radek over 7 years ago

The results of the last set of analyses mostly follow the previous findings, and they are provided in the attached documents. Here the task is summarized and conclusions are drawn:

1) Regarding the syllabic [r=] and [l=], it does not seem possible to generalize with respect to the speaker's sex. In the female voice, Kateřina, the presence of syllabic [r=] led several times to the insertion of a glottal stop, e.g. drozd as [dr=!ost]; this never happened with the male speaker, Stanislav. However, when [l=] was synthesized, a glottal stop was inserted in most cases in the male voice, Stanislav, and never in the female voice.
Conclusion: preferring to err on the side of caution, syllabic and non-syllabic sonorants - [l] and [l=], as well as [r] and [r=] - should be kept separate.

2) The interchange in synthesis of the allophones of /n/ - [n] and [N] - sometimes leads to the insertion of an epenthetic schwa (e.g., [len@ka]); the occurrence, as well as the salience, of the schwa differs between individual speakers. Assuming that a nasal coming originally from the velar context would hardly be synthesized into an alveolar context in natural conditions (e.g., víno as [vi:No]), the answer to the question whether to keep the two contexts separate is not quite straightforward and depends on the richness of the segment inventory.
Conclusion: the epenthetic schwa is sometimes rather intrusive, but if the "cost" of keeping [n] and [N] as separate segments were to be too high, it is possible to pool them.

3) The results are unequivocal with regard to the Czech fricative trill /ř/ and its two allophones - the voiced [P\] and voiceless [Q\]. Interchanging one for the other typically leads to very intrusive effects.
Conclusion: the two allophones of /ř/ must be kept separate.

4) The effect of interchanging the allophones of the velar fricative /x/ seems to be relatively minor, especially when it comes to [x] and its voiced counterpart [G]. The laryngeal [h] is sometimes intrusive, especially when it also triggers incorrect voicing assimilation; under normal conditions, though, this would most probably not happen.
Conclusion: the allophones of /x/ can be pooled together.
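
The four conclusions above can be read as a default substitution policy. The sketch below is one possible encoding under stated assumptions: the function names and the `pool_nasals` switch are hypothetical, introduced only to mirror the "cost" trade-off mentioned for [n] and [N]; nothing here is taken from the actual synthesizer code.

```python
def build_classes(pool_nasals=False):
    """Return the groups of allophones that may share speech units.

    Only the allophones of /x/ are pooled by default. The nasals are pooled
    only on request, i.e. when keeping [n] and [N] apart is judged too costly.
    """
    classes = [{"x", "G", "h"}]
    if pool_nasals:
        classes.append({"n", "N"})
    return classes


def can_substitute(target, candidate, pool_nasals=False):
    """True if a unit of `candidate` may be used where `target` is expected."""
    if target == candidate:
        return True
    return any(
        target in group and candidate in group
        for group in build_classes(pool_nasals)
    )
```

For example, `can_substitute("x", "G")` holds by default, while the syllabic and non-syllabic sonorants and the two allophones of /ř/ (written `"P\\"` and `"Q\\"` as Python strings) are never interchangeable under this policy.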

Actions #12

Updated by Matoušek Jindřich over 7 years ago

  • Status changed from Resolved to Closed