Project management of NTIS P1 Cybernetic Systems and Department of Cybernetics | WiKKY

Context penalty

This page is part of RA3: Phonetically justified parameters for speech synthesis (parent task #3676).

In what follows, we use this notation for a diphone and its context: [ l-context ] - [ l-phone ][ r-phone ] + [ r-context ].
For example, the word ahoj is decomposed into the following diphones:

  1. [ $ ] - [ a ][ h ] + [ o ]
  2. [ a ] - [ h ][ o ] + [ j ]
  3. [ h ] - [ o ][ j ] + [ $ ]
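The decomposition can be sketched in Python as follows. The Diphone record and the decompose function are hypothetical names; phones are assumed to be single-character symbols for the example, and "$" is assumed to mark the pause at both ends of the word. Like the list above, the sketch emits only the word-internal diphones.

```python
from typing import List, NamedTuple


class Diphone(NamedTuple):
    l_context: str  # phone preceding the diphone
    l_phone: str
    r_phone: str
    r_context: str  # phone following the diphone

    def __str__(self) -> str:
        # Render in the [ l-context ] - [ l-phone ][ r-phone ] + [ r-context ] notation
        return (f"[ {self.l_context} ] - [ {self.l_phone} ]"
                f"[ {self.r_phone} ] + [ {self.r_context} ]")


def decompose(phones: List[str], pause: str = "$") -> List[Diphone]:
    """Split a phone sequence into word-internal diphones with one phone of context per side.

    The sequence is padded with `pause` so that the first and last diphones also get a context.
    """
    padded = [pause] + list(phones) + [pause]
    return [
        Diphone(padded[i - 1], padded[i], padded[i + 1], padded[i + 2])
        for i in range(1, len(padded) - 2)
    ]


for d in decompose(list("ahoj")):
    print(d)
```

Run on ahoj, this prints the three diphones listed above, one per line.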

Current context handling in ARTIC

The target cost is increased by:
  1. XX% when the required l-context (for the synthesized sequence of units) differs from the l-context of the examined unit
  2. XX% when the required r-context differs from the r-context of the examined unit

i.e., when a context matches exactly, the cost is not increased; otherwise, the examined unit is slightly penalized regardless of the type of mismatch.
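The current rule can be sketched as a small Python function. The penalty fractions l_penalty and r_penalty stand in for the unspecified XX% values and are hypothetical parameters; the sketch also assumes that both increases are added as fractions of the base target cost (whether ARTIC adds or compounds them is not stated here).

```python
from typing import Tuple

Context = Tuple[str, str]  # (l-context, r-context)


def context_penalized_cost(base_cost: float, want: Context, have: Context,
                           l_penalty: float, r_penalty: float) -> float:
    """Increase the target cost by a flat fraction for each mismatching context.

    l_penalty / r_penalty are hypothetical stand-ins for the XX% values,
    given as fractions (0.25 == 25 %). The type of mismatch is ignored.
    """
    factor = 1.0
    if want[0] != have[0]:  # required l-context differs from the unit's
        factor += l_penalty
    if want[1] != have[1]:  # required r-context differs from the unit's
        factor += r_penalty
    return base_cost * factor


# Illustrative values only: 25 % and 50 % are not ARTIC's settings.
print(context_penalized_cost(2.0, ("$", "o"), ("$", "o"), 0.25, 0.5))  # 2.0
print(context_penalized_cost(2.0, ("$", "o"), ("a", "o"), 0.25, 0.5))  # 2.5
print(context_penalized_cost(2.0, ("$", "o"), ("a", "u"), 0.25, 0.5))  # 3.5
```

Under this scheme, a candidate with l-context [ p ] where [ m ] is required costs exactly as much as one with [ n ], because only the fact of a mismatch is considered.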

Previous research (needs to be revised!)

A few years ago, we experimented with a [[hqsyn09-interni:Nazalita-labializace|labialization/nasalization]] context penalty. The main idea was to avoid concatenating units whose contexts are not compatible (i.e. the mismatch may even affect the perceived identity of a phone, which may be the case in #3880). The results were summarized in the paper "On the Impact of Labialization Contexts on Unit Selection Speech Synthesis", submitted to Interspeech 2010.

The algorithm was as follows, where "want" is the required context (given the sequence of diphones to be synthesized) and "have" is the actual context of the examined candidate:

  • when examining:
    • diphone [ o,O,u,U,y,i,I,e,E,a,A ][ * ], disable the use of its l-context according to the following table
    • diphone [ * ][ o,O,u,U,y,i,I,e,E,a,A ], disable the use of its r-context according to the following table
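      want / have | m | n,N | J | p,t,k | b,d,g | D | f | v | s,S,z,Z | x | G | h | c,w,C,W | j | l | r | R,Q | ! | pause
      m           |   |     |   | x     | x     | x | x | x | x       | x | x | x | x       | x | x | x | x   | x | x
      n,N         |   |     |   | x     | x     | x | x | x | x       | x | x | x | x       | x | x | x | x   | x | x
      J           |   |     |   | x     | x     | x | x | x | x       | x | x | x | x       | x | x | x | x   | x | x
      p,t,k       | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      b,d,g       | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      D           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      f           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      v           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      s,S,z,Z     | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      x           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      G           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      h           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      c,w,C,W     | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      j           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      l           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      r           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      R,Q         | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      !           | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |
      pause       | x | x   | x |       |       |   |   |   |         |   |   |   |         |   |   |   |     |   |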
  • when examining:
    • diphone [ j,h,l,r,x,G,s,z ][ * ], disable the use of its l-context according to the following table
    • diphone [ * ][ j,h,l,r,x,G,s,z ], disable the use of its r-context according to the following table
      want / have o,O u,U y i,I e,E a,A
      o,O x x x
      u,U x x x
      y x x x
      i,I x x x
      e,E x x x
      a,A x x x
  • when examining:
    • diphone [ m,n,N,J,p,b,t,d,D,c,k,g,f,v,S,Z,c,C,w,W,R,Q,!,pause ][ * ], penalize the use of its l-context according to the following table
    • diphone [ * ][ m,n,N,J,p,b,t,d,D,c,k,g,f,v,S,Z,c,C,w,W,R,Q,!,pause ], penalize the use of its r-context according to the following table
      want / have o,O u,U y i,I e,E a,A
      o,O o o o
      u,U o o o
      y o o o
      i,I o o o
      e,E o o o
      a,A o o o

Note: the phones [ M,T,Y,F,L,H,P ] are missing from these rules!
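The rules above can be represented as a lookup from (want, have) context pairs to a verdict, checked on the l-context when the diphone's l-phone belongs to the examined set and on the r-context when its r-phone does. The following Python sketch is only an illustration: the Unit and ContextRule names are hypothetical, "x" marks map to DISABLE and "o" marks to PENALIZE, and only the first rule is populated, under the assumption that its table disables every nasal/non-nasal mismatch (m, n, N, J against any other context and vice versa), which is what the number of x marks per row suggests. The vowel tables of the second and third rules would plug into the same structure.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

# Verdicts ordered by severity, so max() picks the worst one.
OK, PENALIZE, DISABLE = 0, 1, 2  # "o" -> PENALIZE, "x" -> DISABLE


class Unit(NamedTuple):
    """Hypothetical diphone record: one phone of context on each side."""
    l_context: str
    l_phone: str
    r_phone: str
    r_context: str


VOWELS = frozenset("oOuUyiIeEaA")
NASALS = frozenset("mnNJ")
# Context phones of the first table; "pause" is kept as a single token.
CONTEXTS = frozenset(list("mnNJptkbdgDfvsSzZxGhcwCWjlrRQ!") + ["pause"])


class ContextRule(NamedTuple):
    examined: FrozenSet[str]           # phones whose neighbouring context is checked
    marks: Dict[Tuple[str, str], int]  # (want, have) -> PENALIZE or DISABLE

    def verdict(self, want: Unit, have: Unit) -> int:
        """Worst mark over the l-context and r-context checks that apply.

        `want` and `have` are assumed to share the same l-phone and r-phone.
        """
        result = OK
        if have.l_phone in self.examined:  # [ X ][ * ]: check the l-context
            result = max(result, self.marks.get((want.l_context, have.l_context), OK))
        if have.r_phone in self.examined:  # [ * ][ X ]: check the r-context
            result = max(result, self.marks.get((want.r_context, have.r_context), OK))
        return result


def nasal_rule() -> ContextRule:
    """First rule, assuming it disables every nasal/non-nasal context mismatch."""
    marks = {(w, h): DISABLE
             for w in CONTEXTS for h in CONTEXTS
             if (w in NASALS) != (h in NASALS)}
    return ContextRule(VOWELS, marks)


rule = nasal_rule()
# Required /m/ before [ a ][ h ], candidate recorded after /p/: disabled.
print(rule.verdict(Unit("m", "a", "h", "o"), Unit("p", "a", "h", "o")))  # 2
# Nasal vs nasal is not marked here, so only the generic penalty would apply.
print(rule.verdict(Unit("m", "a", "h", "o"), Unit("n", "a", "h", "o")))  # 0
```

Keeping the verdicts ordered lets several rules be combined by taking the maximum, so a single "x" anywhere disables the candidate while "o" marks only penalize it.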