The task of a polyglot text-to-speech (TTS) synthesis system is to
transform mixed-lingual text into appropriate speech. First steps
towards such a system have been made with the projects
POSSY/TTS'99 (diphone library for polyglot
TTS) and LESAN (lexical and syntactic
analysis of mixed-lingual sentences).
In this project further steps towards a complete polyglot TTS
synthesis system will be made, namely:
- Our monolingual TTS system SVOX can easily be configured for a
new language by simply replacing all its databases (lexica,
grammars, rule sets for accentuation and phrasing, neural network
for fundamental frequency control, diphone library, etc.) by those
of the new language. In contrast to this, a polyglot TTS system must
hold the databases of a certain set of languages simultaneously and
apply them appropriately (see paper [PR03]). Thus our
monolingual TTS system needed a major redesign. The new system will
be able to handle the language mixing phenomena in all processing
steps and is therefore called polySVOX (for further
information, please see paper [RP06], article
[RP07], or thesis [Rom09b]). Some
audio examples can be found at the polySVOX demo site.
- One of the most difficult problems of polyglot TTS synthesis is
the generation of adequate prosody. Investigations in project
LESAN have shown that foreign inclusions
are phonetically and prosodically assimilated to the base language.
However, the degree of assimilation of the embedded language to the
base language depends strongly on the size of the inclusion and, in
particular, differs sharply between the language regions of
Switzerland: in the French-speaking part the assimilation is much
stronger than in the German-speaking part. In other words, a
polyglot TTS system that can be used in the German-speaking part of
Switzerland has to distinguish very strongly between the
pronunciation of German sentences (base language) and the
pronunciation of the foreign inclusions.
These very general rules are far from being sufficient for the
prosody generation in a polyglot TTS system. Therefore, appropriate
investigations will be made in this project, in order to get a more
complete knowledge of this issue. Furthermore, we will try to use
statistical models (particularly neural networks) for prosody
control (see [RPB05]). This approach has proven to be
very successful in the monolingual case. Although there are still
many open questions, we consider this approach very promising.
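The architectural contrast described in the first point above (one set of databases swapped per language versus several sets held simultaneously) can be illustrated with a minimal sketch. This is not the SVOX or polySVOX implementation: the class names, the `transcribe` method, the language tags and the placeholder transcription strings are all assumptions made for illustration, and only the lexicon is modelled out of the databases listed above.

```python
from dataclasses import dataclass


@dataclass
class LanguageResources:
    """Hypothetical bundle of the per-language databases listed in the text."""
    lexicon: dict  # word -> transcription (placeholder strings, not real phonetics)
    # Grammars, accentuation and phrasing rules, the F0 network and the
    # diphone library would live here too; they are omitted in this sketch.


class MonolingualTTS:
    """Holds exactly one language; switching language means swapping resources."""

    def __init__(self, resources):
        self.resources = resources

    def transcribe(self, words):
        return [self.resources.lexicon.get(w, "<unknown>") for w in words]


class PolyglotTTS:
    """Holds several languages at once and selects the resources per word."""

    def __init__(self, resources_by_language):
        self.resources_by_language = resources_by_language

    def transcribe(self, tagged_words):
        # tagged_words: (word, language) pairs; language identification is
        # assumed to have happened in an earlier analysis step.
        return [
            self.resources_by_language[lang].lexicon.get(word, "<unknown>")
            for word, lang in tagged_words
        ]


# Toy data: the transcriptions are placeholders, not phonetic output.
german = LanguageResources(lexicon={"das": "de:das", "ist": "de:ist"})
english = LanguageResources(lexicon={"software": "en:software"})

mono = MonolingualTTS(german)
poly = PolyglotTTS({"de": german, "en": english})

mono_result = mono.transcribe(["das", "software"])
poly_result = poly.transcribe([("das", "de"), ("software", "en")])
```

In this toy setup the monolingual object has no entry for the English inclusion, whereas the polyglot object looks it up in the English lexicon while keeping German as the base language. A real polyglot system would have to apply the same idea in every processing step, not only in lexicon lookup.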
Supported by: This project was partly supported by the
NCCR IM2 (i.e. by the
Swiss National Science Foundation).
Last updated: Mon Nov 20 15:00:45 CET 2017
by: Beat Pfister
!!! This document is stored in the
ETH Web archive and is no longer maintained !!!