J.-P. Goldman, P.-E. Honnet, et al.
The SIWIS database: a multilingual speech database with acted
emphasis.
In Proceedings of Interspeech, San Francisco (USA), 2016.
PDF (175KB)
N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool.
Deep convolutional neural networks and data augmentation for acoustic
event recognition.
In Proceedings of Interspeech, San Francisco (USA), 2016.
PDF (363KB)
N. Takahashi, T. Naghibi, and B. Pfister.
Automatic pronunciation generation by utilizing a semi-supervised
deep neural networks.
In Proceedings of Interspeech, San Francisco (USA), 2016.
PDF (359KB)
T. Naghibi.
Towards Robust Audio-Visual Speech Recognition.
PhD thesis, No. 22867, Computer Engineering and Networks Laboratory,
ETH Zurich, 2015.
PDF (1885KB)
T. Naghibi, S. Hoffmann, and B. Pfister.
A semidefinite programming based search strategy for feature
selection with mutual information measure.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
37(8):1529-1541, 2015.
PDF (451KB)
P. Garner, R. Clark, J.-P. Goldman, et al.
Translation and prosody in Swiss languages.
In Proceedings of the 3rd Swiss Workshop on Prosody: Nouveaux
cahiers de linguistique française, pages 211-221, 2014.
PDF (370KB)
S. Hoffmann.
A Data-driven Model for the Generation of Prosody from Syntactic
Sentence Structures.
PhD thesis, No. 21991, Computer Engineering and Networks Laboratory,
ETH Zurich, 2014.
PDF (999KB)
H. Liang and S. Hoffmann.
Capturing Speaker-Independent Prosodic Information by Syntax
Tree-Based Prosody Modelling.
Internal Report of the SNSF project SIWIS. TIK, ETH Zurich, June
2014.
T. Naghibi.
Modality Weighting for Audio-Visual Fusion in Speech
Recognition.
Annual Report of the SNSF project no. 200021 130224/1. Speech group,
TIK, ETH Zurich, April 2014.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_14_report.pdf).
PDF (237KB)
S. Hoffmann and B. Pfister.
Text-to-speech alignment of long recordings using universal phone
models.
In Proceedings of Interspeech, pages 1520-1524, Lyon (France),
September 2013.
PDF (124KB)
T. Naghibi.
Robust Feature Extraction for Bimodal Speech Recognizer.
Annual Report of the SNSF project no. 200021 130224/1. Speech group,
TIK, ETH Zurich, April 2013.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_13_report.pdf).
PDF (120KB)
T. Naghibi, S. Hoffmann, and B. Pfister.
Convex approximation of the NP-hard search problem in feature
subset selection.
In Proceedings of ICASSP, pages 3273-3277, Vancouver (Canada),
May 2013.
PDF (167KB)
T. Naghibi, S. Hoffmann, and B. Pfister.
An efficient method to estimate pronunciation from multiple
utterances.
In Proceedings of Interspeech, pages 1951-1955, Lyon (France),
September 2013.
PDF (170KB)
T. Ewender.
Automatic Selection of Speech Segments for Concatenative Speech
Synthesis.
PhD thesis, No. 20828, Computer Engineering and Networks Laboratory,
ETH Zurich, 2012.
PDF (2431KB)
S. Hoffmann and B. Pfister.
Employing sentence structure: Syntax trees as prosody generators.
In Proceedings of Interspeech, Portland, Oregon (USA),
September 2012.
PDF (984KB)
T. Naghibi.
Multi-Channel Audio Processing for Human Machine Interaction
Applications.
Annual Report of the SNSF project no. 200021 130224/1. Speech group,
TIK, ETH Zurich, April 2012.
(http://www.tik.ee.ethz.ch/spr/publications/Naghibi_12_report.pdf).
PDF (127KB)
T. Naghibi and B. Pfister.
An approach to prevent adaptive beamformers from cancelling the
desired signal.
In Proceedings of ICASSP, pages 205-208, Kyoto (Japan), March
2012. IEEE.
PDF (194KB)
T. Naghibi and B. Pfister.
Beamformer design for nonstationary signals by means of
interfrequency correlations.
In Proceedings of SAM, pages 261-264, Hoboken, NJ (USA), June
2012.
PDF (487KB)
T. Ewender and B. Pfister.
Automatically creating a diphone set from a speech database.
In Proceedings of Interspeech, pages 2169-2172, Florence
(Italy), August 2011.
PDF (159KB)
M. Gerber.
Speech Recognition Techniques for Languages with Limited
Linguistic Resources.
PhD thesis, No. 19507, Computer Engineering and Networks Laboratory,
ETH Zurich, 2011.
PDF (1264KB)
M. Gerber, T. Kaufmann, and B. Pfister.
Extended Viterbi algorithm for optimized word HMMs.
In Proceedings of ICASSP, pages 4932-4935, Prague (Czech
Republic), May 2011.
PDF (220KB)
T. Ewender and B. Pfister.
Accurate pitch marking for prosodic modification of speech segments.
In Proceedings of Interspeech, pages 178-181, Makuhari
(Japan), September 2010.
PDF (291KB)
S. Hoffmann.
Preliminary Study of Prosody in Foreign Language Inclusions.
Report for ETH project no. TH-22 07-2. Speech Processing Group, TIK,
ETH Zurich, June 2010.
PDF (13688KB)
S. Hoffmann and B. Pfister.
Fully automatic segmentation for prosodic speech corpora.
In Proceedings of Interspeech, pages 1389-1392, Makuhari
(Japan), September 2010.
PDF (204KB)
T. Kaufmann and B. Pfister.
Semi-automatic extension of morphological lexica.
In Workshop Computational Linguistics - Applications, Wisla
(Poland), 2010.
PDF (117KB)
B. Pfister and T. Naghibi.
Concept of the VSHMI Experimentation System.
Report of the SNSF project no. 200021 130224/1. TIK, ETH Zurich, June
2010.
PDF
T. Ewender, S. Hoffmann, and B. Pfister.
Nearly perfect detection of continuous F0 contour and frame
classification for TTS synthesis.
In Proceedings of Interspeech, pages 100-103, Brighton (United
Kingdom), September 2009.
demo examples, PDF (771KB)
S. Hoffmann.
Automatic Phone Segmentation.
Progress report of project no. TH-22 07-2. Speech Processing Group,
TIK, ETH Zurich, September 2009.
PDF (5979KB)
T. Kaufmann.
A Rule-based Language Model for Speech Recognition.
PhD thesis, No. 18700, Computer Engineering and Networks Laboratory,
ETH Zurich, 2009.
PDF (897KB)
T. Kaufmann, T. Ewender, and B. Pfister.
Improving broadcast news transcription with a precision grammar and
discriminative reranking.
In Proceedings of Interspeech, pages 356-359, Brighton (United
Kingdom), September 2009.
PDF (264KB)
H. Romsdorfer.
Polyglot Text-to-Speech Synthesis: Text Analysis & Prosody
Control.
PhD thesis, No. 18210, ETH Zurich. Shaker Verlag Aachen (ISBN
978-3-8322-8090-1), February 2009.
PDF (1223KB)
H. Romsdorfer.
Weighted neural network ensemble models for speech prosody control.
In Proceedings of Interspeech, pages 492-495, Brighton (United
Kingdom), September 2009.
PDF (606KB)
M. Gerber and B. Pfister.
Fast search for common segments in speech signals for speaker
verification.
In Proceedings of Interspeech, pages 375-378, Brisbane
(Australia), September 2008.
PDF (204KB)
T. Kaufmann and B. Pfister.
Applying a grammar-based language model to a simplified
broadcast-news transcription task.
In Proceedings of ACL, pages 106-113, Columbus (Ohio), June
2008.
PDF (464KB)
B. Pfister and T. Kaufmann.
Sprachverarbeitung: Grundlagen und Methoden der Sprachsynthese
und Spracherkennung.
Springer Verlag (ISBN: 978-3-540-75909-6), 2008.
R. Beutler.
Improving Speech Recognition through Linguistic Knowledge.
PhD thesis, No. 17039, Computer Engineering and Networks Laboratory,
ETH Zurich, January 2007.
PDF (2135KB)
M. Gerber, R. Beutler, and B. Pfister.
Quasi text-independent speaker verification based on pattern
matching.
In Proceedings of Interspeech, pages 1993-1996, Antwerp,
August 2007.
PDF (658KB)
M. Gerber, T. Kaufmann, and B. Pfister.
Perceptron-based class verification.
In Proceedings of NOLISP (ISCA Workshop on Non-Linear Speech
Processing), Paris, May 2007.
PDF (170KB)
T. Kaufmann and B. Pfister.
Applying licenser rules to a grammar with continuous constituents.
In Stefan Müller, editor, Proceedings of the 14th
International Conference on Head-Driven Phrase Structure Grammar, pages
150-162, Stanford, 2007. CSLI Publications.
PDF (73KB)
H. Romsdorfer and B. Pfister.
Text analysis and language identification for polyglot text-to-speech
synthesis.
Speech Communication (Elsevier), 49(9):697-724, September
2007.
PDF (563KB)
H. Romsdorfer and B. Pfister.
Character stream parsing of mixed-lingual text.
In ISCA Tutorial and Research Workshop on Multilingual Speech
and Language Processing (MultiLing 2006), Stellenbosch (South Africa), April
2006.
PDF (122KB)
R. Beutler, T. Kaufmann, and B. Pfister.
Integrating a non-probabilistic grammar into large vocabulary
continuous speech recognition.
In Proceedings of the IEEE ASRU 2005 Workshop, pages 104-109,
San Juan (Puerto Rico), November 2005.
PDF (124KB)
R. Beutler, T. Kaufmann, and B. Pfister.
Using rule-based knowledge to improve LVCSR.
In Proceedings of ICASSP, pages 829-832, Philadelphia (USA),
March 2005.
PDF (204KB)
M. Gerber and B. Pfister.
Quasi text-independent speaker verification with neural networks.
MLMI'05 Workshop, Edinburgh (United Kingdom), July 2005.
PDF (337KB)
T. Kaufmann.
Evaluation von Grammatikformalismen in Hinblick auf die
Anwendung in der Spracherkennung.
Interim report for the SNSF project 105211-104078/1: Rule-Based
Language Model for Speech Recognition. Institut TIK, ETH Zürich, September
2005.
H. Romsdorfer and B. Pfister.
Phonetic labeling and segmentation of mixed-lingual prosody
databases.
In Proceedings of Interspeech, pages 3281-3284, Lisbon
(Portugal), September 2005.
PDF (224KB)
H. Romsdorfer, B. Pfister, and R. Beutler.
A mixed-lingual phonological component which drives the statistical
prosody control of a polyglot TTS synthesis system.
In S. Bengio and H. Bourlard, editors, Machine Learning for
Multimodal Interaction, pages 263-276. Springer-Verlag Heidelberg, January
2005.
PDF (237KB)
H. Romsdorfer and B. Pfister.
Multi-context rules for phonological processing in polyglot TTS
synthesis.
In Proceedings of Interspeech, pages 737-740, Jeju Island
(Korea), October 2004.
PDF (115KB)
R. Beutler and B. Pfister.
Integrating statistical and rule-based knowledge for continuous
German speech recognition.
In Proceedings of Eurospeech, pages 937-940, Geneva, September
2003.
PDF (174KB)
B. Pfister and R. Beutler.
Estimating the weight of evidence in forensic speaker verification.
In Proceedings of Eurospeech, pages 701-704, Geneva, September
2003.
PDF (88KB)
B. Pfister and H. Romsdorfer.
Mixed-lingual text analysis for polyglot TTS synthesis.
In Proceedings of Eurospeech, pages 2037-2040, Geneva,
September 2003.
PDF (52KB)
G. Lehtinen.
Sprecheradaptation und Out-of-Vocabulary-Modell.
Report for the project: Einsatz von Spracherkennung in der SAPH.
Institut TIK, ETH Zürich, April 2002.
B. Pfister, E. Wehrli et al.
Lexical and Syntactic Analysis of Mixed-Lingual Sentences for
Text-to-Speech.
Final Report of SNSF Project No. 21-59396.99. Institut TIK, ETH
Zürich, November 2002.
B. Pfister.
Personenidentifizierung anhand der Stimme.
Kriminalistik, vol. 55, no. 4, pages 287-292
(journal published by Hüthig Verlag, Heidelberg), April 2001.
PDF (338KB)
B. Pfister and G. Lehtinen.
Schlussbericht für das Projekt COST249: Erkennung
kontinuierlicher Sprache über das Telefon.
Institut TIK, ETH Zürich, January 2001.
PostScript (210KB)
F.T. Johansen, N. Warakagoda, B. Lindberg, G. Lehtinen, et al.
The COST249 SpeechDat multilingual reference recogniser.
In Proceedings of LREC'2000 (International Conference on Language
Resources and Evaluation), Athens (Greece), June 2000.
PostScript (119KB)
B. Lindberg, F.T. Johansen, N. Warakagoda, G. Lehtinen, et al.
A noise robust multilingual reference recogniser based on
SpeechDat(II).
In Proceedings of ICSLP, Beijing (China), October 2000.
PostScript (60KB)
G. Lehtinen, S. Safra, et al.
IDAS: Interactive Directory Assistance Services.
In Proceedings of the COST249 ISCA Workshop on Voice Operated
Telecom Services, pages 51-54, Gent (Belgium), May 2000.
PostScript (128KB)
K. Huber, B. Pfister, and Ch. Traber.
POSSY: Ein Projekt zur Realisierung einer polyglotten
Sprachsynthese.
In Proceedings of DAGA, pages 392-393, 1998.
PostScript (33KB)
G. Lehtinen.
Einsatz des konfigurierbaren Worterkenners WOROV.
Report No. 2 for the project: Reverse Directory Service. Institut TIK,
ETH Zürich, January 1998.
G. Lehtinen and S. Safra.
Generation and selection of pronunciation variants for a flexible
word recognizer.
In Proceedings of the ESCA Workshop: Modeling Pronunciation
Variation for ASR, pages 67-71, Rolduc (The Netherlands), May 1998.
PostScript (98KB)
G. Lehtinen and S. Safra.
Generierung von Aussprachevariantenregeln und Verbesserung von
Subwortmodellen für einen flexiblen Worterkenner.
In Proceedings of DAGA, pages 400-401, March 1998.
PostScript (124KB)
B. Pfister, K. Huber et al.
Das Sprachsynthesesystem SVOX und seine praktische Anwendbarkeit.
In Proceedings of DAGA, pages 338-339, 1998.
PostScript (107KB)
M. Riedi.
Controlling Segmental Duration in Speech Synthesis Systems.
PhD thesis, No. 12487, Computer Engineering and Networks Laboratory,
ETH Zurich (TIK-Schriftenreihe Nr. 26, ISBN 3-906469-05-0), February
1998.
PostScript (3168KB)
S. Safra.
A Parsing Strategy in ARCOS-G.
Talk at the COST249 meeting in Porto, Portugal, February 12-13,
1998.
(printed in Final Report of COST249).
PDF (52KB)
S. Safra, G. Lehtinen, and K. Huber.
Modeling pronunciation variations and coarticulation with
finite-state transducers in CSR.
In Proceedings of the ESCA Workshop: Modeling Pronunciation
Variation for ASR, pages 125-130, Rolduc (The Netherlands), May 1998.
PostScript (197KB)
M. Riedi.
Modeling segmental duration with multivariate adaptive regression
splines.
In Proceedings of Eurospeech, pages 2627-2630, Rhodes
(Greece), September 1997.
PostScript (162KB)
S. Safra.
Das Experimentalsystem ARCOS: Konzepte, Aufbau,
Methoden.
Interim report for the project ARCOS-G. Institut für Technische
Informatik und Kommunikationsnetze, ETH Zürich, June 1997.
C. Traber.
Improvements of the Morpho-Syntactic Analysis of the SVOX
Text-to-Speech System.
Project report, Institut für Technische Informatik und
Kommunikationsnetze, ETH Zürich, May 1997.
H.-P. Hutter.
Comparison of Classic and Hybrid HMM Approaches to Speech
Recognition over Telephone Lines.
PhD thesis, No. 11662, Computer Engineering and Networks Laboratory,
ETH Zurich (TIK-Schriftenreihe Nr. 15, ISBN 3 7281 2424 9), October
1996.
B. Pfister.
High-quality prosodic modification of speech signals.
In Proceedings of ICSLP, pages 2446-2449, Philadelphia,
October 1996.
demo examples, PDF (822KB)
B. Pfister.
Prosodische Modifikation von Sprachsegmenten für die
konkatenative Sprachsynthese.
PhD thesis, No. 11331, TIK-Schriftenreihe Nr. 11 (ISBN 3 7281 2316 1),
ETH Zürich, March 1996.
PostScript (2987KB)
S. Safra.
Chartparsing in Continuous Speech Recognition.
Talk at the COST249 meeting in Kosice, Slovakia, February 29,
1996.
(printed in Final Report of COST249).
PDF (99KB)
H.-P. Hutter.
Comparison of a new hybrid connectionist-SCHMM approach with other
hybrid approaches for speech recognition.
In Proceedings of ICASSP. IEEE, 1995.
PDF (427KB)
G. Lehtinen and B. Pfister.
Portierung des ARA-Systems auf die
SparcStation-Plattform von Sun Microsystems.
Report No. 3 for the project Realisation einer automatischen
Rufnummernauskunft. Institut TIK, ETH Zürich, October 1995.
M. Riedi.
A neural network-based model of segmental duration for speech
synthesis.
In Proceedings of Eurospeech, pages 599-602, Madrid (Spain),
September 1995.
PDF (365KB)
S. Safra.
Handling Pronunciation Variants and Co-articulation with Finite
State Transducers.
Talk at the COST249 meeting in Nancy, France, March 6-7, 1995.
(printed in Final Report of COST249).
PDF (20KB)
C. Traber.
SVOX: The Implementation of a Text-to-Speech System for German.
PhD thesis, No. 11064, Computer Engineering and Networks Laboratory,
ETH Zurich, TIK-Schriftenreihe Nr. 7 (ISBN 3 7281 2239 4), March 1995.
PDF (927KB), PostScript (2271KB)
H.-P. Hutter and B. Pfister.
Neuartiger hybrider SKHMM/KNN-Ansatz für die Spracherkennung.
In Studientexte zur Sprachkommunikation, no. 11, pages 90-97. TU
Berlin, October 1994.
B. Pfister, G. Lehtinen, and D. Christnach.
ARA-V1: Systembeschreibung und Auswertung eines Testeinsatzes.
Report No. 2 for the project Realisation einer automatischen
Rufnummernauskunft. Institut für Elektronik, ETH Zürich, September 1994.
S. Safra.
Experimentalsystem zur Erkennung kontinuierlicher
Sprache.
First report for the project ARCOS-G. Institut für Technische
Informatik und Kommunikationsnetze, ETH Zürich, February 1994.
S. Safra and B. Pfister.
ARCOS-G: Ein Experimentalsystem zur Erkennung kontinuierlicher
deutscher Sprache.
In Studientexte zur Sprachkommunikation, no. 11, pages 174-181.
TU Berlin, October 1994.
PostScript (613KB)
K. Huber.
Messung und Modellierung der Segmentdauer für die Synthese
deutscher Lautsprache.
PhD thesis, No. 9535, Institut für Elektronik, ETH Zürich, July 1991.
T. Russi.
A Framework for Syntactic and Morphological Analysis and its
Application in a Text-to-Speech System.
PhD thesis, No. 9328, Electronics Laboratory, ETH Zurich, December
1990.
Last updated: Mon Nov 20 15:00:45 CET 2017
by: Beat Pfister
!!! This document is stored in the ETH Web archive and is no longer maintained !!!