Publications

GH16: J.-P. Goldman and P.-E. Honnet, et al. The SIWIS database: a multilingual speech database with acted emphasis. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (175KB)
TGPV16: N. Takahashi, M. Gygli, B. Pfister, and L. Van Gool. Deep convolutional neural networks and data augmentation for acoustic event recognition. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (363KB)
TNP16: N. Takahashi, T. Naghibi, and B. Pfister. Automatic pronunciation generation by utilizing a semi-supervised deep neural networks. In Proceedings of Interspeech, San Francisco (USA), 2016. PDF (359KB)
Nag15: T. Naghibi. Towards Robust Audio-Visual Speech Recognition. PhD thesis, No. 22867, Computer Engineering and Networks Laboratory, ETH Zurich, 2015. PDF (1885KB)
NHP15: T. Naghibi, S. Hoffmann, and B. Pfister. A semidefinite programming based search strategy for feature selection with mutual information measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8):1529-1541, 2015. PDF (451KB)
GCG14: P. Garner, R. Clark, and J.-P. Goldman, et al. Translation and prosody in swiss languages. In Proceedings of 3rd Swiss Workshop on Prosody: Nouveaux cahiers de linguistique française, pages 211-221, 2014. PDF (370KB)
Hof14: S. Hoffmann. A Data-driven Model for the Generation of Prosody from Syntactic Sentence Structures. PhD thesis, No. 21991, Computer Engineering and Networks Laboratory, ETH Zurich, 2014. PDF (999KB)
LH14: H. Liang and S. Hoffmann. Capturing Speaker-Independent Prosodic Information by Syntax Tree-Based Prosody Modelling. Internal Report of the SNSF project SIWIS. TIK, ETH Zurich, June 2014.
Lia14: H. Liang. Analysis of duration: Are emphases achieved differently in different languages in terms of duration?, November 2014.
Nag14: T. Naghibi. Modality Weighting for Audio-Visual Fusion in Speech Recognition. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2014. (http://www.tik.ee.ethz.ch/spr/publications/Naghibi_14_report.pdf). PDF (237KB)
NP14: T. Naghibi and B. Pfister. A boosting framework on grounds of online learning. In Proceedings of NIPS, Montréal (Canada), December 2014.
HP13: S. Hoffmann and B. Pfister. Text-to-speech alignment of long recordings using universal phone models. In Proceedings of Interspeech, pages 1520-1524, Lyon (France), September 2013. PDF (124KB)
Nag13: T. Naghibi. Robust Feature Extraction for Bimodal Speech Recognizer. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2013. (http://www.tik.ee.ethz.ch/spr/publications/Naghibi_13_report.pdf). PDF (120KB)
NHP13a: T. Naghibi, S. Hoffmann, and B. Pfister. Convex approximation of the NP-hard search problem in feature subset selection. In Proceedings of ICASSP, pages 3273-3277, Vancouver (Canada), May 2013. PDF (167KB)
NHP13b: T. Naghibi, S. Hoffmann, and B. Pfister. An efficient method to estimate pronunciation from multiple utterances. In Proceedings of Interspeech, pages 1951-1955, Lyon (France), September 2013. PDF (170KB)
Ewe12: T. Ewender. Automatic Selection of Speech Segments for Concatenative Speech Synthesis. PhD thesis, No. 20828, Computer Engineering and Networks Laboratory, ETH Zurich, 2012. PDF (2431KB)
HP12: S. Hoffmann and B. Pfister. Employing sentence structure: Syntax trees as prosody generators. In Proceedings of Interspeech, Portland, Oregon (USA), September 2012. PDF (984KB)
KP12: T. Kaufmann and B. Pfister. Syntactic language modeling with formal grammars. Speech Communication (Elsevier), 54(6):715-731, July 2012. PDF (382KB)
Nag12: T. Naghibi. Multi-Channel Audio Processing for Human Machine Interaction Applications. Annual Report of the SNSF project no. 200021 130224/1. Speech group, TIK, ETH Zurich, April 2012. (http://www.tik.ee.ethz.ch/spr/publications/Naghibi_12_report.pdf). PDF (127KB)
NP12a: T. Naghibi and B. Pfister. An approach to prevent adaptive beamformers from cancelling the desired signal. In Proceedings of ICASSP, pages 205-208, Kyoto (Japan), March 2012. IEEE. PDF (194KB)
NP12b: T. Naghibi and B. Pfister. Beamformer design for nonstationary signals by means of interfrequency correlations. In Proceedings of SAM, pages 261-264, Hoboken, NJ (USA), June 2012. PDF (487KB)
EP11: T. Ewender and B. Pfister. Automatically creating a diphone set from a speech database. In Proceedings of Interspeech, pages 2169-2172, Florence (Italy), August 2011. PDF (159KB)
Ger11: M. Gerber. Speech Recognition Techniques for Languages with Limited Linguistic Resources. PhD thesis, No. 19507, Computer Engineering and Networks Laboratory, ETH Zurich, 2011. PDF (1264KB)
GKP11: M. Gerber, T. Kaufmann, and B. Pfister. Extended Viterbi algorithm for optimized word HMMs. In Proceedings of ICASSP, pages 4932-4935, Prague (Czech Republic), May 2011. PDF (220KB)
Nag11: T. Naghibi. VSHMI Experimentation System. Annual Report of the SNSF project no. 200021 130224/1. TIK, ETH Zurich, March 2011. PDF (267KB)
EP10: T. Ewender and B. Pfister. Accurate pitch marking for prosodic modification of speech segments. In Proceedings of Interspeech, pages 178-181, Makuhari (Japan), September 2010. PDF (291KB)
Hof10: S. Hoffmann. Preliminary Study of Prosody in Foreign Language Inclusions. Report for ETH project no. TH-22 07-2. Speech Processing Group, TIK, ETH Zurich, June 2010. PDF (13688KB)
HP10: S. Hoffmann and B. Pfister. Fully automatic segmentation for prosodic speech corpora. In Proceedings of Interspeech, pages 1389-1392, Makuhari (Japan), September 2010. PDF (204KB)
KP10: T. Kaufmann and B. Pfister. Semi-automatic extension of morphological lexica. In Workshop Computational Linguistics - Applications, Wisla (Poland), 2010. PDF (117KB)
PN10: B. Pfister and T. Naghibi. Concept of the VSHMI Experimentation System. Report of the SNSF project no. 200021 130224/1. TIK, ETH Zurich, June 2010. PDF
EHP09: T. Ewender, S. Hoffmann, and B. Pfister. Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis. In Proceedings of Interspeech, pages 100-103, Brighton (United Kingdom), September 2009. demo examples PDF (771KB)
Hof09: S. Hoffmann. Automatic Phone Segmentation. Progress report of project no. TH-22 07-2. Speech Processing Group, TIK, ETH Zurich, September 2009. PDF (5979KB)
Kau09: T. Kaufmann. A Rule-based Language Model for Speech Recognition. PhD thesis, No. 18700, Computer Engineering and Networks Laboratory, ETH Zurich, 2009. PDF (897KB)
KEP09: T. Kaufmann, T. Ewender, and B. Pfister. Improving broadcast news transcription with a precision grammar and discriminative reranking. In Proceedings of Interspeech, pages 356-359, Brighton (United Kingdom), September 2009. PDF (264KB)
Rom09a: H. Romsdorfer. Polyglot speech prosody control. In Proceedings of Interspeech, pages 488-491, Brighton (United Kingdom), September 2009. PDF (482KB)
Rom09b: H. Romsdorfer. Polyglot Text-to-Speech Synthesis: Text Analysis & Prosody Control. PhD thesis, No. 18210, ETH Zurich. Shaker Verlag Aachen (ISBN 978-3-8322-8090-1), February 2009. PDF (1223KB)
Rom09c: H. Romsdorfer. Weighted neural network ensemble models for speech prosody control. In Proceedings of Interspeech, pages 492-495, Brighton (United Kingdom), September 2009. PDF (606KB)
GP08: M. Gerber and B. Pfister. Fast search for common segments in speech signals for speaker verification. In Proceedings of Interspeech, pages 375-378, Brisbane (Australia), September 2008. PDF (204KB)
KP08: T. Kaufmann and B. Pfister. Applying a grammar-based language model to a simplified broadcast-news transcription task. In Proceedings of ACL, pages 106-113, Columbus (Ohio), June 2008. PDF (464KB)
PK08: B. Pfister und T. Kaufmann. Sprachverarbeitung: Grundlagen und Methoden der Sprachsynthese und Spracherkennung. Springer Verlag (ISBN: 978-3-540-75909-6), 2008.
Beu07: R. Beutler. Improving Speech Recognition through Linguistic Knowledge. PhD thesis, No. 17039, Computer Engineering and Networks Laboratory, ETH Zurich, January 2007. PDF (2135KB)
GBP07: M. Gerber, R. Beutler, and B. Pfister. Quasi text-independent speaker verification based on pattern matching. In Proceedings of Interspeech, pages 1993-1996, Antwerp, August 2007. PDF (658KB)
GKP07: M. Gerber, T. Kaufmann, and B. Pfister. Perceptron-based class verification. In Proceedings of NOLISP (ISCA Workshop on non linear speech processing), Paris, May 2007. PDF (170KB)
KP07: T. Kaufmann and B. Pfister. Applying licenser rules to a grammar with continuous constituents. In Stefan Müller, editor, Proceedings of the 14th International Conference on Head-Driven Phrase Structure Grammar, pages 150-162, Stanford, 2007. CSLI Publications. PDF (73KB)
RP07: H. Romsdorfer and B. Pfister. Text analysis and language identification for polyglot text-to-speech synthesis. Speech Communication (Elsevier), 49(9):697-724, September 2007. PDF (563KB)
RP06: H. Romsdorfer and B. Pfister. Character stream parsing of mixed-lingual text. In ISCA Tutorial and Research Workshop on Multilingual Speech and Language Processing (MultiLing 2006), Stellenbosch (South Africa), April 2006. PDF (122KB)
BKP05a: R. Beutler, T. Kaufmann, and B. Pfister. Integrating a non-probabilistic grammar into large vocabulary continuous speech recognition. In Proceedings of the IEEE ASRU 2005 Workshop, pages 104-109, San Juan (Puerto Rico), November 2005. PDF (124KB)
BKP05b: R. Beutler, T. Kaufmann, and B. Pfister. Using rule-based knowledge to improve LVCSR. In Proceedings of ICASSP, pages 829-832, Philadelphia (USA), March 2005. PDF (204KB)
GP05: M. Gerber and B. Pfister. Quasi text-independent speaker verification with neural networks. MLMI'05 Workshop, Edinburgh (United Kingdom), July 2005. PDF (337KB)
Kau05: T. Kaufmann. Evaluation von Grammatikformalismen in Hinblick auf die Anwendung in der Spracherkennung. Zwischenbericht zum Nationalfonds-Projekt 105211-104078/1: Rule-Based Language Model for Speech Recognition. Institut TIK, ETH Zürich, September 2005.
RP05: H. Romsdorfer and B. Pfister. Phonetic labeling and segmentation of mixed-lingual prosody databases. In Proceedings of Interspeech, pages 3281-3284, Lisbon (Portugal), September 2005. PDF (224KB)
RPB05: H. Romsdorfer, B. Pfister, and R. Beutler. A mixed-lingual phonological component which drives the statistical prosody control of a polyglot TTS synthesis system. In S. Bengio and H. Bourlard, editors, Machine Learning for Multimodal Interaction, pages 263-276. Springer-Verlag Heidelberg, January 2005. PDF (237KB)
Beu04: R. Beutler. Open vocabulary CSR by linguistic knowledge. COST 278 workshop, Mons (Belgium), January 2004.
NP04: U. Niesen and B. Pfister. Speaker verification by means of ANNs. In Proceedings of ESANN, Bruges (Belgium), pages 145-150, April 2004. PDF (63KB)
RP04: H. Romsdorfer and B. Pfister. Multi-context rules for phonological processing in polyglot TTS synthesis. In Proceedings of Interspeech, pages 737-740, Jeju Island (Korea), October 2004. PDF (115KB)
Beu03: R. Beutler. Improve continuous speech recognition thru linguistic knowledge. COST 278 workshop, Barcelona, February 2003.
BP03: R. Beutler and B. Pfister. Integrating statistical and rule-based knowledge for continuous German speech recognition. In Proceedings of Eurospeech, pages 937-940, Geneva, September 2003. PDF (174KB)
PB03: B. Pfister and R. Beutler. Estimating the weight of evidence in forensic speaker verification. In Proceedings of Eurospeech, pages 701-704, Geneva, September 2003. PDF (88KB)
PR03: B. Pfister and H. Romsdorfer. Mixed-lingual text analysis for polyglot TTS synthesis. In Proceedings of Eurospeech, pages 2037-2040, Geneva, September 2003. PDF (52KB)
Beu02: R. Beutler. Recognition of continuously spoken German language using linguistic knowledge. COST 278 workshop, Eindhoven, August 2002.
Leh02: G. Lehtinen. Sprecheradaptation und Out-of-Vocabulary-Modell. Bericht zum Projekt: Einsatz von Spracherkennung in der SAPH. Institut TIK, ETH Zürich, April 2002.
PW02: B. Pfister, E. Wehrli et al. Lexical and Syntactic Analysis of Mixed-Lingual Sentences for Text-to-Speech. Final Report of SNSF Project No 21-59396.99. Institut TIK, ETH Zürich, November 2002.
Pfi01: B. Pfister. Personenidentifizierung anhand der Stimme. Kriminalistik, 55. Jahrgang, Heft 4, S. 287-292 (Fachzeitschrift des Hüthig Verlags, Heidelberg), April 2001. PDF (338KB)
PL01: B. Pfister und G. Lehtinen. Schlussbericht für das Projekt COST249: Erkennung kontinuierlicher Sprache über das Telefon. Institut TIK, ETH Zürich, Januar 2001. PostScript (210KB)
TJ01: C. Traber and V. Jantzen. The SVOX TTS System. COST258 workshop, Prague, May 2001.
Jan00: V. Jantzen. Neural network-based pitch control for various sentence types. COST258 workshop, Stockholm, April 2000. PDF (176KB)
JWLL00: F.T. Johansen, N. Warakagoda, B. Lindberg, G. Lehtinen, et al. The COST249 SpeechDat multilingual reference recogniser. In Proceedings of LREC'2000 (Conference on Language, Resources and Evaluation), Athens (Greece), June 2000. PostScript (119KB)
LJWL00: B. Lindberg, F.T. Johansen, N. Warakagoda, G. Lehtinen, et al. A noise robust multilingual reference recogniser based on SpeechDat(II). In Proceedings of ICSLP, Beijing (China), October 2000. PostScript (60KB)
LS00: G. Lehtinen, S. Safra, et al. IDAS: Interactive Directory Assistance Services. In Proceedings of the COST249 ISCA Workshop on Voice Operated Telecom Services, pages 51-54, Gent (Belgium), May 2000. PostScript (128KB)
Tra00a: C. Traber. Das Sprachsynthesesystem SVOX. 11. Konferenz Elektronische Sprachsignalverarbeitung (ESSV 2000), Cottbus, September 2000.
Tra00b: C. Traber. Spectral smoothing of diphone boundary mismatches. COST258 workshop, Stockholm, April 2000.
TH99: C. Traber, K. Huber, et al. From multilingual to polyglot speech synthesis. In Proceedings of Eurospeech, pages 835-838, Budapest, September 1999. PDF
HPT98: K. Huber, B. Pfister und Ch. Traber. POSSY: Ein Projekt zur Realisierung einer polyglotten Sprachsynthese. In DAGA-Tagungsband, S. 392-393, 1998. PostScript (33KB)
Hub98a: K. Huber. Swiss German Polyphone - Schlussbericht. TIK-Report Nr.48. Institut TIK, ETH Zürich, Juni 1998.
Hub98b: K. Huber. Zusammenstellung der Trägerwörter für Deutsch und Italienisch. Bericht Nr.1 zum Projekt TTS'97. Institut TIK, ETH Zürich, Juni 1998.
Leh98: G. Lehtinen. Einsatz des konfigurierbaren Worterkenners WOROV. Bericht Nr.2. zum Projekt: Reverse Directory Service. Institut TIK, ETH Zürich, Januar 1998.
LS98a: G. Lehtinen and S. Safra. Generation and selection of pronunciation variants for a flexible word recognizer. In Proceedings of the ESCA Workshop: Modeling Pronunciation Variation for ASR, pages 67-71, Rolduc (The Netherlands), May 1998. PostScript (98KB)
LS98b: G. Lehtinen und S. Safra. Generierung von Aussprachevariantenregeln und Verbesserung von Subwortmodellen für einen flexiblen Worterkenner. In DAGA-Tagungsband, S. 400-401, March 1998. PostScript (124KB)
PH98: B. Pfister, K. Huber et al. Das Sprachsynthesesystem SVOX und seine praktische Anwendbarkeit. In DAGA-Tagungsband, S. 338-339, 1998. PostScript (107KB)
Rie98: M. Riedi. Controlling Segmental Duration in Speech Synthesis Systems. PhD thesis, No. 12487, Computer Engineering and Networks Laboratory, ETH Zurich (TIK-Schriftenreihe Nr. 26, ISBN 3-906469-05-0), February 1998. PostScript (3168KB)
Saf98: S. Safra. A Parsing Strategy in ARCOS-G. Talk at the COST249 meeting in Porto, Portugal, February 12-13, 1998. (printed in Final Report of COST249). PDF (52KB)
SLH98: S. Safra, G. Lehtinen, and K. Huber. Modeling pronunciation variations and coarticulation with finite-state transducers in CSR. In Proceedings of the ESCA Workshop: Modeling Pronunciation Variation for ASR, pages 125-130, Rolduc (The Netherlands), May 1998. PostScript (197KB)
LP97: G. Lehtinen und B. Pfister et al. Reverse Directory Service. Projektbericht Nr.1, Institut TIK, ETH Zürich, September 1997.
Rie97: M. Riedi. Modeling segmental duration with multivariate adaptive regression splines. In Proceedings of Eurospeech, pages 2627-2630, Rhodes (Greece), September 1997. PostScript (162KB)
Saf97: S. Safra. Das Experimentalsystem ARCOS: Konzepte, Aufbau, Methoden. Zwischenbericht zum Projekt ARCOS-G. Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Juni 1997.
Tra97: C. Traber. Improvements of the Morpho-Syntactic Analysis of the SVOX Text-to-Speech System. Projektbericht, Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Mai 1997.
Hut96: H.-P. Hutter. Comparison of Classic and Hybrid HMM Approaches to Speech Recognition over Telephone Lines. PhD thesis, No. 11662, Computer Engineering and Networks Laboratory, ETH Zurich (TIK-Schriftenreihe Nr. 15, ISBN 3 7281 2424 9), October 1996.
Pfi96a: B. Pfister. High-quality prosodic modification of speech signals. In Proceedings of ICSLP, pages 2446-2449, Philadelphia, October 1996. demo examples PDF (822KB)
Pfi96b: B. Pfister. Prosodische Modifikation von Sprachsegmenten für die konkatenative Sprachsynthese. Diss. Nr. 11331, TIK-Schriftenreihe Nr. 11 (ISBN 3 7281 2316 1), ETH Zürich, März 1996. PostScript (2987KB)
Saf96: S. Safra. Chartparsing in Continuous Speech Recognition. Talk at the COST249 meeting in Kosice, Slovakia, February 29, 1996. (printed in Final Report of COST249). PDF (99KB)
Hut95: H.-P. Hutter. Comparison of a new hybrid connectionist-SCHMM approach with other hybrid approaches for speech recognition. In Proceedings of ICASSP. IEEE, 1995. PDF (427KB)
LP95: G. Lehtinen und B. Pfister. Portierung des ARA-Systems auf die SparcStation-Plattform von Sun Microsystems. Bericht Nr.3 zum Projekt Realisation einer automatischen Rufnummernauskunft. Institut TIK, ETH Zürich, Oktober 1995.
Pfi95: B. Pfister. The SVOX Text-to-Speech System. Laboratory TIK, ETH Zurich, September 1995. PDF (109KB)
Rie95: M. Riedi. A neural network-based model of segmental duration for speech synthesis. In Proceedings of Eurospeech, pages 599-602, Madrid (Spain), September 1995. PDF (365KB)
Saf95: S. Safra. Handling Pronunciation Variants and Co-articulation with Finite State Transducers. Talk at the COST249 meeting in Nancy, France (printed in Final Report of COST249), March 6/7, 1995. PDF (20KB)
Tra95: C. Traber. SVOX: The Implementation of a Text-to-Speech System for German. PhD thesis, No. 11064, Computer Engineering and Networks Laboratory, ETH Zurich, TIK-Schriftenreihe Nr. 7 (ISBN 3 7281 2239 4), March 1995. PDF (927KB) PostScript (2271KB)
HP94: H.-P. Hutter und B. Pfister. Neuartiger hybrider SKHMM/KNN-Ansatz für die Spracherkennung. In Studientexte zur Sprachkommunikation, Heft 11, S. 90-97. TU Berlin, Oktober 1994.
Hut94: H.-P. Hutter. Recognizer for isolated German digits over telephone lines: RECO. In Final Report of COST232, 1994.
PLC94: B. Pfister, G. Lehtinen und D. Christnach. ARA-V1: Systembeschreibung und Auswertung eines Testeinsatzes. Bericht Nr.2 zum Projekt Realisation einer automatischen Rufnummernauskunft. Institut für Elektronik, ETH Zürich, September 1994.
PS94: B. Pfister und A. Schaub. Automatische Rufnummern-Auskunft. Technische Mitteilungen Telecom PTT, Mai 1994.
Saf94: S. Safra. Experimentalsystem zur Erkennung kontinuierlicher Sprache. Erster Bericht zum Projekt ARCOS-G. Institut für Technische Informatik und Kommunikationsnetze, ETH Zürich, Februar 1994.
SP94: S. Safra und B. Pfister. ARCOS-G: Ein Experimentalsystem zur Erkennung kontinuierlicher deutscher Sprache. In Studientexte zur Sprachkommunikation, Heft 11, S. 174-181. TU Berlin, Oktober 1994. PostScript (613KB)
Tra93: C. Traber. Syntactic processing and prosody control in the SVOX TTS system for German. In Proceedings of Eurospeech, pages 2099-2102, September 1993.
Hub91: K. Huber. Messung und Modellierung der Segmentdauer für die Synthese deutscher Lautsprache. Diss. Nr. 9535, Institut für Elektronik, ETH Zürich, Juli 1991.
Hub90: K. Huber. A statistical model of duration control for speech synthesis. In Proc. of the EUSIPCO, Barcelona, September 1990.
Rus90: T. Russi. A Framework for Syntactic and Morphological Analysis and its Application in a Text-to-Speech System. PhD thesis, No. 9328, Electronics Laboratory, ETH Zurich, December 1990.

Last updated: Mon Nov 20 15:00:45 CET 2017 by: Beat Pfister

!!! This document is stored in the ETH Web archive and is no longer maintained !!!