Thomas Hain

Head of Group

Thomas holds the degree ‘Dipl.-Ing’ from the University of Technology, Vienna, and a PhD from Cambridge University. After working at Philips Speech Processing, Vienna, he joined the Cambridge University Engineering Department in 1997 and moved to SpandH in 2004. He was promoted to Professor in 2012. He leads the Machine Intelligence for Natural Interfaces subgroup. Thomas has more than 100 publications on machine learning and speech recognition in international conferences, journals and books (h-index 26). Apart from membership of many technical committees, including serving as speech recognition area chair at ICASSP and Interspeech, he served on the IEEE Speech Technical Committee from 2007 to 2009, and as an organizing committee member of Interspeech 2009 and of IEEE ASRU 2011 and 2013. He is currently a member of the editorial board of Computer Speech and Language (CSL) and an Associate Editor of the ACM Transactions on Speech and Language Processing. His recent research focuses on the recognition of natural speech in realistic environments, and on the integration of speech technology with downstream processes such as content linking, summarisation or machine translation.

Contact

Where to find Us?

Office G041, Regent Court, 211 Portobello, Sheffield S1 4DP, UK

+44 114 222 1836

t.hain@sheffield.ac.uk

Publications

2024

Rhiannon Mogridge; George Close; Robert Sutherland; Thomas Hain; Jon Barker; Stefan Goetze; Anton Ragni: Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users Using Intermediate ASR Features and Human Memory Models. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Forthcoming. (Type: Proceedings Article | )

George Close; William Ravenscroft; Thomas Hain; Stefan Goetze: MULTI-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Forthcoming. (Type: Proceedings Article | )

William Ravenscroft; Stefan Goetze; Thomas Hain: Combining Conformer and Dual-Path-Transformer Networks for Single Channel Noisy Reverberant Speech Separation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Forthcoming. (Type: Proceedings Article | )

Rehan Ahmad; Muhammad Umar Farooq; Thomas Hain: Progressive Unsupervised Domain Adaptation for ASR Using Ensemble Models and Multi-stage Training. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Forthcoming. (Type: Proceedings Article | )

Amit Meghanani; Thomas Hain: SCORE: Self-supervised Correspondence Fine-Tuning for Improved Content Representations. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Forthcoming. (Type: Proceedings Article | )

Amit Meghanani; Thomas Hain: Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), Forthcoming. (Type: Proceedings Article | )

2023

Will Ravenscroft; Stefan Goetze; Thomas Hain: On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2023. (Type: Proceedings Article | Links)

Elaf Islam; Thomas Hain; Protima Nomo Sudro: Simulation of Teacher-Learner Interaction in English Language Pronunciation Learning. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2023. (Type: Proceedings Article | Links)

Amit Meghanani; Thomas Hain: Deriving Translational Acoustic Sub-Word Embeddings. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2023. (Type: Proceedings Article | Links)

Muhammad Umar Farooq; Rehan Ahmad; Thomas Hain: MUST: A Multilingual Student-Teacher Approach for Low-Resource Speech Recognition. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2023. (Type: Proceedings Article | Links)

George Close; Thomas Hain; Stefan Goetze: The Effect of Spoken Language on Speech Enhancement Using Self-Supervised Speech Representation Loss Functions. In: 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023. (Type: Proceedings Article | Links)

William Ravenscroft; Stefan Goetze; Thomas Hain: On Data Sampling Strategies for Training Neural Network Speech Separation Models. In: 31st European Signal Processing Conference (EUSIPCO), 2023. (Type: Proceedings Article | Links)

Protima Nomo Sudro; Anton Ragni; Thomas Hain: Adapting pretrained models for adult to child voice conversion. In: 31st European Signal Processing Conference (EUSIPCO), 2023. (Type: Proceedings Article | Links)

Muhammad Umar Farooq; Thomas Hain: Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition. In: Interspeech 2023, 2023. (Type: Proceedings Article | Links)

George Close; William Ravenscroft; Thomas Hain; Stefan Goetze: The University of Sheffield CHiME-7 UDASE Challenge Speech Enhancement System. 2023. (Type: Technical Report | Links)

Elaf Islam; Chanho Park; Thomas Hain: Exploring Speech Representations for Proficiency Assessment in Language Learning. In: 9th Workshop on Speech and Language Technology in Education (SLaTE), 2023. (Type: Proceedings Article | Links)

William Ravenscroft; Stefan Goetze; Thomas Hain: Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023, IEEE, 2023. (Type: Proceedings Article | Links)

George Close; William Ravenscroft; Thomas Hain; Stefan Goetze: Perceive and Predict: Self-Supervised Speech Representation Based Loss Functions for Speech Enhancement. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023, IEEE, 2023. (Type: Proceedings Article | Links)

George Close; Thomas Hain; Stefan Goetze: PAMGAN+/-: Improving phase-aware speech enhancement performance via expanded discriminator training. In: 154th Convention of Audio Engineering Society (AES), Europe 2023, 2023. (Type: Proceedings Article | Links)

Rehan Ahmad; Md Asif Jalal; Muhammad Umar Farooq; Anna Ollerenshaw; Thomas Hain: Towards Domain Generalisation in ASR with Elitist Sampling and Ensemble Knowledge Distillation. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023, IEEE, 2023. (Type: Proceedings Article | Links)

2022

William Ravenscroft; Stefan Goetze; Thomas Hain: Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation. In: 17th International Workshop on Acoustic Signal Enhancement (IWAENC), 2022. (Type: Proceedings Article | Links)

Madina Hasan; Nicholas Jefferson; Thomas Hain; Jeremy Dawson: Automatic detection of behavioural codes in team interactions. In: Comput. Speech Lang., vol. 74, pp. 101339, 2022. (Type: Journal Article | Links)

Chanho Park; Rehan Ahmad; Thomas Hain: Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8587-8591, 2022. (Type: Proceedings Article | Links)

Jose Antonio Lopez Saenz; Thomas Hain: A Model for Assessor Bias in Automatic Pronunciation Assessment. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7267-7271, 2022. (Type: Proceedings Article | Links)

George Close; Thomas Hain; Stefan Goetze: MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data. In: 30th European Signal Processing Conference, EUSIPCO 2022, Belgrade, Serbia, August 29 - September 2, 2022, IEEE, 2022. (Type: Proceedings Article | Links)

Muhammad Umar Farooq; Darshan Adiga Haniya Narayana; Thomas Hain: Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion. In: Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, South Korea, September 18 - 22, 2022, ISCA, 2022. (Type: Proceedings Article | Links)

Muhammad Umar Farooq; Thomas Hain: Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition. In: Interspeech 2022, 23rd Annual Conference of the International Speech Communication Association, Incheon, South Korea, September 18 - 22, 2022, ISCA, 2022. (Type: Proceedings Article | Links)

William Ravenscroft; Stefan Goetze; Thomas Hain: Att-TasNet: Attending to Encodings in Time-Domain Audio Speech Separation of Noisy, Reverberant Speech Mixtures. In: 2022. (Type: Journal Article | Links)

William Ravenscroft; Stefan Goetze; Thomas Hain: Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation. In: IEEE 30th European Signal Processing Conference, EUSIPCO 2022, Belgrade, Serbia, August 29 - September 2, 2022, 2022. (Type: Proceedings Article | Links)

Thomas Hain; Md Asif Jalal; Anna Ollerenshaw: Insights of Neural Representations in Multi-Banded and Multi-Channel Convolutional Transformers for End-to-End ASR. In: IEEE 30th European Signal Processing Conference, EUSIPCO 2022, Belgrade, Serbia, August 29 - September 2, 2022, 2022. (Type: Proceedings Article | )

2021

Yanpei Shi; Qiang Huang; Thomas Hain: H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model. In: Neural Networks, vol. 142, pp. 329–339, 2021. (Type: Journal Article | Links)

Korbinian Friedl; Georgios Rizos; Lukas Stappen; Madina Hasan; Lucia Specia; Thomas Hain; Björn W. Schuller: Uncertainty Aware Review Hallucination for Science Article Classification. In: Zong, Chengqing; Xia, Fei; Li, Wenjie; Navigli, Roberto (Ed.): Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, pp. 5004–5009, Association for Computational Linguistics, 2021. (Type: Proceedings Article | Links)

Jose Antonio Lopez Saenz; Md Asif Jalal; Rosanna Milner; Thomas Hain: Attention Based Model for Segmental Pronunciation Error Detection. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, Cartagena, Colombia, December 13-17, 2021, pp. 725–732, IEEE, 2021. (Type: Proceedings Article | Links)

Mingjie Chen; Yanpei Shi; Thomas Hain: Towards Low-Resource Stargan Voice Conversion Using Weight Adaptive Instance Normalization. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, pp. 5949–5953, IEEE, 2021. (Type: Proceedings Article | Links)

Qiang Huang; Thomas Hain: Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, pp. 6473–6477, IEEE, 2021. (Type: Proceedings Article | Links)

Shengjie Huang; Mingjie Chen; Yanyan Xu; Dengfeng Ke; Thomas Hain: WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization. In: Pham, Duc Nghia; Theeramunkong, Thanaruk; Governatori, Guido; Liu, Fenrong (Ed.): PRICAI 2021: Trends in Artificial Intelligence - 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, November 8-12, 2021, Proceedings, Part II, pp. 559–573, Springer, 2021. (Type: Proceedings Article | Links)

Jose Antonio Lopez Saenz; Thomas Hain: Use of Speaker Metadata for Improving Automatic Pronunciation Assessment. In: Anke, Luis Espinosa; Martín-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing - 9th International Conference, SLSP 2021, Cardiff, UK, November 23-25, 2021, Proceedings, pp. 61–72, Springer, 2021. (Type: Proceedings Article | Links)

Yanpei Shi; Thomas Hain: Contextual Joint Factor Acoustic Embeddings. In: IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, pp. 750–757, IEEE, 2021. (Type: Proceedings Article | Links)

Yanpei Shi; Thomas Hain: Supervised Speaker Embedding De-Mixing in Two-Speaker Environment. In: IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, pp. 758–765, IEEE, 2021. (Type: Proceedings Article | Links)

Anna Ollerenshaw; Md Asif Jalal; Thomas Hain: Insights on Neural Representations for End-to-End Speech Recognition. In: Hermansky, Hynek; Cernocký, Honza; Burget, Lukás; Lamel, Lori; Scharenborg, Odette; Motlícek, Petr (Ed.): Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, pp. 4079–4083, ISCA, 2021. (Type: Proceedings Article | Links)

2020

Yanpei Shi; Qiang Huang; Thomas Hain: H-Vectors: Utterance-Level Speaker Embedding Using a Hierarchical Attention Model. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pp. 7579–7583, IEEE, 2020. (Type: Proceedings Article | Links)

Yanpei Shi; Qiang Huang; Thomas Hain: Speaker Re-Identification with Speaker Dependent Speech Enhancement. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 1530–1534, ISCA, 2020. (Type: Proceedings Article | Links)

Lukas Stappen; Georgios Rizos; Madina Hasan; Thomas Hain; Björn W. Schuller: Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 1808–1812, ISCA, 2020. (Type: Proceedings Article | Links)

Yanpei Shi; Qiang Huang; Thomas Hain: Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 2992–2996, ISCA, 2020. (Type: Proceedings Article | Links)

Md Asif Jalal; Rosanna Milner; Thomas Hain; Roger K. Moore: Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4084–4088, ISCA, 2020. (Type: Proceedings Article | Links)

Qiang Huang; Thomas Hain: Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4611–4615, ISCA, 2020. (Type: Proceedings Article | Links)

Hardik B. Sailor; Thomas Hain: Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4756–4760, ISCA, 2020. (Type: Proceedings Article | Links)

Mingjie Chen; Thomas Hain: Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4866–4870, ISCA, 2020. (Type: Proceedings Article | Links)

Yanpei Shi; Qiang Huang; Thomas Hain: Robust Speaker Recognition Using Speech Enhancement And Attention Model. In: Lee, Kong-Aik; Koshinaka, Takafumi; Shinoda, Koichi (Ed.): Odyssey 2020: The Speaker and Language Recognition Workshop, 1-5 November 2020, Tokyo, Japan, pp. 451–458, ISCA, 2020. (Type: Proceedings Article | Links)

Md Asif Jalal; Rosanna Milner; Thomas Hain: Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4113–4117, ISCA, 2020. (Type: Proceedings Article | Links)
