Thomas Hain

Head of Group

Thomas holds the degree ‘Dipl.-Ing’ from the University of Technology, Vienna, and a PhD from Cambridge University. After working at Philips Speech Processing, Vienna, he joined the Cambridge University Engineering Department in 1997 and moved to SpandH in 2004. He was promoted to Professor in 2012. He leads the Machine Intelligence for Natural Interfaces subgroup. Thomas has published more than 100 papers on machine learning and speech recognition in international conferences, journals and books (h-index 26). Apart from membership of many technical committees, including serving as speech recognition area chair at ICASSP and Interspeech, he served on the IEEE Speech Technical Committee from 2007 to 2009 and as an organizing committee member of Interspeech 2009, IEEE ASRU 2011 and 2013. He is currently a member of the editorial board of Computer Speech and Language (CSL) and an Associate Editor of the ACM Transactions on Speech and Language Processing. His recent research focuses on recognition of natural speech in realistic environments, and on the integration of speech technology with downstream processes such as content linking, summarisation or machine translation.

Contact Information

Office G041, Regent Court, 211 Portobello, Sheffield S1 4DP, UK

+44 114 222 1836

t.hain@sheffield.ac.uk

Publications

2022

Madina Hasan; Nicholas Jefferson; Thomas Hain; Jeremy Dawson: Automatic detection of behavioural codes in team interactions. In: Comput. Speech Lang., vol. 74, pp. 101339, 2022. (Type: Journal Article | Links)
Chanho Park; Rehan Ahmad; Thomas Hain: Unsupervised Data Selection for Speech Recognition with Contrastive Loss Ratios. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8587-8591, 2022. (Type: Inproceedings | Links)
Jose Antonio Lopez Saenz; Thomas Hain: A Model for Assessor Bias in Automatic Pronunciation Assessment. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7267-7271, 2022. (Type: Inproceedings | Links)
George Close; Thomas Hain; Stefan Goetze: MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data. In: 30th European Signal Processing Conference, EUSIPCO 2022, Belgrade, Serbia, August 29 - September 2, 2022, IEEE, 2022. (Type: Inproceedings | )

2021

Yanpei Shi; Qiang Huang; Thomas Hain: H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model. In: Neural Networks, vol. 142, pp. 329–339, 2021. (Type: Journal Article | Links)
Korbinian Friedl; Georgios Rizos; Lukas Stappen; Madina Hasan; Lucia Specia; Thomas Hain; Björn W. Schuller: Uncertainty Aware Review Hallucination for Science Article Classification. In: Zong, Chengqing; Xia, Fei; Li, Wenjie; Navigli, Roberto (Ed.): Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, pp. 5004–5009, Association for Computational Linguistics, 2021. (Type: Inproceedings | Links)
Jose Antonio Lopez Saenz; Md Asif Jalal; Rosanna Milner; Thomas Hain: Attention Based Model for Segmental Pronunciation Error Detection. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021, Cartagena, Colombia, December 13-17, 2021, pp. 725–732, IEEE, 2021. (Type: Inproceedings | Links)
Mingjie Chen; Yanpei Shi; Thomas Hain: Towards Low-Resource Stargan Voice Conversion Using Weight Adaptive Instance Normalization. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, pp. 5949–5953, IEEE, 2021. (Type: Inproceedings | Links)
Qiang Huang; Thomas Hain: Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021, pp. 6473–6477, IEEE, 2021. (Type: Inproceedings | Links)
Shengjie Huang; Mingjie Chen; Yanyan Xu; Dengfeng Ke; Thomas Hain: WINVC: One-Shot Voice Conversion with Weight Adaptive Instance Normalization. In: Pham, Duc Nghia; Theeramunkong, Thanaruk; Governatori, Guido; Liu, Fenrong (Ed.): PRICAI 2021: Trends in Artificial Intelligence - 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, November 8-12, 2021, Proceedings, Part II, pp. 559–573, Springer, 2021. (Type: Inproceedings | Links)
Jose Antonio Lopez Saenz; Thomas Hain: Use of Speaker Metadata for Improving Automatic Pronunciation Assessment. In: Anke, Luis Espinosa; Martín-Vide, Carlos; Spasic, Irena (Ed.): Statistical Language and Speech Processing - 9th International Conference, SLSP 2021, Cardiff, UK, November 23-25, 2021, Proceedings, pp. 61–72, Springer, 2021. (Type: Inproceedings | Links)
Yanpei Shi; Thomas Hain: Contextual Joint Factor Acoustic Embeddings. In: IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, pp. 750–757, IEEE, 2021. (Type: Inproceedings | Links)
Yanpei Shi; Thomas Hain: Supervised Speaker Embedding De-Mixing in Two-Speaker Environment. In: IEEE Spoken Language Technology Workshop, SLT 2021, Shenzhen, China, January 19-22, 2021, pp. 758–765, IEEE, 2021. (Type: Inproceedings | Links)
Anna Ollerenshaw; Md Asif Jalal; Thomas Hain: Insights on Neural Representations for End-to-End Speech Recognition. In: Hermansky, Hynek; Cernocký, Honza; Burget, Lukás; Lamel, Lori; Scharenborg, Odette; Motlícek, Petr (Ed.): Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, pp. 4079–4083, ISCA, 2021. (Type: Inproceedings | Links)

2020

Yanpei Shi; Qiang Huang; Thomas Hain: H-Vectors: Utterance-Level Speaker Embedding Using a Hierarchical Attention Model. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pp. 7579–7583, IEEE, 2020. (Type: Inproceedings | Links)
Yanpei Shi; Qiang Huang; Thomas Hain: Speaker Re-Identification with Speaker Dependent Speech Enhancement. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 1530–1534, ISCA, 2020. (Type: Inproceedings | Links)
Lukas Stappen; Georgios Rizos; Madina Hasan; Thomas Hain; Björn W. Schuller: Uncertainty-Aware Machine Support for Paper Reviewing on the Interspeech 2019 Submission Corpus. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 1808–1812, ISCA, 2020. (Type: Inproceedings | Links)
Yanpei Shi; Qiang Huang; Thomas Hain: Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 2992–2996, ISCA, 2020. (Type: Inproceedings | Links)
Md Asif Jalal; Rosanna Milner; Thomas Hain; Roger K. Moore: Removing Bias with Residual Mixture of Multi-View Attention for Speech Emotion Recognition. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4084–4088, ISCA, 2020. (Type: Inproceedings | Links)
Qiang Huang; Thomas Hain: Exploration of Audio Quality Assessment and Anomaly Localisation Using Attention Models. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4611–4615, ISCA, 2020. (Type: Inproceedings | Links)
Hardik B. Sailor; Thomas Hain: Multilingual Speech Recognition Using Language-Specific Phoneme Recognition as Auxiliary Task for Indian Languages. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4756–4760, ISCA, 2020. (Type: Inproceedings | Links)
Mingjie Chen; Thomas Hain: Unsupervised Acoustic Unit Representation Learning for Voice Conversion Using WaveNet Auto-Encoders. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4866–4870, ISCA, 2020. (Type: Inproceedings | Links)
Yanpei Shi; Qiang Huang; Thomas Hain: Robust Speaker Recognition Using Speech Enhancement And Attention Model. In: Lee, Kong-Aik; Koshinaka, Takafumi; Shinoda, Koichi (Ed.): Odyssey 2020: The Speaker and Language Recognition Workshop, 1-5 November 2020, Tokyo, Japan, pp. 451–458, ISCA, 2020. (Type: Inproceedings | Links)
Md Asif Jalal; Rosanna Milner; Thomas Hain: Empirical Interpretation of Speech Emotion Perception with Attention Based Model for Speech Emotion Recognition. In: Meng, Helen; Xu, Bo; Zheng, Thomas Fang (Ed.): Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, pp. 4113–4117, ISCA, 2020. (Type: Inproceedings | Links)

2019

Salil Deena; Madina Hasan; Mortaza Doulaty; Oscar Saz; Thomas Hain: Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment. In: IEEE ACM Trans. Audio Speech Lang. Process., vol. 27, no. 3, pp. 572–582, 2019. (Type: Journal Article | Links)
Rosanna Milner; Md Asif Jalal; Raymond W. M. Ng; Thomas Hain: A Cross-Corpus Study on Speech Emotion Recognition. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019, pp. 304–311, IEEE, 2019. (Type: Inproceedings | Links)
Md Asif Jalal; Roger K. Moore; Thomas Hain: Spatio-Temporal Context Modelling for Speech Emotion Classification. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019, pp. 853–859, IEEE, 2019. (Type: Inproceedings | Links)
Hardik B. Sailor; Salil Deena; Md Asif Jalal; Rasa Lileikyte; Thomas Hain: Unsupervised Adaptation of Acoustic Models for ASR Using Utterance-Level Embeddings from Squeeze and Excitation Networks. In: IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019, pp. 980–987, IEEE, 2019. (Type: Inproceedings | Links)
Qiang Huang; Thomas Hain: Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention. In: Kubin, Gernot; Kacic, Zdravko (Ed.): Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, pp. 584–588, ISCA, 2019. (Type: Inproceedings | Links)
Md Asif Jalal; Erfan Loweimi; Roger K. Moore; Thomas Hain: Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In: Kubin, Gernot; Kacic, Zdravko (Ed.): Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, pp. 1701–1705, ISCA, 2019. (Type: Inproceedings | Links)
Mortaza Doulaty; Thomas Hain: Latent Dirichlet Allocation Based Acoustic Data Selection for Automatic Speech Recognition. In: Kubin, Gernot; Kacic, Zdravko (Ed.): Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, pp. 3228–3232, ISCA, 2019. (Type: Inproceedings | Links)

2018

Oscar Saz; Salil Deena; Mortaza Doulaty; Madina Hasan; Bilal Khaliq; Rosanna Milner; Raymond W. M. Ng; Julia Olcoz; Thomas Hain: Lightly supervised alignment of subtitles on multi-genre broadcasts. In: Multim. Tools Appl., vol. 77, no. 23, pp. 30533–30550, 2018. (Type: Journal Article | Links)
Erfan Loweimi; Jon Barker; Thomas Hain: Exploring the Use of Group Delay for Generalised VTS Based Noise Compensation. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018, pp. 4824–4828, IEEE, 2018. (Type: Inproceedings | Links)
Erfan Loweimi; Jon Barker; Thomas Hain: On the Usefulness of the Speech Phase Spectrum for Pitch Extraction. In: Yegnanarayana, B. (Ed.): Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp. 696–700, ISCA, 2018. (Type: Inproceedings | Links)
Mauro Nicolao; Michiel Sanders; Thomas Hain: Improved Acoustic Modelling for Automatic Literacy Assessment of Children. In: Yegnanarayana, B. (Ed.): Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2-6 September 2018, pp. 1666–1670, ISCA, 2018. (Type: Inproceedings | Links)
Rahhal Errattahi; Salil Deena; Asmaa El Hannani; Hassan Ouahmane; Thomas Hain: Improving ASR Error Detection with RNNLM Adaptation. In: 2018 IEEE Spoken Language Technology Workshop, SLT 2018, Athens, Greece, December 18-21, 2018, pp. 190–196, IEEE, 2018. (Type: Inproceedings | Links)

2017

Oscar Saz; Thomas Hain: Acoustic adaptation to dynamic background conditions with asynchronous transformations. In: Comput. Speech Lang., vol. 41, pp. 180–194, 2017. (Type: Journal Article | Links)
Raymond W. M. Ng; Mauro Nicolao; Thomas Hain: Unsupervised crosslingual adaptation of tokenisers for spoken language recognition. In: Comput. Speech Lang., vol. 46, pp. 327–342, 2017. (Type: Journal Article | Links)
Salil Deena; Raymond W. M. Ng; Pranava Swaroop Madhyastha; Lucia Specia; Thomas Hain: Exploring the use of acoustic embeddings in neural machine translation. In: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, pp. 450–457, IEEE, 2017. (Type: Inproceedings | Links)
Rosanna Milner; Thomas Hain: DNN approach to speaker diarisation using speaker channels. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp. 4925–4929, IEEE, 2017. (Type: Inproceedings | Links)
Erfan Loweimi; Jon Barker; Thomas Hain: Statistical normalisation of phase-based feature representation for robust speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp. 5310–5314, IEEE, 2017. (Type: Inproceedings | Links)
Erfan Loweimi; Jon Barker; Thomas Hain: Channel Compensation in the Generalised Vector Taylor Series Approach to Robust ASR. In: Lacerda, Francisco (Ed.): Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp. 2466–2470, ISCA, 2017. (Type: Inproceedings | Links)
Salil Deena; Raymond W. M. Ng; Pranava Swaroop Madhyastha; Lucia Specia; Thomas Hain: Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features. In: Lacerda, Francisco (Ed.): Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp. 2715–2719, ISCA, 2017. (Type: Inproceedings | Links)
Chenhao Wu; Raymond W. M. Ng; Oscar Saz; Thomas Hain: Analysing acoustic model changes for active learning in automatic speech recognition. In: International Conference on Systems, Signals and Image Processing, IWSSIP 2017, Poznań, Poland, May 22-24, 2017, pp. 1–5, IEEE, 2017. (Type: Inproceedings | Links)
Erfan Loweimi; Jon Barker; Oscar Saz; Thomas Hain: Robust Source-Filter Separation of Speech Signal in the Phase Domain. In: Lacerda, Francisco (Ed.): Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, pp. 414–418, ISCA, 2017. (Type: Inproceedings | Links)
Raymond W. M. Ng; Alvin C. M. Kwan; Tan Lee; Thomas Hain: Shefce: A Cantonese-English bilingual speech corpus for pronunciation assessment. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017, pp. 5825–5829, IEEE, 2017. (Type: Inproceedings | Links)

2016

Raymond W. M. Ng; Kashif Shah; Lucia Specia; Thomas Hain: Groupwise learning for ASR k-best list reranking in spoken language translation. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016. (Type: Inproceedings | )
Rosanna Milner; Thomas Hain: Segment-oriented evaluation of speaker diarisation performance. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp. 5460–5464, IEEE, 2016. (Type: Inproceedings | Links)
Raymond W. M. Ng; Kashif Shah; Lucia Specia; Thomas Hain: Groupwise learning for ASR k-best list reranking in spoken language translation. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016, Shanghai, China, March 20-25, 2016, pp. 6120–6124, IEEE, 2016. (Type: Inproceedings | Links)
Sarah Al-Shareef; Thomas Hain: Colloquialising Modern Standard Arabic Text for Improved Speech Recognition. In: Morgan, Nelson (Ed.): Interspeech 2016, 17th Annual Conference of the International Speech Communication Association, San Francisco, CA, USA, September 8-12, 2016, pp. 1345–1349, ISCA, 2016. (Type: Inproceedings | Links)