The homeService Corpus
The homeService corpus is a new English speech database gathered as part of the homeService project. homeService is the impact showcase for the UK EPSRC Programme Grant project Natural Speech Technology (NST), a collaboration between the Universities of Edinburgh, Cambridge and Sheffield. It is concerned with how speech technology can be of use to people with speech disorders and restricted upper-limb mobility.
The majority of the homeService corpus is recorded in real home environments, where voice control is often the normal means by which users interact with their devices. The audio recorded during such interactions consists of realistic speech data from speakers with severe dysarthria.
The homeService corpus v1.1
The homeService corpus v1.1 is the second release of the audio recorded within the homeService project. It consists of audio recordings of dysarthric speech from five subjects (three male, two female).
Speaker | Type of data | Vocabulary | Number of interactions | Duration | Annotated |
---|---|---|---|---|---|
F01 | ER01train | 32 | 97 | 2'19" | yes |
F02 | ER01train | 31 | 314 | 11'58" | yes |
F02 | ID01train | 32 | 364 | 30'02" | yes |
F02 | ID01test | 20 | 143 | 9'58" | yes |
M01 | ER01train | 31 | 230 | 6'34" | yes |
M02 | ER01train | 31 | 130 | 3'16" | yes |
M02 | ID01test | 40 | 1571 | 1h44'44" | yes |
M02 | ID01train | 47 | 5807 | 6h29'40" | yes |
M03 | ER01train | 12 | 114 | 2'47" | yes |
M03 | ID01train | 25 | 472 | 36'41" | yes |
M03 | ID01test | 14 | 133 | 11'05" | yes |
TOTAL | | 131 | 9360 | 10h07'32" | yes |
The homeService corpus v1.0
The homeService corpus v1.0 is the first release of the audio recorded within the homeService project. It consists of audio recordings of dysarthric speech from five subjects (three male, two female).
Speaker | Type of data | Vocabulary | Number of interactions | Duration | Annotated |
---|---|---|---|---|---|
F01 | ER01train | 32 | 97 | 2'19" | yes |
F02 | ER01train | 31 | 314 | 11'58" | yes |
F02 | ID01train | 30 | 314 | 25'52" | yes |
F02 | ID01test | 16 | 85 | 5'40" | yes |
M01 | ER01train | 31 | 230 | 6'34" | yes |
M02 | ER01train | 31 | 130 | 3'16" | yes |
M02 | ID01test | 40 | 1571 | 1h44'44" | yes |
M02 | ID01train | 47 | 5807 | 6h29'40" | yes |
M03 | ER01train | 12 | 114 | 2'47" | yes |
M03 | ID01train | 18 | 169 | 11'26" | yes |
M03 | ID01test | 11 | 36 | 3'00" | yes |
TOTAL | | 131 | 8867 | 9h27'20" | yes |
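The Duration column in the tables above uses a compact notation such as `2'19"` (minutes and seconds) or `1h44'44"` (hours, minutes and seconds). When aggregating or comparing subsets, it can help to convert these values to seconds. The following is a minimal sketch, assuming the notation is exactly as it appears in the tables; the function name `duration_to_seconds` is hypothetical and not part of the corpus distribution.

```python
import re

# Matches strings like 2'19" or 1h44'44" (the hour part is optional).
# Assumption: this mirrors the notation used in the corpus tables.
DURATION_PATTERN = re.compile(r"^(?:(?P<h>\d+)h)?(?P<m>\d+)'(?P<s>\d+)\"$")


def duration_to_seconds(text: str) -> int:
    """Convert a table duration such as 1h44'44" into whole seconds."""
    match = DURATION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised duration: {text!r}")
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m"))
    seconds = int(match.group("s"))
    return hours * 3600 + minutes * 60 + seconds
```

For example, `duration_to_seconds("2'19\"")` gives 139, the length in seconds of the F01 enrolment set.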
Each subject’s set is composed of two subsets: enrolment data (ER) and interaction data (ID).
- ER is obtained by the user reading lists of the words that they have chosen as commands in their system. To match the acoustic conditions in the user’s home, the recording takes place in the same environment in which the system is supposed to function. As the user is reading from a list, the resulting speech is less natural but still effective for initial training.
- ID is recorded as the user operates the electronic devices in their home with the homeService speech-enabled interface. Recording starts after the user presses a switch, and the microphone stays open for a predefined number of seconds. In contrast with the ER data, each word produced is chosen autonomously by the user.
Project team
Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain
Data example
Annotation
Annotation is provided in HTK STM format:
Filename Mic SpeakerID startTime endTime <Mic,SesId,Lang,Impair,level,intel,purpose> Transcription
hom-F01ER01MCW0000003000003 MC F01 0.00 2.55 <MC,ER01,GBEng,CP,SE,LL,a55,ER01train> delete
hom-M02ID01MC20150309104753 MC M02 0.00 3.00 <MC,ID01,GBEng,MND,MO,MM,a75,ID01train> skysportone
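The annotation lines above can be read with a short parser. The sketch below assumes the layout shown in the two example lines: five whitespace-separated fields, a comma-separated label block in angle brackets, then the transcription. The names `Segment` and `parse_stm_line` are hypothetical. Since the example lines carry one more label than the header row lists, the labels are kept as a plain list rather than mapped to named fields.

```python
import re
from dataclasses import dataclass
from typing import List

# Layout inferred from the example lines (an assumption, not an official spec):
# Filename Mic SpeakerID startTime endTime <label,label,...> Transcription
STM_LINE = re.compile(
    r"^(?P<filename>\S+)\s+(?P<mic>\S+)\s+(?P<speaker>\S+)\s+"
    r"(?P<start>\d+(?:\.\d+)?)\s+(?P<end>\d+(?:\.\d+)?)\s+"
    r"<(?P<labels>[^>]*)>\s*(?P<text>.*)$"
)


@dataclass
class Segment:
    filename: str
    mic: str
    speaker: str
    start: float          # segment start time in seconds
    end: float            # segment end time in seconds
    labels: List[str]     # raw entries from the <...> block, in order
    transcription: str


def parse_stm_line(line: str) -> Segment:
    """Parse one annotation line into a Segment."""
    match = STM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"Not a recognised annotation line: {line!r}")
    return Segment(
        filename=match.group("filename"),
        mic=match.group("mic"),
        speaker=match.group("speaker"),
        start=float(match.group("start")),
        end=float(match.group("end")),
        labels=[label.strip() for label in match.group("labels").split(",")],
        transcription=match.group("text").strip(),
    )
```

Applied to the first example line, this yields a segment for speaker `F01` with the transcription `delete`; the last label (`ER01train`) corresponds to the Type of data column in the tables.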
Audio
Audio data is provided in the standard MS-WAVE mono format at 16 kHz and 16 bit. It was recorded with a 6-channel Microcone microphone array at a 48 kHz sampling rate and 32-bit resolution (these streams exist but are not distributed in the current release). The 16 kHz signal is the beam-formed combination of the 6 channels, which is computed by the Microcone hardware.
All audio (ER and ID) was recorded in a real home environment.
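A quick sanity check after obtaining the data is to confirm that each file matches the stated release format (mono, 16 kHz, 16 bit). The sketch below uses only Python's standard `wave` module; the function names `describe_wav` and `matches_release_format` are hypothetical, and the path passed in is whatever location the files were unpacked to.

```python
import wave

# Release format as stated in the Audio section: mono, 16 kHz, 16 bit.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit


def describe_wav(path: str) -> dict:
    """Read basic header information from a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate_hz": rate,
        "sample_width_bytes": width,
        "duration_s": frames / rate,
    }


def matches_release_format(info: dict) -> bool:
    """Return True if the header matches mono, 16 kHz, 16 bit."""
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
    )
```

The `duration_s` value can also be compared against the `endTime` field of the corresponding annotation line.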
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
An agreement with the University of Sheffield has to be signed to use the data.
Due to the sensitive nature of the data and the obligation to participant confidentiality, the audio of the homeService corpus cannot be redistributed under any circumstances.
Download
To download the homeService corpus, please send a request to spandh-resource@sheffield.ac.uk
Citation
M. Nicolao, H. Christensen, S. Cunningham, P. Green, and T. Hain, The homeService corpus v. 1.0, University of Sheffield at http://mini.dcs.shef.ac.uk/resources/homeservice-corpus, 2016, doi: 10.15131/shef.data.3116833