The homeService Corpus

The homeService corpus is a new English speech database gathered as part of the homeService project. homeService is the impact showcase for Natural Speech Technology (NST), a UK EPSRC Programme Grant project run as a collaboration between the Universities of Edinburgh, Cambridge and Sheffield. It is concerned with how speech technology can be of use to people with speech disorders and restricted upper-limb mobility.

The majority of the homeService corpus was recorded in real home environments, where voice control is often the normal means by which users interact with their devices. The audio recorded during such interactions consists of realistic speech data from speakers with severe dysarthria.

The homeService corpus v1.1

The homeService corpus v1.1 is the second release of the audio recorded within the homeService project. It consists of recordings of dysarthric speech from five subjects (three male, two female).

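| Speaker | Type of data | Vocabulary | Number of interactions | Duration | Annotated |
|---------|--------------|------------|------------------------|----------|-----------|
| F01 | ER01train | 32 | 97 | 2'19" | yes |
| F02 | ER01train | 31 | 314 | 11'58" | yes |
| F02 | ID01train | 32 | 364 | 30'02" | yes |
| F02 | ID01test | 20 | 143 | 9'58" | yes |
| M01 | ER01train | 31 | 230 | 6'34" | yes |
| M02 | ER01train | 31 | 130 | 3'16" | yes |
| M02 | ID01test | 40 | 1571 | 1h44'44" | yes |
| M02 | ID01train | 47 | 5807 | 6h29'40" | yes |
| M03 | ER01train | 12 | 114 | 2'47" | yes |
| M03 | ID01train | 25 | 472 | 36'41" | yes |
| M03 | ID01test | 14 | 133 | 11'05" | yes |
| TOTAL | | 131 | 9360 | 10h07'32" | |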

The homeService corpus v1.0

The homeService corpus v1.0 is the first release of the audio recorded within the homeService project. It consists of recordings of dysarthric speech from five subjects (three male, two female).

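| Speaker | Type of data | Vocabulary | Number of interactions | Duration | Annotated |
|---------|--------------|------------|------------------------|----------|-----------|
| F01 | ER01train | 32 | 97 | 2'19" | yes |
| F02 | ER01train | 31 | 314 | 11'58" | yes |
| F02 | ID01train | 30 | 314 | 25'52" | yes |
| F02 | ID01test | 16 | 85 | 5'40" | yes |
| M01 | ER01train | 31 | 230 | 6'34" | yes |
| M02 | ER01train | 31 | 130 | 3'16" | yes |
| M02 | ID01test | 40 | 1571 | 1h44'44" | yes |
| M02 | ID01train | 47 | 5807 | 6h29'40" | yes |
| M03 | ER01train | 12 | 114 | 2'47" | yes |
| M03 | ID01train | 18 | 169 | 11'26" | yes |
| M03 | ID01test | 11 | 36 | 3'00" | yes |
| TOTAL | | 131 | 8867 | 9h27'20" | yes |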
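
The Duration column above uses a compact notation with optional hours, then minutes and seconds (for example 1h44'44"). For anyone who wants to compute with these figures, a minimal Python sketch of a converter follows. The function names `duration_to_seconds` and `seconds_to_duration` are hypothetical helpers, not part of the corpus tooling, and the pattern is inferred only from the values shown in the tables.

```python
import re

# Pattern inferred from the table values: optional "<n>h", then <m>'<s>"
DURATION = re.compile(r"^(?:(\d+)h)?(\d+)'(\d+)\"$")

def duration_to_seconds(text):
    """Convert a string such as 1h44'44" or 2'19" to whole seconds."""
    match = DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)

def seconds_to_duration(total):
    """Format whole seconds back into the notation used in the tables."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes:02d}'{seconds:02d}\""
    return f"{minutes}'{seconds:02d}\""

# Example with the M02 ID01test row: 1571 interactions in 1h44'44"
total_seconds = duration_to_seconds("1h44'44\"")
print(total_seconds, total_seconds / 1571)
```

Dividing a row's duration by its number of interactions in this way gives the mean recording length per interaction for that subset.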

Each subject’s data set is composed of two subsets: enrolment data (ER) and interaction data (ID).

  • ER is obtained by having the user read lists of the words they have chosen as commands for their system. To match the acoustic conditions in the user’s home, recording takes place in the same environment in which the system is meant to operate. Because the user is reading from a list, the resulting speech is less natural, but it is still effective for initial training.
  • ID is recorded as the user operates the electronic devices in their home through the homeService speech-enabled interface. Recording starts after the user presses a switch, and the microphone stays open for a predefined number of seconds. In contrast to the ER data, each word produced is chosen autonomously by the user.

Project team

Mauro Nicolao, Heidi Christensen, Stuart Cunningham, Phil Green, Thomas Hain

Data example

Annotation

Annotation is provided in HTK STM format, one segment per line, with the following fields:

Filename Mic SpeakerID startTime endTime <Mic,SesId,Lang,Impair,level,intel,purpose> Transcription

hom-F01ER01MCW0000003000003 MC F01 0.00 2.55 <MC,ER01,GBEng,CP,SE,LL,a55,ER01train> delete
hom-M02ID01MC20150309104753 MC M02 0.00 3.00 <MC,ID01,GBEng,MND,MO,MM,a75,ID01train> skysportone
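
As a rough illustration of how the annotation lines above might be read programmatically, the following Python sketch splits one line into its fields. The layout is inferred only from the header and the two example lines. The names `Segment` and `parse_stm_line` are hypothetical and not part of the corpus distribution. The bracketed labels are kept as a plain list rather than mapped to names, because the example lines carry eight values while the header names seven.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the examples:
# filename mic speaker start end <comma-separated labels> transcription
STM_LINE = re.compile(
    r"^(?P<filename>\S+)\s+(?P<mic>\S+)\s+(?P<speaker>\S+)\s+"
    r"(?P<start>\d+(?:\.\d+)?)\s+(?P<end>\d+(?:\.\d+)?)\s+"
    r"<(?P<labels>[^>]*)>\s*(?P<text>.*)$"
)

class Segment(NamedTuple):
    filename: str
    mic: str
    speaker: str
    start: float   # seconds
    end: float     # seconds
    labels: list   # raw label values, left unnamed on purpose
    text: str      # transcription

def parse_stm_line(line):
    """Parse one annotation line; raise ValueError if it does not fit."""
    match = STM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised annotation line: {line!r}")
    return Segment(
        filename=match["filename"],
        mic=match["mic"],
        speaker=match["speaker"],
        start=float(match["start"]),
        end=float(match["end"]),
        labels=match["labels"].split(","),
        text=match["text"].strip(),
    )

example = ("hom-F01ER01MCW0000003000003 MC F01 0.00 2.55 "
           "<MC,ER01,GBEng,CP,SE,LL,a55,ER01train> delete")
segment = parse_stm_line(example)
print(segment.speaker, segment.labels[-1], segment.text)
```

Because the start and end fields must be numeric, a header line such as the one shown above raises `ValueError` and can be skipped by the caller.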

Audio

Audio data is provided in the standard MS-WAVE mono format at 16 kHz and 16 bit. It was recorded with a 6-channel Microcone microphone array at a 48 kHz sampling rate and 32-bit resolution (these streams exist but are not distributed in the current release). The 16 kHz signal is the beamformed combination of the 6 channels, produced by the beamforming built into the Microcone hardware.

All audio (ER and ID) was recorded in real home environments.
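
A quick way to confirm that a file matches the distributed format described above (mono, 16 kHz, 16 bit) is to inspect its header with Python's standard `wave` module. The sketch below is only an illustration: `describe_wav` and `matches_release_format` are hypothetical helper names, and the demo runs on a synthetic silent file rather than on corpus data.

```python
import os
import tempfile
import wave

def describe_wav(path):
    """Return (channels, sample_rate_hz, bits_per_sample, duration_s)."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        duration = wav.getnframes() / rate
    return channels, rate, bits, duration

def matches_release_format(path):
    """True if the file is mono, 16 kHz, 16-bit PCM."""
    channels, rate, bits, _ = describe_wav(path)
    return (channels, rate, bits) == (1, 16000, 16)

# Demo on a synthetic one-second silent file (not corpus data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)       # mono
    out.setsampwidth(2)       # 2 bytes = 16 bit
    out.setframerate(16000)   # 16 kHz
    out.writeframes(b"\x00\x00" * 16000)

print(describe_wav(demo_path), matches_release_format(demo_path))
```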

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

An agreement with the University of Sheffield has to be signed to use the data.

Due to the sensitive nature of the data and the obligation to protect participant confidentiality, the audio of the homeService corpus cannot be redistributed under any circumstances.

Download

To download the homeService corpus, please send a request to spandh-resource@sheffield.ac.uk

Figshare link

Download page

Citation

M. Nicolao, H. Christensen, S. Cunningham, P. Green, and T. Hain, The homeService corpus v. 1.0, University of Sheffield at http://mini.dcs.shef.ac.uk/resources/homeservice-corpus, 2016, doi: 10.15131/shef.data.3116833
