Factored WSJCAM0 speech corpus

This version of the WSJCAM0 corpus has augmented variability in 4 factors: speaker, channel, background and Signal-to-Noise Ratio (SNR). It was created by the University of Sheffield for experiments in robustness and factorisation under non-stationary noise conditions. The corpus was developed as part of the Natural Speech Technology programme grant (EP/I031022/1). Files are in single-channel WAVE format, sampled at 16 kHz with a bit depth of 16 bits.
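
As a quick sanity check on downloaded audio, the stated format (single channel, 16 kHz, 16 bits) can be verified with Python's standard `wave` module. This is a minimal sketch; the function name `matches_corpus_format` is illustrative and not part of any corpus tooling.

```python
import wave


def matches_corpus_format(path):
    """Return True if the WAVE file is single-channel, 16 kHz, 16-bit."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # single channel
            and wav.getframerate() == 16000  # 16 kHz sampling rate
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16 bits
        )
```

Calling `matches_corpus_format("some_file.wav")` on any file from the corpus should return `True` if the file follows the description above; the path here is a placeholder.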

To obtain this corpus, you must hold a valid licence for the original WSJCAM0 corpus from the LDC (catalog number LDC95S24). Please contact a member of the team if you use this corpus for your research.

The corpus is divided into 5 sets:

Set        Description                          Speakers   Files
si_tr      Speaker independent training data    86         7,387
si_dt5a    Primary 5k task development set      19         331
si_dt5b    Secondary 5k task development set    19         336
si_dt20a   Primary 20k task development set     19         323
si_dt20b   Secondary 20k task development set   19         329

The 4 factors of variability are described below:

SPEAKERS
Original si_tr speakers:
  c02 c03 c04 c05 c06 c07 c08 c09 c0a c0b c0c c0d c0e c0f c0g c0h c0i c0j c0k c0l c0m c0n c0r c0v c0w c0x c0y c0z c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c1a c1b c1c c1d c1e c1f c1g c1h c1i c1j c1k c1l c1m c1n c1o c1p c1q c1r c1s c1t c1u c1v c1w c1x c1y c1z c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c2a c2b c2c c2d c2e c2f c2g c2h c2i c2j c2k c2l
Original si_dt5a, si_dt5b, si_dt20a, si_dt20b speakers:
  c31 c34 c35 c38 c3c c3d c3f c3j c3k c3l c3p c3s c3t c3w c3z c40 c41 c49
CHANNELS
ch0    Original head-mounted microphone
ch1    Original desk-mounted channel
BACKGROUNDS
bg0    Original clean background
bg1    Added orchestral music
bg2    Added popular contemporary music
bg3    Added traffic noise
bg4    Added restaurant noise
bg5    Added applause noise
bg6    Added outdoors noise
SNR
SNR05    5-10 dB
SNR10    10-15 dB
SNR15    15-20 dB
SNR20    +Inf dB
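
The channel, background and SNR codes listed above can be kept in a small lookup table when scripting experiments. The sketch below is only an illustration: the dictionary names and the `describe_condition` helper are hypothetical, the SNR labels are read here as SNR05, SNR10, SNR15 and SNR20, and nothing is assumed about how the codes appear in file names or which combinations exist in the corpus.

```python
# Factor codes and descriptions copied from the listing above.
CHANNELS = {
    "ch0": "Original head-mounted microphone",
    "ch1": "Original desk-mounted channel",
}

BACKGROUNDS = {
    "bg0": "Original clean background",
    "bg1": "Added orchestral music",
    "bg2": "Added popular contemporary music",
    "bg3": "Added traffic noise",
    "bg4": "Added restaurant noise",
    "bg5": "Added applause noise",
    "bg6": "Added outdoors noise",
}

# Assumption: the labels are SNR05/SNR10/SNR15/SNR20 with these ranges.
SNR_RANGES = {
    "SNR05": "5-10 dB",
    "SNR10": "10-15 dB",
    "SNR15": "15-20 dB",
    "SNR20": "+Inf dB",
}


def describe_condition(channel, background, snr):
    """Return a readable description of one acoustic condition.

    Raises KeyError if any code is not in the tables above.
    """
    return f"{CHANNELS[channel]}; {BACKGROUNDS[background]}; SNR {SNR_RANGES[snr]}"
```

For example, `describe_condition("ch1", "bg3", "SNR10")` describes the desk-mounted channel with added traffic noise at 10-15 dB.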

Publications

2014

  • O. Saz and T. Hain, “Using Contextual Information in Joint Factor Eigenspace MLLR for Speech Recognition in Diverse Scenarios,” in Proceedings of the 2014 International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 2014, pp. 6314–6318.
    @inproceedings{Saz14,
    address = {Florence, Italy},
    author = {Oscar Saz and Thomas Hain},
    booktitle = {{Proceedings of the 2014 International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
    pages = {6314--6318},
    project = {nst},
    title = {{Using Contextual Information in Joint Factor Eigenspace MLLR for Speech Recognition in Diverse Scenarios}},
    year = {2014}
    }

2013

  • O. Saz and T. Hain, “Asynchronous Factorisation of Speaker and Background with Feature Transforms in Speech Recognition,” in Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech), Lyon, France, 2013, pp. 1238–1242.
    @inproceedings{Saz13,
    address = {Lyon, France},
    author = {Oscar Saz and Thomas Hain},
    booktitle = {{Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech)}},
    pages = {1238--1242},
    project = {nst},
    title = {{Asynchronous Factorisation of Speaker and Background with Feature Transforms in Speech Recognition}},
    year = {2013}
    }
