Improving Acoustic Models by Watching Television

Witbrock, Michael J.; Hauptmann, Alexander G.

Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data is expensive to produce, closed-captioned television broadcasts supply a constant stream of challenging speech data accompanied by poor transcriptions. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.

Document ID

19990045639

Acquisition Source

Headquarters

Document Type

Reprint (Version printed in journal)

Date Acquired

August 19, 2013

Publication Date

March 19, 1998

Distribution Limits

Public

Work of the US Gov. Public Use Permitted.

NTRS - NASA Technical Reports Server
