Development of speech recognition system for indexing and searching in a big collection of mediafiles
Abstract
Commercial systems for Russian speech recognition now reach 90-95% accuracy, which is comparable to human performance. At the same time, there are practically no open-source solutions for Russian based on modern architectures. The main obstacle is the lack of sufficiently large public corpora of transcribed Russian speech. This work proposes a method for automatically building corpora of several hundred hours of speech and describes the construction of a speech recognition system based on an open-source implementation of the DeepSpeech architecture. It also considers applying the resulting model to build a speech search system over a collection of media files.