Research of the problem of audio stream real-time diarization

Попов Никита Алексеевич; Popov Nikita Alekseevic

Files

diploma.pdf (11.35 MB)

reviewSV_otzyv_rukovoditelja_angl.docx (35.63 KB)

reviewSV_otzyv_rukovoditelja.docx (33.37 KB)

Date

2020

Authors

Попов Никита Алексеевич

Popov Nikita Alekseevic

Abstract

This thesis investigates real-time speaker diarization of an audio stream. The author reviews the main approaches to the problem and describes the relevant characteristics and methods they rely on. Available software solutions are analyzed, along with their advantages and disadvantages. The main focus is on diarization by autonomous algorithms that can run on low-performance devices, such as smart home components. The proposed solution uses neural networks and a spectral clustering algorithm to extract speech from the audio stream and identify the voices of the unique speakers in it. The architecture of the implemented program is presented together with a description of the libraries used, as are the results obtained on a prepared set of test data. The experimental results are analyzed and critically evaluated.

Keywords

audio stream diarization, speaker diarization, speech recognition, speaker identification

URI

http://hdl.handle.net/11701/26357

Collections

MASTER'S STUDIES
