Research of the problem of audio stream real-time diarization
Abstract
This work investigates the problem of real-time speaker diarization of an audio stream. The author reviews the main approaches to the problem and describes the relevant characteristics and methods they rely on. Available software solutions are analyzed, along with their advantages and disadvantages. The focus is on diarization by autonomous algorithms that can run on low-performance devices, such as smart home components. The proposed solution uses neural networks and a spectral clustering algorithm to extract speech from the audio stream and identify the voices of the distinct speakers in it. The architecture of the implemented program is presented together with a description of the libraries used, and results on a prepared set of test data are reported. The results of the experiments are analyzed and critically evaluated.