Textual trends detection at OK

Malyutin, Evgeniy A.; Bugaichenko, Dmitriy Yu.; Mishenin, Alexey N.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/8516

Full metadata record

DC Field	Value	Language
dc.contributor.author	Malyutin, Evgeniy A.	-
dc.contributor.author	Bugaichenko, Dmitriy Yu.	-
dc.contributor.author	Mishenin, Alexey N.	-
dc.date.accessioned	2017-10-12T09:57:20Z	-
dc.date.available	2017-10-12T09:57:20Z	-
dc.date.issued	2017-09	-
dc.identifier.citation	Malyutin E. A., Bugaichenko D. Y., Mishenin A. N. Textual trends detection at OK. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 2017, vol. 13, iss. 3, pp. 313–325.	en_GB
dc.identifier.other	10.21638/11701/spbu10.2017.308	-
dc.identifier.uri	http://hdl.handle.net/11701/8516	-
dc.description.abstract	Social networks now serve not merely as a medium for entertainment but as an information distribution channel that is replacing classical mass media. In this article we describe a scalable trend detection system implemented at the social network OK. Actors (users and communities) of social networks form a broad agenda, and the content they produce has specific features: • UGC (user-generated content) is difficult to process; • actors generate multilingual text, which classical media analysis could handle only by hiring a large number of highly paid professionals; • modern social networks form a highly connected society that responds to events quickly, so the system must work in real time; • spammers use social networks as a platform for promotion and obtrusive advertising, so the system must be able to filter spam content. Applying standard methods of media analysis to such content seems impossible, which creates a natural demand for software that detects and analyzes textual trends. The academic literature offers two main approaches to trend detection: topic modeling (followed by analysis of topic evolution) and distributive models based on frequency-like properties of distinct terms. We analyzed scientific papers on both approaches, taking into account the specific features of social networks, and chose distributive models as the basis for the system. OK is one of the largest social networks in Russia and the CIS countries. Its actors generate over 100M characters of text every day, so even basic processing is a serious technical problem and requires Big Data approaches throughout development. We introduce a lambda architecture built from three main components: • a daily batch processing component based on Apache Spark; • a streaming processing component based on Apache Samza; • a mini-batch processing component based on Spark Streaming. The article describes the architecture and technical features of each component in detail. In conclusion, we present the results of operating the system and discuss areas for further research and development. Refs 13. Figs 7. Table 1.	en_GB
dc.language.iso	ru	en_GB
dc.publisher	St Petersburg State University	en_GB
dc.relation.ispartofseries	Vestnik of St Petersburg University. Applied Mathematics. Computer Science. Control Processes;Volume 13; Issue 3	-
dc.subject	natural language processing	en_GB
dc.subject	trend detection	en_GB
dc.subject	big data	en_GB
dc.title	Textual trends detection at OK	en_GB
dc.type	Article	en_GB
Appears in Collections:	Issue 3

Files in This Item:

File	Description	Size	Format
08-Malyutin.pdf		713.62 kB	Adobe PDF


All items in the electronic archive are protected by copyright, with all rights reserved.

Open Access Repository of Saint Petersburg State University
