Chapter 2. Token attribution

Kuznetsov, Sergey A.; Skrebtsova, Tatiana G.; Suvorov, Sergey G.; Klementyeva, Anna V.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/21176

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kuznetsov, Sergey A.	-
dc.contributor.author	Skrebtsova, Tatiana G.	-
dc.contributor.author	Suvorov, Sergey G.	-
dc.contributor.author	Klementyeva, Anna V.	-
dc.date.accessioned	2020-12-02T10:25:41Z	-
dc.date.available	2020-12-02T10:25:41Z	-
dc.date.issued	2019	-
dc.identifier.other	https://doi.org/10.21638/11701/9785288059278.03	-
dc.identifier.uri	http://hdl.handle.net/11701/21176	-
dc.description.abstract	The aim of this step is to check the previously assigned token types and to supply letter tokens with morphological information, i.e. the values of the relevant grammatical categories. For word forms, this procedure is normally called morphological analysis. However, since a text may also contain other token types (e.g. abbreviations, formulae, Internet hyperlinks, phone numbers), the step is more generally referred to as token attribution. The chapter considers a wide range of token types from the processing viewpoint. In particular, letter token analysis presupposes a search in several dictionaries: apart from a regular Russian morphological dictionary, the search also covers dictionaries of abbreviations, fixed phrases, and proper names (personal, geographical, etc.). Morphological analysis often yields more than one attribution. In some cases the ambiguity can be reduced by taking graphematic information into account, but most often it remains and poses further problems for syntactic analysis. If the search for a letter token fails in all the dictionaries, the algorithm tries to identify its lemma and predict its grammatical meaning. Other token types are attributed by mapping them onto a range of patterns. Typical problems of both operations are discussed.	en_GB
dc.language.iso	ru	en_GB
dc.publisher	St Petersburg State University	en_GB
dc.subject	morphological analysis	en_GB
dc.subject	tokenization	en_GB
dc.subject	abbreviation	en_GB
dc.subject	fixed phrase	en_GB
dc.subject	proper name	en_GB
dc.subject	morphological prediction	en_GB
dc.title	Chapter 2. Token attribution	en_GB
dc.type	Book chapter	en_GB
Appears in Collections:	LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE
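
The attribution procedure summarised in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy dictionaries, the grammatical tags, the suffix table used for prediction, and the regular expressions are hypothetical and are not taken from the chapter. Fixed phrases, which span several tokens, are left out. Letter tokens are looked up in every dictionary and all readings are kept, so ambiguity is preserved; if nothing is found, a lemma and grammatical meaning are guessed from the ending; other tokens are matched against patterns.

```python
import re

# Hypothetical toy dictionaries; a real system would use full-size resources.
DICTIONARIES = [
    ("morphological", {
        "стол": [
            {"lemma": "стол", "pos": "NOUN", "case": "nom"},
            {"lemma": "стол", "pos": "NOUN", "case": "acc"},
        ],
    }),
    ("abbreviation", {"вуз": [{"lemma": "вуз", "pos": "ABBR"}]}),
    ("proper_name", {"москва": [{"lemma": "москва", "pos": "PROPN", "case": "nom"}]}),
]

# Hypothetical suffix table for prediction: (ending, lemma ending, guessed tags).
SUFFIX_GUESSES = [
    ("ами", "а", {"pos": "NOUN", "case": "ins", "number": "plur"}),
    ("ый", "ый", {"pos": "ADJ", "case": "nom", "number": "sing"}),
]

# Hypothetical patterns for non-letter tokens, tried in order.
PATTERNS = [
    ("hyperlink", re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)),
    ("number", re.compile(r"^\d+(?:[.,]\d+)?$")),
    ("phone", re.compile(r"^\+?\d[\d\-() ]{5,}\d$")),
]


def predict(word):
    """Guess a lemma and tags for an out-of-dictionary word from its ending."""
    for ending, lemma_ending, tags in SUFFIX_GUESSES:
        if word.endswith(ending) and len(word) > len(ending):
            lemma = word[: -len(ending)] + lemma_ending
            return [{"lemma": lemma, "source": "prediction", **tags}]
    return [{"lemma": word, "pos": "UNKNOWN", "source": "prediction"}]


def attribute(token):
    """Return every candidate attribution for a single token."""
    if token.isalpha():
        word = token.lower()
        readings = []
        # Search all dictionaries and keep every reading found.
        for source, entries in DICTIONARIES:
            for reading in entries.get(word, []):
                readings.append({**reading, "source": source})
        return readings if readings else predict(word)
    # Non-letter tokens are attributed by the first matching pattern.
    for token_type, pattern in PATTERNS:
        if pattern.match(token):
            return [{"type": token_type, "source": "pattern"}]
    return [{"type": "unknown", "source": "none"}]


if __name__ == "__main__":
    for tok in ["стол", "куздрами", "https://example.org", "+7 812 123-45-67"]:
        print(tok, attribute(tok))
```

In this sketch the word "стол" receives two readings (nominative and accusative), which mirrors the remark that ambiguity usually survives this step and is passed on to syntactic analysis.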

Files in this item:

File	Description	Size	Format
27-43.pdf		645.79 kB	Adobe PDF	View/Open


All items in the electronic resources archive are protected by copyright, with all rights reserved.

Open Access Archive of St Petersburg State University