Please use this identifier to cite or link to this item:
http://hdl.handle.net/11701/21176
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kuznetsov, Sergey A. | - |
dc.contributor.author | Skrebtsova, Tatiana G. | - |
dc.contributor.author | Suvorov, Sergey G. | - |
dc.contributor.author | Klementyeva, Anna V. | - |
dc.date.accessioned | 2020-12-02T10:25:41Z | - |
dc.date.available | 2020-12-02T10:25:41Z | - |
dc.date.issued | 2019 | - |
dc.identifier.other | https://doi.org/10.21638/11701/9785288059278.03 | - |
dc.identifier.uri | http://hdl.handle.net/11701/21176 | - |
dc.description.abstract | The aim of this step is to check the previously assigned token types and to provide the letter tokens with morphological information, i.e. values of the relevant grammatical categories. As far as word forms are concerned, the procedure is normally called morphological analysis. However, since a text may contain other token types (e.g. abbreviations, formulae, Internet hyperlinks, phone numbers), it is generally referred to as token attribution. In the chapter, a wide range of token types are considered from the processing viewpoint. In particular, the letter token analysis presupposes a search in a number of dictionaries. Apart from a regular Russian morphological dictionary, search is also performed in the dictionaries of abbreviations, of fixed phrases, and of proper names (personal, geographical, etc.). Morphological analysis often yields more than a single attribution. In some cases, the ambiguity can be reduced by taking into account graphematical information, but most often, it will remain and pose further problems for the syntactic analysis. If the search for a letter token fails in all the dictionaries, the algorithm tries to identify its lemma and predict the grammatical meaning. The attribution of other token types is made by mapping them onto a range of patterns. Typical problems of both operations are discussed. | en_GB |
dc.language.iso | ru | en_GB |
dc.publisher | St Petersburg State University | en_GB |
dc.subject | morphological analysis | en_GB |
dc.subject | tokenization | en_GB |
dc.subject | abbreviation | en_GB |
dc.subject | fixed phrase | en_GB |
dc.subject | proper name | en_GB |
dc.subject | morphological prediction | en_GB |
dc.title | Chapter 2. Token attribution | en_GB |
dc.type | Book chapter | en_GB |
Appears in Collections: | LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE |
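The lookup cascade outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionaries, suffix rules, patterns, and the function name `attribute` are all hypothetical stand-ins, and fixed phrases (which span several tokens) are left out for brevity. The sketch only shows the order of operations: non-letter tokens are mapped onto patterns, letter tokens are searched in several dictionaries with every reading kept, and an unknown word falls back to lemma and grammar prediction.

```python
import re

# Hypothetical toy dictionaries; the chapter's real resources are not described here.
# Each entry maps a surface form to a list of (lemma, grammatical tag) readings.
MORPH_DICT = {"стали": [("сталь", "NOUN gen sg"), ("стать", "VERB past pl")]}
ABBREVIATIONS = {"т.е.": [("то есть", "ABBR")]}
PROPER_NAMES = {"Нева": [("Нева", "PROPN nom sg")]}

# Hypothetical prediction rules for unknown words: (ending, lemma ending, tag).
SUFFIX_RULES = [("ами", "а", "NOUN ins pl"), ("ого", "ый", "ADJ gen sg")]

# Hypothetical patterns for non-letter token types, tried in order.
PATTERNS = [
    ("hyperlink", re.compile(r"https?://\S+")),
    ("phone", re.compile(r"\+?\d[\d\-\s()]{5,}\d")),
    ("number", re.compile(r"\d+([.,]\d+)?")),
]


def attribute(token):
    """Return a list of (lemma, tag) readings for one token."""
    # 1. Non-letter token types: map the whole token onto a pattern.
    for type_name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return [(token, type_name)]

    # 2. Letter tokens: search every dictionary and keep all readings,
    #    so ambiguity is passed on to later stages rather than resolved here.
    readings = []
    for dictionary in (MORPH_DICT, ABBREVIATIONS, PROPER_NAMES):
        readings.extend(dictionary.get(token, []))
    if readings:
        return readings

    # 3. Not found anywhere: predict lemma and grammar from the ending.
    lowered = token.lower()
    for ending, lemma_ending, tag in SUFFIX_RULES:
        if lowered.endswith(ending) and len(lowered) > len(ending):
            stem = lowered[: -len(ending)]
            return [(stem + lemma_ending, tag + " (predicted)")]

    return [(token, "UNKNOWN")]
```

In this toy setup, `attribute("стали")` yields two readings, which mirrors the abstract's point that morphological analysis often produces more than one attribution and leaves the choice to syntactic analysis.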
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
27-43.pdf | | 645.79 kB | Adobe PDF | View/Open |
All items in the electronic resource archive are protected by copyright, with all rights reserved.