Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/21176
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kuznetsov, Sergey A. | -
dc.contributor.author | Skrebtsova, Tatiana G. | -
dc.contributor.author | Suvorov, Sergey G. | -
dc.contributor.author | Klementyeva, Anna V. | -
dc.date.accessioned | 2020-12-02T10:25:41Z | -
dc.date.available | 2020-12-02T10:25:41Z | -
dc.date.issued | 2019 | -
dc.identifier.other | https://doi.org/10.21638/11701/9785288059278.03 | -
dc.identifier.uri | http://hdl.handle.net/11701/21176 | -
dc.description.abstract | The aim of this step is to check the previously assigned token types and to provide the letter tokens with morphological information, i.e. values of the relevant grammatical categories. As far as word forms are concerned, the procedure is normally called morphological analysis. However, since a text may contain other token types (e.g. abbreviations, formulae, Internet hyperlinks, phone numbers), it is generally referred to as token attribution. In the chapter, a wide range of token types are considered from the processing viewpoint. In particular, the letter token analysis presupposes a search in a number of dictionaries. Apart from a regular Russian morphological dictionary, search is also performed in the dictionaries of abbreviations, of fixed phrases, and of proper names (personal, geographical, etc.). Morphological analysis often yields more than a single attribution. In some cases, the ambiguity can be reduced by taking into account graphematical information, but most often, it will remain and pose further problems for the syntactic analysis. If the search for a letter token fails in all the dictionaries, the algorithm tries to identify its lemma and predict the grammatical meaning. The attribution of other token types is made by mapping them onto a range of patterns. Typical problems of both operations are discussed. | en_GB
dc.language.iso | ru | en_GB
dc.publisher | St Petersburg State University | en_GB
dc.subject | morphological analysis | en_GB
dc.subject | tokenization | en_GB
dc.subject | abbreviation | en_GB
dc.subject | fixed phrase | en_GB
dc.subject | proper name | en_GB
dc.subject | morphological prediction | en_GB
dc.title | Chapter 2. Token attribution | en_GB
dc.type | Book chapter | en_GB
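
The attribution procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the dictionaries, the suffix rule, the regular expressions, and all names (`MORPH_DICT`, `ABBREVIATIONS`, `PROPER_NAMES`, `SUFFIX_RULES`, `PATTERNS`, `attribute`) are hypothetical toy stand-ins, and the fixed-phrase dictionary is left out because fixed phrases span several tokens. The sketch only shows the order of operations: search all dictionaries, keep every reading found, fall back to prediction for unknown letter tokens, and map other tokens onto patterns.

```python
import re

# Hypothetical toy dictionaries: token -> list of (lemma, grammatical tag).
MORPH_DICT = {
    "дома": [("дом", "NOUN gen sg"), ("дом", "NOUN nom pl"), ("дома", "ADV")],
}
ABBREVIATIONS = {
    "СПбГУ": [("Санкт-Петербургский государственный университет", "ABBR")],
}
PROPER_NAMES = {"Нева": [("Нева", "PROPN geo")]}
DICTIONARIES = [MORPH_DICT, ABBREVIATIONS, PROPER_NAMES]

# Hypothetical prediction rule: word-final suffix -> predicted tag.
SUFFIX_RULES = [("ость", "NOUN nom sg (predicted)")]

# Hypothetical patterns for non-letter tokens, tried in order.
PATTERNS = [
    ("hyperlink", re.compile(r"https?://\S+")),
    ("phone", re.compile(r"\+?\d[\d\-() ]{6,}\d")),
    ("number", re.compile(r"\d+")),
]


def predict(token):
    """Guess lemma and tag for a letter token missing from all dictionaries."""
    for suffix, tag in SUFFIX_RULES:
        if token.lower().endswith(suffix):
            return [(token.lower(), tag)]
    return [(token, "UNKNOWN")]


def attribute(token):
    """Return all candidate (lemma, tag) attributions for one token."""
    if token.isalpha():
        # Letter token: look up the token as written, then lowercased.
        keys = [token]
        if token.lower() != token:
            keys.append(token.lower())
        readings = []
        for dictionary in DICTIONARIES:
            for key in keys:
                readings.extend(dictionary.get(key, []))
        # Several readings may remain; the ambiguity is passed on unresolved.
        return readings if readings else predict(token)
    # Other token types: map onto the first pattern that fully matches.
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return [(token, name)]
    return [(token, "UNKNOWN")]


if __name__ == "__main__":
    for tok in ["дома", "Нева", "гибкость", "https://spbu.ru", "2019"]:
        print(tok, attribute(tok))
```

In this sketch the word form "дома" keeps all three readings, which mirrors the abstract's remark that morphological analysis often yields more than one attribution and leaves the choice to later syntactic analysis.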
Appears in collections: LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE

Files in this item:
File | Description | Size | Format
27-43.pdf | | 645.79 kB | Adobe PDF


All items in the electronic resource archive are protected by copyright; all rights reserved.