Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/21170
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kuznetsov, Sergey A. | -
dc.contributor.author | Skrebtsova, Tatiana G. | -
dc.contributor.author | Suvorov, Sergey G. | -
dc.contributor.author | Klementyeva, Anna V. | -
dc.date.accessioned | 2020-12-02T10:12:15Z | -
dc.date.available | 2020-12-02T10:12:15Z | -
dc.date.issued | 2019 | -
dc.identifier.other | https://doi.org/10.21638/11701/9785288059278.01 | -
dc.identifier.uri | http://hdl.handle.net/11701/21170 | -
dc.description.abstract | Over the last decades, the challenging problem of natural language processing has received much attention from scholars of linguistics and computer science. However, it remains a topical issue. Despite recent advancements in particular areas and the development of new tools and techniques, few frameworks have been created that account for the overall processing of text and involve all types of linguistic analysis. The book outlines an original approach to the automated analysis of Russian texts aimed at revealing their information structure. The linguistic analyzer devised by the authors begins with an unstructured Russian text and, after performing graphematical, morphological, syntactic, and communicative analyses, transforms it into a metalinguistic network. No restrictions on the text type are imposed, except that it should be written in standard Russian. The authors present a pilot study and do not aim to give a full account of the linguistic analyzer. The focus is primarily on the linguistic aspects of its work rather than on the particular algorithms. Furthermore, they put forward certain fundamental principles that cut across text processing at different stages and make the proposed approach consistent and well-integrated. These principles constitute the theoretical background of the book and are stated below. As has already been mentioned, the analyzer is supposed to operate on various text types, regardless of their style and genre, and process them from the lowest level of individual characters up to the whole discourse. Normally, natural language processing is limited to a particular text type, task, or kind of analysis. Thus, the proposed approach is much wider in scope than the majority of projects in the field. A central feature of the proposed approach is its reliance solely on the linguistic information contained in the text.
The analyzer does not resort to probability calculations or statistical methods. While admitting the usefulness of the mathematical apparatus for narrow-scope linguistic applications, the authors believe that a comprehensive interpretation of text requires qualitative rather than quantitative techniques. Thus, the emphasis is placed on a thorough linguistic analysis. After all, human beings do not have to calculate probability in order to understand natural-language texts. The work of the linguistic analyzer encompasses a number of stages roughly corresponding to the book chapters. It begins with graphematical analysis of individual characters and proceeds to morphological, semantic, syntactic, and communicative analyses. The last of these covers a number of aspects of text organization, including pragmatics and rhetorical structure. A major focus is on the constructional syntactic analysis. During processing, a number of dictionaries are used, either compiled from scratch or adjusted and augmented by the authors. In particular, there are dictionaries of graphemes, abbreviations, and fixed phrases, as well as morphological and semantic-syntactic dictionaries. Linguistic analysis consists in the interpretation of information contained in the verbal component of the text. As a result, units are assigned values of relevant features. This is done in a strictly progressive manner, from the lowest-rank graphematical characters up to the highest-rank syntactic groupings, viz. the subject group, the predicate group, object groups, and adverbial groups. Characters (graphemes) are specified in terms of their type (with the values letter / non-letter character, the former being further characterized as belonging to a particular writing system and the latter being subdivided into number / non-number characters).
The morphological analyzer assigns word forms values of grammatical categories relevant for the Russian language (number: singular / plural, gender: masculine / feminine / neuter, aspect: perfective / imperfective, tense: past / present / future, etc.). At the stage of syntactic analysis, values of semantic features are added on the basis of the semantic-syntactic dictionary, as well as values of syntactic features specifying the role of the word or phrase in the sentence. Thus, information extracted at every processing stage is uniformly expressed by tags capturing features and their values. It should be emphasized that very little useful information concerning relevant features and their values can be automatically extracted from the existing dictionaries and reference books, as they are not oriented towards natural language processing. The algorithm uses rules in an if… then… format, and no traditional books on language present information in this way. So, much empirical work had to be done both to reformulate the existing knowledge and to bring out more specific regularities. The Russian National Corpus, which contains a vast amount of current usage, was a valuable resource the authors used for that purpose. The whole procedure proved to be effort- and time-consuming. The book discusses at length classical problems of computational linguistics from the viewpoint of the analyzing algorithms and illustrates them with samples found in a variety of texts (fiction, mass media, legal, etc.). Among such problems, ambiguity resolution stands out most sharply, plaguing natural-language processing endeavors at all levels of analysis. For the analyzer concerned, ambiguity resolution means picking out a single interpretation of feature values from a whole range of options. Unlike some other computational models, the analyzer is not supposed to cope with ambiguity as soon as possible, at the very stage when it occurs.
Rather, ambiguity resolution is postponed until the moment when it can be safely dealt with. Thus, morphological ambiguity is usually resolved at the syntactic stage, while reference resolution requires going beyond sentence boundaries. Along with practical computational issues, the book touches upon some global and “eternal” problems, such as What is a word? and What is a sentence? This was not originally among the authors’ goals and certainly does not constitute a priority. Rather, it is a byproduct of working with real language data and attempting to map them onto the rigid traditional concepts. The ultimate goal of the processing is a metalinguistic network, with objects being assigned certain properties and linked together by predicates. Such a network could be used in a wide range of applications bearing on the collection, updating, search, and extraction of information. However, the very transition from a syntactic structure to a metalinguistic network poses a host of problems, being largely untrodden ground. It is here that the exploratory character of the study reaches its climax. The authors hope that the discussion in the book will open up new perspectives on the extension of natural language processing beyond the sentence level. | en_GB
dc.language.iso | ru | en_GB
dc.publisher | St Petersburg State University | en_GB
dc.title | Introduction | en_GB
dc.type | Book chapter | en_GB
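
Illustrative sketch (not part of the record): the abstract describes graphemes tagged as letter / non-letter, with letters assigned a writing system and non-letters split into number / non-number, and it describes morphological ambiguity being kept open until an if… then… rule at the syntactic stage can resolve it. The Python below is a minimal sketch of those two ideas under stated assumptions. The function names, tag names, toy lexicon, and the single agreement rule are all hypothetical and are not taken from the book; using the first word of the Unicode character name as the writing system is a rough heuristic chosen only for illustration.

```python
import unicodedata

def classify_grapheme(ch: str) -> dict:
    """Tag one character with feature/value pairs (tag names are hypothetical)."""
    if ch.isalpha():
        # Heuristic: the first word of the Unicode name usually names the script,
        # e.g. "CYRILLIC SMALL LETTER DE" -> "CYRILLIC".
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        return {"type": "letter", "writing_system": script}
    kind = "number" if ch.isdigit() else "non-number"
    return {"type": "non-letter", "subtype": kind}

# Hypothetical toy lexicon: each word form maps to ALL of its readings.
# "стекло" is either a noun ("glass") or a past-tense verb form ("flowed down").
LEXICON = {
    "новое": [{"pos": "adjective", "gender": "neuter", "number": "singular"}],
    "стекло": [
        {"pos": "noun", "gender": "neuter", "number": "singular"},
        {"pos": "verb", "tense": "past", "gender": "neuter", "number": "singular"},
    ],
}

def morphological_stage(words):
    """Return every reading of every word; nothing is disambiguated here."""
    return [list(LEXICON[w]) for w in words]

def syntactic_stage(readings):
    """Apply one illustrative rule: IF the previous word is unambiguously an
    adjective THEN keep only the noun readings that agree with it."""
    resolved = [list(r) for r in readings]
    for i in range(1, len(resolved)):
        prev = resolved[i - 1]
        if len(prev) == 1 and prev[0]["pos"] == "adjective":
            agreeing = [
                r for r in resolved[i]
                if r["pos"] == "noun"
                and r["gender"] == prev[0]["gender"]
                and r["number"] == prev[0]["number"]
            ]
            if agreeing:  # leave the ambiguity open if the rule does not apply
                resolved[i] = agreeing
    return resolved

readings = morphological_stage(["новое", "стекло"])
resolved = syntactic_stage(readings)
```

In this sketch `readings[1]` still holds both the noun and the verb reading of "стекло" after the morphological stage, and only `syntactic_stage` narrows it to the noun. Every stage reads and writes the same feature/value dictionaries, loosely mirroring the uniform tags mentioned in the abstract.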
Appears in Collections: LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE

Files in This Item:
File | Description | Size | Format
5-15.pdf | | 596.86 kB | Adobe PDF


All items in the electronic resource archive are protected by copyright, with all rights reserved.