Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/21174
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kuznetsov, Sergey A. | -
dc.contributor.author | Skrebtsova, Tatiana G. | -
dc.contributor.author | Suvorov, Sergey G. | -
dc.contributor.author | Klementyeva, Anna V. | -
dc.date.accessioned | 2020-12-02T10:20:45Z | -
dc.date.available | 2020-12-02T10:20:45Z | -
dc.date.issued | 2019 | -
dc.identifier.other | https://doi.org/10.21638/11701/9785288059278.02 | -
dc.identifier.uri | http://hdl.handle.net/11701/21174 | -
dc.description.abstract | Graphematical analysis marks the first stage of text processing. Before it, however, basic text structuring takes place, which identifies paragraphs and their types, e.g. title, subtitle, author name(s), chapter and section titles, footnotes, endnotes, figures, appendices, epigraphs, etc. Graphematical analysis proper then begins. Its aim is to decompose the stream of letter and non-letter graphemes into character strings such as individual words, abbreviations, numbers, and hybrid strings (e.g. mathematical formulae). The procedure is an iterative process of unit assembly: from individual characters to so-called atoms, then to tokens (roughly equivalent to word occurrences), then to sentence parts and, finally, to a whole sentence. At every stage, each unit is assigned a type. Assembly relies on rules based solely on a thorough structural analysis of context. No formal models or statistical methods are applied; this is a central principle of the linguistic analyzer, inherent in all its algorithms. At this stage, complications arise primarily from the ambiguity of punctuation marks. These complications are discussed at length throughout the chapter. | en_GB
dc.language.iso | ru | en_GB
dc.publisher | St Petersburg State University | en_GB
dc.subject | grapheme | en_GB
dc.subject | letter grapheme | en_GB
dc.subject | non-letter grapheme | en_GB
dc.subject | character string | en_GB
dc.subject | punctuation mark | en_GB
dc.title | Chapter 1. Graphematical analysis | en_GB
dc.type | Book chapter | en_GB
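The abstract describes graphematical analysis as a chain of typed units (characters, then atoms, then tokens, then sentence parts, then a sentence) assembled purely by contextual rules, with punctuation ambiguity as the main difficulty. The Python sketch below illustrates that general idea under our own assumptions only: the character classes, the token types WORD/NUMBER/ABBR/PUNCT, the tiny abbreviation lexicon, and every function name are hypothetical, the sentence-part stage is skipped, and this is not the analyzer described in the chapter. It shows how a local context rule can settle one kind of punctuation ambiguity: a period between digits or after an abbreviation is absorbed into a token, so only the remaining periods can close a sentence.

```python
from dataclasses import dataclass

# Hypothetical abbreviation lexicon (an assumption, not the analyzer's real dictionary).
ABBREVIATIONS = {"etc", "Dr", "Mr", "vs"}
SENTENCE_END = {".", "!", "?"}


@dataclass
class Unit:
    text: str
    kind: str  # atoms: LETTER/DIGIT/SPACE/PUNCT; tokens: WORD/NUMBER/ABBR/PUNCT


def char_class(ch: str) -> str:
    """Classify one character (grapheme) into a coarse, assumed class."""
    if ch.isalpha():
        return "LETTER"
    if ch.isdigit():
        return "DIGIT"
    if ch.isspace():
        return "SPACE"
    return "PUNCT"


def to_atoms(text: str) -> list[Unit]:
    """Stage 1: merge characters into atoms (maximal runs of one class).
    Each punctuation character stays a separate atom."""
    atoms: list[Unit] = []
    for ch in text:
        cls = char_class(ch)
        if atoms and atoms[-1].kind == cls and cls != "PUNCT":
            atoms[-1].text += ch
        else:
            atoms.append(Unit(ch, cls))
    return atoms


def is_punct(atoms: list[Unit], i: int, chars: str) -> bool:
    """True if atoms[i] exists and is one of the given punctuation characters."""
    return i < len(atoms) and atoms[i].kind == "PUNCT" and atoms[i].text in chars


def to_tokens(atoms: list[Unit]) -> list[Unit]:
    """Stage 2: merge atoms into typed tokens using only local context rules."""
    tokens: list[Unit] = []
    i = 0
    while i < len(atoms):
        a = atoms[i]
        if a.kind == "SPACE":
            i += 1
        elif a.kind == "DIGIT":
            # Rule 1: DIGIT ('.' | ',') DIGIT is one NUMBER; the mark is a decimal separator.
            text, i = a.text, i + 1
            while is_punct(atoms, i, ".,") and i + 1 < len(atoms) and atoms[i + 1].kind == "DIGIT":
                text += atoms[i].text + atoms[i + 1].text
                i += 2
            tokens.append(Unit(text, "NUMBER"))
        elif a.kind == "LETTER" and len(a.text) == 1 and is_punct(atoms, i + 1, "."):
            # Rule 2: a chain of single letters, each followed by '.', is one ABBR, e.g. "i.e.".
            text = ""
            while (i < len(atoms) and atoms[i].kind == "LETTER"
                   and len(atoms[i].text) == 1 and is_punct(atoms, i + 1, ".")):
                text += atoms[i].text + "."
                i += 2
            tokens.append(Unit(text, "ABBR"))
        elif a.kind == "LETTER" and a.text in ABBREVIATIONS and is_punct(atoms, i + 1, "."):
            # Rule 3: a lexicon entry followed by '.' is one ABBR, e.g. "etc.".
            tokens.append(Unit(a.text + ".", "ABBR"))
            i += 2
        elif a.kind == "LETTER":
            tokens.append(Unit(a.text, "WORD"))
            i += 1
        else:
            # Punctuation not absorbed by a rule above stays a PUNCT token.
            tokens.append(Unit(a.text, "PUNCT"))
            i += 1
    return tokens


def to_sentences(tokens: list[Unit]) -> list[list[Unit]]:
    """Stage 3: close a sentence at a PUNCT '.', '!' or '?'. Periods already absorbed
    into NUMBER or ABBR tokens can no longer end a sentence."""
    sentences: list[list[Unit]] = []
    current: list[Unit] = []
    for tok in tokens:
        current.append(tok)
        if tok.kind == "PUNCT" and tok.text in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    text = "Dr. Smith paid 3.14 dollars, i.e. very little. Was it enough? Yes."
    for sentence in to_sentences(to_tokens(to_atoms(text))):
        print([(t.text, t.kind) for t in sentence])
```

On the sample text this yields three sentences, with "Dr.", "3.14" and "i.e." each kept as a single token. The rules are deliberately naive: a single-letter word before a period (say, "plan A.") would be misread as an abbreviation, and "etc." at the end of a sentence would not close it. Resolving such cases needs wider context, which is the kind of punctuation ambiguity the abstract names as the main complication.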
Appears in Collections: LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE

Files in This Item:
File | Description | Size | Format
16-26.pdf | | 626.03 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.