Chapter 1. Graphematical analysis

Kuznetsov, Sergey A.; Skrebtsova, Tatiana G.; Suvorov, Sergey G.; Klementyeva, Anna V.

Please use this identifier to cite or link to this item: http://hdl.handle.net/11701/21174

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kuznetsov, Sergey A.	-
dc.contributor.author	Skrebtsova, Tatiana G.	-
dc.contributor.author	Suvorov, Sergey G.	-
dc.contributor.author	Klementyeva, Anna V.	-
dc.date.accessioned	2020-12-02T10:20:45Z	-
dc.date.available	2020-12-02T10:20:45Z	-
dc.date.issued	2019	-
dc.identifier.other	https://doi.org/10.21638/11701/9785288059278.02	-
dc.identifier.uri	http://hdl.handle.net/11701/21174	-
dc.description.abstract	Graphematical analysis is the first stage of text processing. It is preceded by basic text structuring, which identifies paragraphs and their types, e.g. title, subtitle, author name(s), chapter and section titles, footnotes, endnotes, figures, appendices, epigraphs, etc. Graphematical analysis proper then begins. Its aim is to decompose the flow of letter and non-letter graphemes into character strings such as individual words, abbreviations, numbers, and hybrid strings (e.g. mathematical formulae). The procedure is an iterative process of unit assembly: individual characters are combined into so-called atoms, atoms into tokens (roughly equivalent to word occurrences), tokens into sentence parts, and sentence parts into a whole sentence. At every stage, each unit is assigned a type. Assembly relies on rules based solely on a thorough structural analysis of context. No formal models or statistical methods are applied; this is a central principle of the linguistic analyzer, inherent in all its algorithms. At this stage, complications arise primarily from the ambiguity of punctuation marks, which is discussed at length throughout the chapter.	en_GB
dc.language.iso	ru	en_GB
dc.publisher	St Petersburg State University	en_GB
dc.subject	grapheme	en_GB
dc.subject	letter grapheme	en_GB
dc.subject	non-letter grapheme	en_GB
dc.subject	character string	en_GB
dc.subject	punctuation mark	en_GB
dc.title	Chapter 1. Graphematical analysis	en_GB
dc.type	Book chapter	en_GB
Appears in Collections:	LINGUISTIC ANALYZER: AUTOMATIC TRANSFORMATION OF NATURAL LANGUAGE TEXTS INTO INFORMATION DATA STRUCTURE
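
The assembly procedure summarized in the abstract (characters to atoms, atoms to tokens, tokens to sentences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Unit, to_atoms, to_tokens, to_sentences and ABBREVIATIONS, the character classes, and the three context rules are all assumptions made for illustration, and the "sentence parts" level is omitted for brevity. In keeping with the abstract, the sketch uses only hand-written context rules and no statistical model.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    text: str
    kind: str  # type assigned to the unit at the current stage


# Hypothetical abbreviation list; a real analyzer would use far richer rules.
ABBREVIATIONS = {"etc", "fig", "vs"}


def char_class(ch: str) -> str:
    """Classify a single character (illustrative classes only)."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "punct"


def to_atoms(text: str) -> list[Unit]:
    """Stage 1: group adjacent characters of the same class into atoms."""
    atoms: list[Unit] = []
    for ch in text:
        kind = char_class(ch)
        # Punctuation marks are kept as single-character atoms.
        if atoms and atoms[-1].kind == kind and kind != "punct":
            atoms[-1].text += ch
        else:
            atoms.append(Unit(ch, kind))
    return atoms


def to_tokens(atoms: list[Unit]) -> list[Unit]:
    """Stage 2: assemble atoms into typed tokens using context rules."""
    tokens: list[Unit] = []
    i = 0
    while i < len(atoms):
        a = atoms[i]
        nxt = atoms[i + 1] if i + 1 < len(atoms) else None
        nxt2 = atoms[i + 2] if i + 2 < len(atoms) else None
        next_is_period = nxt is not None and nxt.kind == "punct" and nxt.text == "."
        if a.kind == "space":
            i += 1  # whitespace only separates units
        elif (a.kind == "digit" and nxt is not None and nxt.kind == "punct"
              and nxt.text in ".," and nxt2 is not None and nxt2.kind == "digit"):
            # Rule: digit + '.' or ',' + digit is one number (decimal mark).
            tokens.append(Unit(a.text + nxt.text + nxt2.text, "number"))
            i += 3
        elif a.kind == "letter" and a.text.lower() in ABBREVIATIONS and next_is_period:
            # Rule: a known abbreviation absorbs the period that follows it.
            tokens.append(Unit(a.text + ".", "abbreviation"))
            i += 2
        else:
            kind = {"letter": "word", "digit": "number"}.get(a.kind, "punct")
            tokens.append(Unit(a.text, kind))
            i += 1
    return tokens


def to_sentences(tokens: list[Unit]) -> list[list[Unit]]:
    """Stage 3: close a sentence at each remaining terminal mark."""
    sentences: list[list[Unit]] = []
    current: list[Unit] = []
    for t in tokens:
        current.append(t)
        if t.kind == "punct" and t.text in ".!?":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


tokens = to_tokens(to_atoms("See fig. 3.14 now. It works!"))
sentences = to_sentences(tokens)
```

In this sketch the ambiguity of the period is resolved before sentence splitting: periods that belong to an abbreviation or a decimal number are absorbed into their tokens at stage 2, so only the remaining periods are treated as sentence boundaries at stage 3.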

Files in This Item:

File	Description	Size	Format
16-26.pdf		626.03 kB	Adobe PDF

Research Repository
Saint Petersburg State University