Building Corpus-Based Semantic Classifications of Some Tatar Affixes

10 pages. Published: November 28, 2016


This study explores the semantic properties of Tatar affixes. Turkic languages have complex morphology and syntax, which poses a challenge for language processing.
The fundamental principle of inflection and derivation in Tatar, as in other Turkic languages, is agglutination: postpositive affixes attach to the stem in a strictly determined order.
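As a rough illustration of fixed-order agglutination, the sketch below splits a noun form into a stem followed by plural, possessive and case affixes, consuming the slots strictly in that order. It is not the method used in the study: the function name `segment`, the `SLOTS` table and the handful of affix variants (given in a Latin transliteration) are assumptions chosen for the example, and a real analyzer would need full allomorph sets and a stem lexicon.

```python
# Illustrative slot order for a Tatar noun: stem + plural + possessive + case.
# The affix lists are small hand-picked samples (hypothetical table),
# not a complete description of Tatar morphology.
SLOTS = [
    ("plural", ["lar", "lär"]),
    ("possessive", ["ım", "em"]),
    ("case", ["da", "dä"]),
]


def segment(word, stem):
    """Split `word` into `stem` plus affixes, visiting slots strictly in order.

    Returns a list of (slot_name, morpheme) pairs, or None when the
    remainder cannot be consumed in the prescribed slot order.
    """
    if not word.startswith(stem):
        return None
    rest = word[len(stem):]
    parts = [("stem", stem)]
    for name, affixes in SLOTS:
        for affix in affixes:
            if rest.startswith(affix):
                parts.append((name, affix))
                rest = rest[len(affix):]
                break  # each slot is filled at most once
    # Leftover material means the affixes did not follow the slot order.
    return parts if rest == "" else None


if __name__ == "__main__":
    # kitap-lar-ım-da, "in my books"
    print(segment("kitaplarımda", "kitap"))
```

Because each slot is visited once and in sequence, a string whose affixes appear out of order leaves unconsumed material and is rejected, which is the property the "strictly determined order" refers to.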
The Tatar language has affixes of different types:
a) derivational affixes expressing only lexical meaning and forming new words;
b) inflectional affixes changing the word form (for example, case affixes);
c) affixes serving as means of derivation as well as inflection.
The current study is devoted to the ambiguous, polyfunctional Tatar affix -lık, which may attach to nominal, adjectival and verbal stems and form derivatives of different types depending on the contextual environment, the meaning of the stem and the composition of the derivative's affixal chain. The -lık affix is productive in modern Tatar and builds nominal, adjectival and verbal derivatives.
How many types of derivatives and word forms the -lık affix produces is not a trivial question, and different researchers distinguish different types of derivatives.
Based on a thorough analysis of Tatar derivatives containing the -lık affix, we identified some empirical features of these constructs and then classified them both manually and automatically. Four classes were distinguished. For our experiments we used data from the Tatar National Corpus “Tugan Tel” (
The results obtained may be used for disambiguation in the Tatar National Corpus and for analyzing other ambiguous Tatar affixes.

Keyphrases: affixes, agglutination, grammar in corpora, Tatar language

In: Antonio Moreno Ortiz and Chantal Pérez-Hernández (editors). CILC2016. 8th International Conference on Corpus Linguistics, vol 1, pages 321--330

