
Grammatical Disambiguation in the Tatar National Corpus

8 pages. Published: November 29, 2016

Abstract

This paper addresses grammatical ambiguity in the Tatar National Corpus and the possibilities for automating disambiguation in the corpus. Grammatical ambiguity is widespread in agglutinative languages such as the Turkic and Finno-Ugric languages. To build a grammatically disambiguated subcorpus, we have developed a dedicated software module that searches for ambiguous tokens in the corpus, collects statistical information, and supports creating and applying formal disambiguation rules for different ambiguity types. Disambiguation is based on a context-oriented classification of ambiguity types, carried out on statistical corpus data for the Tatar language for the first time. The corpus thus serves both as the source of our research and as the destination for implementing its results. The estimated cumulative effect of disambiguating the identified frequent ambiguity types in the Tatar National Corpus can be up to 50%.
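
The workflow outlined in the abstract (find ambiguous tokens, count ambiguity types, apply context rules) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' module: the `Token` and `Rule` structures, the placeholder tags `N` and `V`, the word forms `w1` to `w4`, and the single right-neighbour rule format are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str                   # surface form (placeholder strings here)
    analyses: tuple[str, ...]   # candidate grammatical analyses


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule format: for a given ambiguity type, if the right
    # neighbour unambiguously carries `next_is`, keep only `choose`.
    amb_type: tuple[str, ...]
    next_is: str
    choose: str


def ambiguity_type(token: Token) -> tuple[str, ...]:
    """An ambiguity type is modelled as the sorted set of competing analyses."""
    return tuple(sorted(set(token.analyses)))


def count_ambiguity_types(sentences: list[list[Token]]) -> Counter:
    """Collect frequency statistics over ambiguous tokens only."""
    counts: Counter = Counter()
    for sentence in sentences:
        for token in sentence:
            if len(token.analyses) > 1:
                counts[ambiguity_type(token)] += 1
    return counts


def apply_rules(sentence: list[Token], rules: list[Rule]) -> list[Token]:
    """Resolve a token when a rule matches its type and its right context."""
    resolved = []
    for i, token in enumerate(sentence):
        chosen = token
        if len(token.analyses) > 1 and i + 1 < len(sentence):
            right = sentence[i + 1]
            for rule in rules:
                if (ambiguity_type(token) == rule.amb_type
                        and right.analyses == (rule.next_is,)):
                    chosen = Token(token.form, (rule.choose,))
                    break
        resolved.append(chosen)
    return resolved


# Toy data with placeholder forms and tags (not real Tatar examples).
corpus = [
    [Token("w1", ("N", "V")), Token("w2", ("V",))],
    [Token("w3", ("N", "V")), Token("w4", ("N", "V"))],
]
rules = [Rule(amb_type=("N", "V"), next_is="V", choose="N")]

before = count_ambiguity_types(corpus)
after = count_ambiguity_types([apply_rules(s, rules) for s in corpus])
```

Comparing `before` and `after` for each ambiguity type gives a per-type estimate of how much a rule set reduces ambiguity, which is the kind of statistic the abstract's cumulative-effect figure refers to.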

Keyphrases: agglutinative languages, disambiguation, linguistic corpus, morphology, Tatar language

In: Antonio Moreno Ortiz and Chantal Pérez-Hernández (editors). CILC2016. 8th International Conference on Corpus Linguistics, vol 1, pages 228--235

BibTeX entry
@inproceedings{CILC2016:Grammatical_Disambiguation_in_Tatar,
  author    = {Bulat Khakimov and Ramil Gataullin and Rinat Gilmullin},
  title     = {Grammatical Disambiguation in the Tatar National Corpus},
  booktitle = {CILC2016. 8th International Conference on Corpus Linguistics},
  editor    = {Antonio Moreno Ortiz and Chantal P\'erez-Hern\'andez},
  series    = {EPiC Series in Language and Linguistics},
  volume    = {1},
  pages     = {228--235},
  year      = {2016},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-5283},
  url       = {https://easychair.org/publications/paper/RXLM},
  doi       = {10.29007/jkgl}}