
A Language Independent Approach to Multilingual Document Representation Including Arabic

EasyChair Preprint 6786

8 pages. Date: October 6, 2021

Abstract

The Arabic language has attracted increasing interest in the field of Multilingual Information Retrieval (MIR). This work addresses the problem of multilingual document representation. The proposed approach combines surface analysis with the Latent Semantic Analysis (LSA) algorithm in a new way, breaking the terms of LSA down into units that correspond more closely to morphemes. These units are variable-length character n-gram candidates extracted from text fragments separated by borders. The results obtained are encouraging, and their variability shows that there is room for improvement.
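As a rough illustration of the extraction step described in the abstract, the following Python sketch splits text into fragments at borders and collects variable-length character n-grams from each fragment. It is not the authors' implementation: the abstract does not define what counts as a border, so the sketch assumes a border is any run of whitespace or punctuation, and the function names and the `min_n`/`max_n` length bounds are hypothetical. The resulting counts could serve as the term dimension of a term-document matrix to which LSA is then applied.

```python
import re
from collections import Counter

# Assumption: a "border" is any run of non-word characters (whitespace,
# punctuation) or underscores. The abstract does not specify this.
BORDER = re.compile(r"[\W_]+")

def fragments(text):
    """Split text into non-empty fragments at border characters."""
    return [f for f in BORDER.split(text) if f]

def char_ngrams(fragment, min_n=2, max_n=4):
    """Return every character n-gram of length min_n..max_n in one fragment."""
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(fragment) - n + 1):
            grams.append(fragment[i:i + n])
    return grams

def ngram_counts(text, min_n=2, max_n=4):
    """Count n-grams over all fragments; n-grams never span a border."""
    counts = Counter()
    for frag in fragments(text.lower()):
        counts.update(char_ngrams(frag, min_n, max_n))
    return counts
```

Because the splitting relies only on Unicode character classes and not on a dictionary or stemmer, the same functions apply unchanged to Arabic or Latin-script text, which is the sense in which such a representation is language independent.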

Keyphrases: concept types, multilingual document representation, multilingual information retrieval (MIR), pivot language, principle of border, variable length character N-grams, virtual document

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:6786,
  author    = {Souhila Boucham and Hassina Aliane},
  title     = {A Language Independent Approach to Multilingual Document Representation Including Arabic},
  howpublished = {EasyChair Preprint 6786},
  year      = {EasyChair, 2021}}