
Modeling Non-Compositional Expressions using a Search Engine

EasyChair Preprint no. 418

6 pages. Date: August 9, 2018


Non-compositional multi-word expressions present great challenges to natural language processing applications. In this paper, we present a method for modeling non-compositional expressions based on the assumption that the meaning of an expression depends on its context. Context words can therefore be used to select documents and to separate documents in which the expression has different meanings. Deviation from a baseline is measured using serendipity (i.e., the pointwise effect size). We used this statistical measure to mark which patterns are over- and under-represented, and to decide whether the pattern under scrutiny belongs to the meaning selected by the context words. We used the Google search engine to obtain document frequency estimates. When used with these estimates, the serendipity measure closely mirrors some human intuitions about the preferred alternative.
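
The abstract does not state the formula for serendipity, so the following is only a minimal sketch of the general idea: compare an observed document frequency with the frequency expected under a baseline, and flag the pattern as over- or under-represented. It assumes an independence baseline and uses a Pearson-residual-style deviation as a stand-in for the paper's measure. The function names, the threshold, and all counts are hypothetical and are not taken from the paper or from Google.

```python
import math


def expected_frequency(df_pattern, df_context, n_docs):
    """Expected joint document frequency if pattern and context were independent.

    Assumption: an independence baseline; the paper's baseline may differ.
    """
    return df_pattern * df_context / n_docs


def pointwise_deviation(observed, expected):
    """Stand-in pointwise effect size: (observed - expected) / sqrt(expected).

    Assumption: this is NOT confirmed to be the paper's serendipity formula.
    """
    return (observed - expected) / math.sqrt(expected)


def classify(observed, expected, threshold=2.0):
    """Label a pattern relative to the baseline (threshold is hypothetical)."""
    score = pointwise_deviation(observed, expected)
    if score > threshold:
        return "over-represented"
    if score < -threshold:
        return "under-represented"
    return "near baseline"


# Hypothetical document-frequency estimates, for illustration only.
n_docs = 1_000_000      # assumed size of the indexed collection
df_pattern = 2_000      # documents containing the expression
df_context = 50_000     # documents containing the context word
df_joint = 400          # documents containing both

expected = expected_frequency(df_pattern, df_context, n_docs)
print(expected, pointwise_deviation(df_joint, expected), classify(df_joint, expected))
```

A large positive score suggests that the expression co-occurs with the context word more often than the baseline predicts, which is the kind of evidence the abstract describes using to decide whether a pattern belongs to the meaning selected by the context words.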

Keyphrases: compositional meaning, compositional multi word expression, computational linguistic, conjunction fallacy, Context Word, effect size, expected frequency, Frequency Machine, memory-based learning, multiword expressions, Natural Language Processing, non compositional expression, Non-compositional, non-compositional meaning, search engine, Serendipity, statistics

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:418,
  author = {Cheikh Bamba Dione and Christer Johansson},
  title = {Modeling Non-Compositional Expressions using a Search Engine},
  howpublished = {EasyChair Preprint no. 418},
  doi = {10.29007/4jl9},
  year = {EasyChair, 2018}}