Terminological Information Extraction from Russian Scientific Texts: Methods and Applications

12 pages
Published: March 18, 2019


Scientific texts contain many special terms, which, together with their definitions, make up an important part of the scientific knowledge to be extracted for various applications, such as text summarization and the construction of glossaries and ontologies. The paper reports rule-based methods developed for extracting terminological information, including recognition of term definitions and detection of term occurrences within scientific or technical texts. In contrast to corpus-based terminology extraction, the developed methods are oriented to processing a single text and rely on lexico-syntactic patterns and rules that represent specific linguistic information about terms in scientific texts. The paper briefly characterizes LSPL, the formal language used to specify these patterns and rules, which is supported by programming tools and used for information extraction. Two applications of the methods are discussed: forming a glossary for a given text document and constructing a subject index. For each application, the collection of LSPL patterns and the extraction strategy are described, and the results of their experimental evaluation are given.
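The paper's patterns are written in LSPL, whose syntax is not reproduced here. As a rough illustration only, the Python sketch below swaps in plain regular expressions to show the two applications on a single text: pulling term definitions into a glossary, then locating term occurrences for a subject index. The two Russian templates (the dash copula "X ... это Y", i.e. "X is Y", and "X определяется как Y", i.e. "X is defined as Y"), the function names `extract_glossary`, `build_subject_index` and `split_sentences`, and the use of sentence numbers in place of page numbers are all hypothetical assumptions, not the authors' rules or tools.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

DASH = "\u2014"  # the dash Russian uses as a copula in definitional sentences

# Hypothetical definitional templates as plain regular expressions (NOT LSPL).
# They ignore morphology (case, number, agreement) entirely.
DEFINITION_PATTERNS = [
    # "<term> [dash] это <definition>."  ("<term> is <definition>")
    re.compile(r"(?P<term>.+?)\s+" + DASH + r"\s+это\s+(?P<defn>.+?)\.?"),
    # "<term> определяется как <definition>."  ("<term> is defined as ...")
    re.compile(r"(?P<term>.+?)\s+определяется\s+как\s+(?P<defn>.+?)\.?"),
]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_glossary(text: str) -> Dict[str, str]:
    """Map each lowercased term to the definition of the first matching template."""
    glossary: Dict[str, str] = {}
    for sentence in split_sentences(text):
        for pattern in DEFINITION_PATTERNS:
            match = pattern.fullmatch(sentence)
            if match:
                term = match.group("term").strip().lower()
                glossary[term] = match.group("defn").strip()
                break
    return glossary


def build_subject_index(text: str, terms: Iterable[str]) -> Dict[str, List[int]]:
    """List the 1-based sentence numbers where each term occurs verbatim.

    Only the exact lowercased surface form is found; inflected forms are missed.
    """
    term_list = list(terms)
    index: Dict[str, List[int]] = defaultdict(list)
    for number, sentence in enumerate(split_sentences(text), start=1):
        lowered = sentence.lower()
        for term in term_list:
            # \b is Unicode-aware for str patterns, so it works on Cyrillic.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                index[term].append(number)
    return dict(index)


SAMPLE_TEXT = (
    f"Глоссарий {DASH} это словарь терминов с определениями. "
    "Предметный указатель определяется как список терминов с номерами страниц. "
    "Каждый глоссарий строится по тексту. "
    "Предметный указатель помогает читателю."
)

if __name__ == "__main__":
    glossary = extract_glossary(SAMPLE_TEXT)
    print(glossary)
    print(build_subject_index(SAMPLE_TEXT, glossary))
```

On the sample text this yields a glossary with the entries "глоссарий" and "предметный указатель", and an index placing them in sentences 1 and 3 and in sentences 2 and 4, respectively. Because matching is by exact surface form, an occurrence such as "глоссария" (genitive case) would be missed; handling such variation is precisely what morphology-aware lexico-syntactic patterns are meant to address.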

Keyphrases: extraction of term definition, glossary creation, lexico-syntactic patterns, rule-based term extraction, subject index construction

In: Gerhard Wohlgenannt, Ruprecht von Waldenfels, Svetlana Toldova, Ekaterina Rakhilina, Denis Paperno, Olga Lyashevskaya, Natalia Loukachevitch, Sergei O. Kuznetsov, Olga Kultepina, Dmitry Ilvovsky, Boris Galitsky, Ekaterina Artemova and Elena Bolshakova (editors). Proceedings of Third Workshop "Computational linguistics and language science", vol. 4, pages 95–106

