One of the hardest problems in sentiment analysis is the analysis of multiple targets in the same text, where there are no clear boundaries between target contexts. This problem is even harder for the Ukrainian and Russian languages because of the lack of datasets and established approaches. In response to the business needs of our company, we created a bilingual dataset, manually annotated for targeted sentiment according to strict guidelines. This dataset allowed us to fine-tune a pre-trained multilingual BERT model and to achieve a large improvement in key metrics (macro F1 and F1 for the negative and positive classes) over the baseline models. As a by-product, we trained new NER models for both languages. The NER and targeted sentiment models were successfully deployed in the production environment at Semantrum.
Targeted Sentiment Analysis for Ukrainian and Russian News Articles