ICAME41 Pre-conference workshop on "Issues with corpus data", Heidelberg University, Heidelberg, Germany, May 20, 2020
Conference website: https://icame41.as.uni-heidelberg.de/workshop-issues-with-corpus-data-small-niggling-everyday-problems-or-something-to-be-genuinely-concerned-about/
Abstract registration deadline: December 15, 2019
Submission deadline: December 15, 2019
ICAME41 Pre-conference workshop at Heidelberg University, May 20, 2020
Issues with corpus data: small, niggling, everyday problems - or something to be genuinely concerned about?
WORKSHOP DESCRIPTION:
Finding unwanted items among search results is probably a typical, almost everyday experience for many corpus linguists. Corpus compilers spend considerable effort on corpus annotation, markup, boilerplate removal, and the identification of duplicates and OCR errors. Similarly, scholars working with corpora employ increasingly refined methods when constructing elaborate query strings. Yet achieving perfect precision and/or recall is still highly unlikely. We often need to exclude different kinds of search hits from further analyses, and the reasons for weeding out unwanted items vary. The occurrence of false positives is occasionally mentioned in research articles, perhaps in a footnote, but much of the clean-up of irrelevant items may also be done silently.
There are undoubtedly many types of persistent problems in corpus data that seasoned scholars have encountered and know about, but which are seldom specifically addressed. Yet beginning corpus users might benefit from learning about what may be regarded as tacit knowledge in corpus linguistics, and even the more advanced scholar may encounter issues new to them that have been addressed earlier. This workshop intends to tap into this knowledge by inviting papers on the following topics:
- typical false positives found in corpora; how to find them or assess their frequency in a corpus?
- the significance of identifying different types of unwanted items; how to deal with them and what are the risks if they are not identified?
- problems associated with categories built into corpus design and various types of linguistic annotation in corpora; to what extent can these seemingly helpful features encourage uncritical thinking or guide corpus users' research?
Problematic aspects may of course be detected in individual corpora, and observing such infelicities is without question necessary and useful, as the aim of such observations is to advance corpus linguistic endeavours as a whole. However, instead of focusing on corpus-specific issues, this workshop welcomes papers that reflect on general issues or on the authors' own experiences of, and mistakes in, corpus compiling and corpus-based research. In the collegial spirit of ICAME, this workshop is not intended as a forum for highlighting mistakes or shortcomings in fellow scholars’ work.
The estimated number of participants is 6-8, which means that the workshop would take roughly four to five hours. The convenors of the workshop are planning an edited volume based on the papers presented at the workshop.
Submission Guidelines
Presenters are invited to submit abstracts for the workshop. Abstracts should be between 400 and 500 words (excluding references); both full papers and work-in-progress reports are welcome (20 minutes for the presentation + 10 minutes for discussion). The deadline for abstract submission is December 15, 2019. Notifications of acceptance will be sent out by January 10, 2020.
Workshop convenors
- Mark Kaunisto (Tampere University)
- Marco Schilk (Universität Hildesheim)
- Jukka Tyrkkö (Linnæus University)
Venue
The ICAME41 conference will be held at Heidelberg University, Germany, May 20-24, 2020 (see https://icame41.as.uni-heidelberg.de/). The workshops page of the conference is at https://icame41.as.uni-heidelberg.de/workshops/.
Contact
All questions about submissions should be emailed to Mark Kaunisto (mark.kaunisto@tuni.fi).