KI 2025: 48TH GERMAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
PROGRAM FOR FRIDAY, SEPTEMBER 19TH

09:00-10:00 Session 17: Invited Talk 3

The third invited talk opens the last day of the conference.

09:00
TBA
10:00-10:30 Session 18: Large Language Models for Easy Language

This session presents papers on using large language models to generate Easy German.

10:00
Accessible Language Simplification: Large Language Models for Generating Easy German
PRESENTER: Raeesa Yousaf

ABSTRACT. The growing application of Large Language Models (LLMs) in text simplification holds significant promise for improving accessibility for non-native speakers and individuals with learning or cognitive disabilities. We examine the effectiveness of LLMs in generating Easy German responses through a domain-agnostic Question-Answering (QA) framework, leveraging explicit simplification rules and tailored prompting strategies. Focused on health-related questions, the framework transforms complex medical information into accessible language to enhance health literacy. Automated readability and semantic alignment metrics are combined with human evaluations from non-native speakers (A1–B1 CEFR proficiency) and individuals with special needs. Results show that GPT-4 consistently outperforms open-source models like Llama and Mixtral, generating factually accurate, clear, and accessible responses, while the latter models often struggle with coherence. Though developed for Easy German, the domain-agnostic methodology can be adapted to any language with minimal prompt adjustments. For reproducibility and the original German content, please refer to our GitHub repository.
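As an illustration of the kind of rule-guided prompting described in the abstract, here is a minimal sketch; the rule list, prompt wording, and question are hypothetical examples, not the authors' actual prompts.

```python
# Hypothetical sketch of rule-guided prompting for Easy German answers.
# The rules and wording below are illustrative assumptions.

EASY_GERMAN_RULES = [
    "Use short sentences with one idea per sentence.",
    "Use simple, common words and explain technical terms.",
    "Use active voice and address the reader directly.",
    "Avoid abbreviations, negations, and figurative language.",
]

def build_prompt(question: str) -> str:
    """Assemble a QA prompt that asks the model to answer in Easy German."""
    rules = "\n".join(f"- {r}" for r in EASY_GERMAN_RULES)
    return (
        "Answer the following health question in Easy German (Leichte Sprache).\n"
        f"Follow these simplification rules:\n{rules}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # The prompt would then be sent to an LLM (e.g. GPT-4 or Llama);
    # the API call itself is omitted here.
    print(build_prompt("Was ist Bluthochdruck und wie wird er behandelt?"))
```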

10:20
LLMs for Easy Language Translation: A Case Study on German Public Authorities Web Pages

ABSTRACT. This paper examines the use of Large Language Models (LLMs) for the intralingual translation of documents from standard German to German Easy Language (Leichte Sprache). We use open-source models from the Llama 3 family with fewer than ten billion parameters. Additionally, we employ parameter-efficient fine-tuning (QLoRA) to adapt the LLMs to the requirements of Easy Language. For this purpose, we introduce a new data set (ELGEPA), a parallel corpus of governmental documents in standard German and Easy Language with additional metadata, obtained from all German federal states and their capitals. In our experiments, a fine-tuned Llama 3.1-8B-Instruct model achieved a SARI score of 41 and a Flesch Reading Ease of 69, outperforming GPT-4o and indicating that this type of model can deliver promising text quality in Easy Language.
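The Flesch Reading Ease value reported above can be computed with the German (Amstad) variant of the formula; a minimal sketch, using a rough vowel-group syllable heuristic rather than an exact counter:

```python
# Minimal sketch of the Flesch Reading Ease score in its German (Amstad)
# variant. The syllable counter is a rough heuristic, not linguistically exact.
import re

VOWELS = "aeiouäöüy"

def count_syllables(word: str) -> int:
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease_de(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[\wäöüß]+", text.lower())
    asl = len(words) / max(1, len(sentences))                          # avg. sentence length
    asw = sum(count_syllables(w) for w in words) / max(1, len(words))  # avg. syllables per word
    return 180 - asl - 58.5 * asw                                      # Amstad formula

if __name__ == "__main__":
    sample = "Leichte Sprache hilft vielen Menschen. Die Sätze sind kurz."
    print(round(flesch_reading_ease_de(sample), 1))
```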

10:30-11:00 Coffee Break
11:00-12:30 Session 19: Machine Learning and its Applications

This session presents two papers on meta-features and on combining synthetic and real-world data, as well as papers on the application domains of passive acoustic monitoring, flood inundation, steel microstructures, and personal finance.

11:00
Enhancing Semi-Supervised Learning with a Meta-Feature Based Safeguard System

ABSTRACT. The selection of meta-features can strongly influence the performance and accuracy of semi-supervised learning methods. In this paper, we analyze the role of different meta-features in enhancing the performance of semi-supervised learning models. We present experiments on benchmark data sets that show which meta-features contribute most significantly to model accuracy and robustness.

We propose an enhanced safeguard system for semi-supervised learning that leverages meta-features to predict the potential benefits of pseudo-labeling, with a focus on simultaneously reducing computational resource consumption and improving the overall performance of semi-supervised learning models. By determining when to train a second predictor, our system optimizes computational efficiency, thereby minimizing energy usage and the associated carbon footprint. Thus, this approach emphasizes the importance of developing resource-conscious machine learning methodologies that contribute to the broader goal of sustainable technology.
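A minimal sketch of the gating idea, assuming hand-picked meta-features and a hypothetical decision rule in place of the trained safeguard described in the paper:

```python
# Illustrative sketch: simple dataset meta-features decide whether a second,
# pseudo-label-based predictor is worth training. The meta-features and the
# decision rule are assumptions chosen for illustration only.
import numpy as np

def meta_features(X: np.ndarray, y: np.ndarray) -> dict:
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return {
        "n_labeled": int(len(y)),
        "n_features": int(X.shape[1]),
        "class_entropy": float(-(p * np.log2(p)).sum()),
    }

def pseudo_labeling_likely_helps(mf: dict) -> bool:
    # Hypothetical rule; in practice a trained meta-model would predict
    # the expected gain from pseudo-labeling.
    return mf["n_labeled"] < 500 and mf["class_entropy"] > 0.5

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)
    mf = meta_features(X, y)
    print(mf, "-> train second predictor:", pseudo_labeling_likely_helps(mf))
```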

11:20
Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data
PRESENTER: Paul Wachter

ABSTRACT. Synthetic data has emerged as a cost-effective alternative to real-world data for training artificial neural networks. However, the disparity between synthetic and real-world data results in a domain gap, which can lead to poor performance and generalization of the trained artificial neural networks when applied to real-world scenarios. To bridge this gap, several strategies have been developed that combine synthetic with real-world data, known as mixed training using hybrid datasets. While these strategies have been shown to mitigate the domain gap, a systematic evaluation of their generalizability and robustness across a variety of tasks and architectures remains underexplored. To address this challenge, our study comprehensively analyzes two widely used mixing strategies on three prevalent architectures and three distinct hybrid datasets. From these datasets, we sample subsets with varying synthetic-to-real-world data proportions to investigate the impact of synthetic and real components. The findings of this paper provide valuable insights into optimizing the use of synthetic data in the training process of any artificial neural network, contributing to enhanced robustness and efficacy.
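A minimal sketch of assembling a hybrid training set with a chosen synthetic-to-real proportion; the dataset sizes and sampling interface are illustrative assumptions:

```python
# Sketch of sampling a hybrid dataset with a given synthetic-to-real ratio.
import numpy as np

def sample_hybrid_indices(n_real: int, n_synth: int, total: int,
                          synth_fraction: float, seed: int = 0):
    """Return index arrays selecting `total` samples with the requested
    fraction of synthetic data (the rest is real)."""
    rng = np.random.default_rng(seed)
    n_s = int(round(total * synth_fraction))
    n_r = total - n_s
    real_idx = rng.choice(n_real, size=n_r, replace=False)
    synth_idx = rng.choice(n_synth, size=n_s, replace=False)
    return real_idx, synth_idx

if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        r, s = sample_hybrid_indices(n_real=10_000, n_synth=50_000,
                                     total=4_000, synth_fraction=frac)
        print(f"synthetic fraction {frac:.2f}: {len(r)} real, {len(s)} synthetic")
```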

11:40
Intermediate-Task Transfer Learning for Bioacoustic Data
PRESENTER: Hannes Kath

ABSTRACT. Biodiversity loss is accelerating, necessitating up-to-date and reliable quantitative data for evidence-based biosphere management. Passive acoustic monitoring (PAM) has become a key technology for scalable wildlife monitoring. While PAM facilitates large-scale data collection, efficiently analyzing the vast amounts of recorded data remains a significant challenge. This work explores the use of transfer learning and systematically investigates state-of-the-art models, comparing fine-tuning these models with using frozen layer weights, and evaluating the application of intermediate-task transfer learning. In intermediate-task transfer learning, a model pre-trained on data less related to the target data is first re-trained on a larger dataset more closely related to the target, before being re-trained on the target data itself. Our results show that fine-tuning improves performance compared to using frozen model weights, and that intermediate-task transfer learning is only beneficial for models trained on data significantly different from PAM data. These findings pave the way for developing a real-world, efficient PAM data analysis tool.
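The two settings compared above, a frozen pre-trained backbone versus fine-tuning, can be sketched as follows; the backbone, head, and data are stand-ins rather than the models and PAM recordings used in the paper:

```python
# Sketch of frozen-backbone vs. fine-tuned transfer learning in PyTorch.
# The tiny backbone and random spectrogram batch are illustrative stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 10)  # e.g. 10 target species classes
model = nn.Sequential(backbone, head)

FINE_TUNE = False  # False: keep backbone weights frozen, train only the head
for p in backbone.parameters():
    p.requires_grad = FINE_TUNE

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One illustrative training step on a random spectrogram batch.
x, y = torch.randn(8, 1, 64, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print("trainable parameters:",
      sum(p.numel() for p in model.parameters() if p.requires_grad))
```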

12:00
Deep learning emulators for large-scale, high-resolution urban pluvial flood prediction

ABSTRACT. Flood inundation simulations can help in planning preventive measures against flood damage; however, conventional simulation methods can be computationally expensive and time-consuming. An alternative to speed up the calculations is to use data-driven emulators. In this work, we present two contributions: (1) the development of a large-scale, high-resolution flood dataset and (2) the development of deep learning (DL)-based emulators for flood prediction trained using our dataset. We show that, in comparison to previous works, our emulators are able to generalize to previously unseen test locations and achieve comparable performance metrics in terms of RMSE. In comparison to a GPU-accelerated simulator, an inference time speed-up of approximately 1000 times is achieved using these emulators. The dataset and code will be made available open-access.
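A minimal sketch of the RMSE metric used to compare emulator output with simulated water depths; grid sizes and values are illustrative:

```python
# Sketch of RMSE over a raster grid of water depths (e.g. in metres).
import numpy as np

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root mean squared error over all grid cells."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    simulated = rng.uniform(0, 2, size=(512, 512))           # reference depths
    emulated = simulated + rng.normal(0, 0.05, (512, 512))   # emulator output
    print(f"RMSE: {rmse(emulated, simulated):.3f} m")
```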

12:10
Comparing the visual quality of deep generative models for steel microstructures
PRESENTER: Marcel Wentzien

ABSTRACT. In the realm of AI-driven material sciences, generating highly realistic steel microstructure images represents an opportunity to address data scarcity and to advance data-driven diagnostic and predictive applications. While previous work achieved good performance values for generating scanning electron microscopy grayscale images of steel microstructures, research on colorful light optical microscopy (LOM) is limited. We investigate the capabilities of two deep generative models to generate highly realistic LOM steel microstructure images. We train a StyleGANv2 and the recently published FractalGen on a dataset of microstructural images from 30 different steels and conduct a small expert study to evaluate how realistic the generated images look. Because of visible grid-like patterns, the experts almost always correctly identified FractalGen-generated images as synthetic, while StyleGANv2-generated images were mostly indistinguishable from real images to them. We also used the StyleGANv2 discriminator to conduct the same task, which produced classification results that are not yet fully explainable. Our findings are a first step towards the generation of realistic synthetic pairs of microstructures and the steel's chemical composition and processing, which could reduce costs in future AI-driven material sciences research.

12:20
Learn, Optimize, Explain: A Neuro-Symbolic Advisor for Personal Finance
PRESENTER: Nourhan Ehab

ABSTRACT. Investment decision-making in personal finance requires both predictive accuracy and the ability to reason with domain-specific knowledge. Traditional robo-advisors and decision support systems often rely solely on statistical models, limiting their capacity to capture the nuanced behavior of financial markets and to incorporate expert insights. This paper proposes LOX, a neuro-symbolic decision support framework that combines the learning capabilities of neural networks with the interpretability of symbolic reasoning to generate personalized and explainable investment recommendations. The system integrates Long Short-Term Memory (LSTM) networks for market trend prediction with an ontology-based knowledge representation that encodes investment principles and asset-specific metrics. A key contribution lies in the fusion of these components into a unified decision-making pipeline that not only optimizes portfolios based on forecasted returns and risk, but also justifies the recommendations with symbolic reasoning grounded in expert knowledge. Experimental results demonstrate that this hybrid approach enhances both performance and trustworthiness, bridging the gap between data-driven learning and rule-based financial expertise.
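A minimal sketch of coupling a learned forecast with symbolic rules and textual justifications, in the spirit of the pipeline described above; the assets, metrics, and rule are hypothetical stand-ins for the LSTM forecaster and ontology:

```python
# Illustrative sketch: a predicted-return dictionary stands in for the LSTM
# forecast, and a single risk rule stands in for the ontology-based reasoning.
forecast = {"ACME": 0.08, "BETA": 0.15, "GAMMA": 0.03}   # hypothetical predicted returns
metrics = {
    "ACME": {"volatility": 0.12},
    "BETA": {"volatility": 0.45},
    "GAMMA": {"volatility": 0.08},
}

def recommend(risk_tolerance: float):
    """Keep assets satisfying the symbolic risk rule, rank by forecast,
    and attach a human-readable justification."""
    recs = []
    for asset, ret in sorted(forecast.items(), key=lambda kv: -kv[1]):
        vol = metrics[asset]["volatility"]
        if vol <= risk_tolerance:                        # symbolic constraint
            recs.append(f"{asset}: expected return {ret:.0%}, volatility "
                        f"{vol:.0%} within your tolerance of {risk_tolerance:.0%}.")
    return recs

for why in recommend(risk_tolerance=0.20):
    print(why)
```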

12:30-14:00 Lunch Break
14:00-15:30 Session 20: Large Language Models & Closing

This session focuses on large language models, specifically set encodings to improve LLMs and the use of LLMs for learning linear functions, for code completion, and in a human-AI framework for collaborative systematic literature reviews.

14:00
Positional Overload: Positional Debiasing and Context Window Extension for Large Language Models using Set Encoding
PRESENTER: Lukas Kinder

ABSTRACT. Large Language Models (LLMs) typically track the order of tokens using positional encoding, which causes the following problems: positional bias, where the model is influenced by an ordering within the prompt, and a fixed context window, as models struggle to generalize to positions beyond those encountered during training. To address these limitations, we developed a novel method called set encoding. This method allows multiple pieces of text to be encoded in the same position, thereby eliminating positional bias entirely. Another promising use case for set encoding is to increase the size of the input an LLM can handle. Our experiments demonstrate that set encoding allows an LLM to solve tasks with far more tokens than without set encoding. To our knowledge, set encoding is the first technique to effectively extend an LLM’s context window without requiring any additional training.
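A minimal sketch of the core idea: several segments receive the same position indices, so the model sees no ordering among them. Tokens and offsets are illustrative, and the accompanying attention-masking details of the method are omitted here:

```python
# Sketch of set encoding: parallel segments share the same position indices,
# so no positional bias can arise among them. The example tokens are made up,
# and the attention masking the full method also requires is not shown.
prefix = ["Rank", "the", "following", "reviews", ":"]
segments = [["Review", "A", "was", "great"],
            ["Review", "B", "was", "poor"],
            ["Review", "C", "was", "okay"]]

tokens, position_ids = list(prefix), list(range(len(prefix)))
set_start = len(prefix)
for seg in segments:
    tokens.extend(seg)
    # every segment restarts at the same position index
    position_ids.extend(range(set_start, set_start + len(seg)))

for tok, pos in zip(tokens, position_ids):
    print(f"{pos:3d}  {tok}")
```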

14:20
Re-examining learning linear functions in context
PRESENTER: Omar Naim

ABSTRACT. We explore in-context learning (ICL), a popular paradigm for inference with Large Language Models (LLMs), in a controlled experimental setup using synthetic training data. Using a range of small transformer models trained from scratch, we focus on a mathematical task with simple but precise prompts: learning a linear function f from a sequence of inputs x_i and function values f(x_i). Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to in-context learn (ICL) a linear function. We observe that all models have "boundary values" that limit generalizability. While we can extend boundary values with training distributions over a wider range, we lose the precision of models trained on distributions with more restricted ranges. Thus, we see a dilemma for ICL, at least in some tasks: either models will lack generalizability or precision.
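A minimal sketch of this setup: prompts are sequences (x_1, f(x_1), ..., x_k, f(x_k), x_query) for a random linear function, with a least-squares fit as the algorithmic baseline; dimensions and ranges are illustrative assumptions:

```python
# Sketch of the ICL-of-linear-functions setup: generate in-context examples
# for a random linear function f(x) = w.x and compare against the
# least-squares baseline transformers are often claimed to emulate.
import numpy as np

def make_prompt(k: int = 10, dim: int = 5, scale: float = 1.0, seed: int = 0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                        # hidden linear function
    xs = rng.normal(0, scale, size=(k + 1, dim))    # k examples + 1 query
    ys = xs @ w
    return xs, ys, w

xs, ys, w = make_prompt()
context_x, context_y, x_query = xs[:-1], ys[:-1], xs[-1]

# Least-squares baseline: recover w from the in-context examples.
w_hat, *_ = np.linalg.lstsq(context_x, context_y, rcond=None)
print("true f(x_query):     ", float(x_query @ w))
print("baseline prediction: ", float(x_query @ w_hat))
```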

14:40
Exploiting Contexts of LLM-based Code-Completion

ABSTRACT. Code assistants based on Large Language Models (LLMs) are built on massive datasets, often sourced from untrusted GitHub repositories. Adversaries can poison these sources so that the resulting models suggest insecure code. The straightforward approach of publishing vulnerable code is typically insufficient, though, as datasets are commonly filtered for vulnerabilities using static analysis. However, recent attacks like TrojanPuzzle circumvent these filters by reusing tokens from distinctive patterns in a victim's context. TrojanPuzzle has a crucial limitation: the distinctive pattern must include at least one token that appears in the desired vulnerable suggestion (the bait). Our attacks [10] lift this restriction by implanting a learnable mapping function, specifically parameterized to transform any token into the required bait token.

15:00
Augmenting Systematic Literature Reviews in Information Systems: A Human-AI Collaborative Framework

ABSTRACT. While Systematic Literature Reviews (SLRs) are integral to research by synthesizing existing knowledge and guiding future inquiry, the exponential increase in academic publications presents significant challenges to traditional, manual review methods, notably regarding scalability, efficiency, and researcher workload. Recent advancements in Artificial Intelligence (AI), particularly Large Language Models (LLMs), offer promising avenues for augmenting the SLR process. Nonetheless, integrating AI into literature reviews introduces methodological complexities, including maintaining accuracy, minimizing biases, and preserving scholarly rigor. To address these challenges, this paper introduces a structured AI-augmented SLR framework, systematically integrating AI capabilities into Wolfswinkel et al.'s established Grounded Theory Literature Review Method. Our framework incorporates AI-driven relevance assessments, automated selection processes, and thematic content analysis, underpinned by rigorous human oversight to ensure reliability and interpretative validity. We empirically evaluate our framework through a comparative study, replicating and extending a previously published human SLR. The evaluation assesses AI performance using key metrics such as type I and type II error rates across varying confidence thresholds. Results demonstrate substantial efficiency gains and effective accuracy in AI-assisted selection, highlighting the importance of carefully calibrated thresholds and continued human oversight. Our study contributes practical guidelines for effectively balancing AI automation with human scholarly judgment, offering a replicable methodological approach for researchers seeking to leverage AI capabilities without compromising methodological quality or academic integrity.
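A minimal sketch of the threshold evaluation described above: given AI relevance scores and human include/exclude decisions, compute type I (false inclusion) and type II (false exclusion) rates per confidence threshold; scores and labels here are synthetic stand-ins:

```python
# Sketch of type I / type II error rates across confidence thresholds for
# AI-assisted study selection. Labels and scores are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
human_relevant = rng.random(500) < 0.2                                   # human decisions
scores = np.clip(human_relevant * 0.5 + rng.random(500) * 0.6, 0, 1)     # AI relevance scores

for threshold in (0.5, 0.6, 0.7, 0.8):
    included = scores >= threshold
    fp = np.sum(included & ~human_relevant)
    fn = np.sum(~included & human_relevant)
    type_1 = fp / np.sum(~human_relevant)   # false-inclusion rate
    type_2 = fn / np.sum(human_relevant)    # false-exclusion rate
    print(f"threshold {threshold:.1f}: type I {type_1:.2%}, type II {type_2:.2%}")
```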

15:20
KI 2025 - Closing
PRESENTER: Tanya Braun
15:30-16:30 Coffee Break