KI 2025: 48TH GERMAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
PROGRAM FOR FRIDAY, SEPTEMBER 19TH

09:00-10:00 Session 17: Invited Talk 3

The third invited talk opens the last day of the conference.

09:00
TBA
10:00-10:30 Session 18: Large Language Models for Easy Language

This session presents papers on using large language models to generate Easy German.

10:00
Accessible Language Simplification: Large Language Models for Generating Easy German
PRESENTER: Raeesa Yousaf

ABSTRACT. The growing application of Large Language Models (LLMs) in text simplification holds significant promise for improving accessibility for non-native speakers and individuals with learning or cognitive disabilities. We examine the effectiveness of LLMs in generating Easy German responses through a domain-agnostic Question-Answering (QA) framework, leveraging explicit simplification rules and tailored prompting strategies. Focused on health-related questions, the framework transforms complex medical information into accessible language to enhance health literacy. Automated readability and semantic alignment metrics are combined with human evaluations from non-native speakers (A1–B1 CEFR proficiency) and individuals with special needs. Results show that GPT-4 consistently outperforms open-source models like Llama and Mixtral, generating factually accurate, clear, and accessible responses, while the latter models often struggle with coherence. Though developed for Easy German, the domain-agnostic methodology can be adapted to any language with minimal prompt adjustments. For reproducibility and the original German content, please refer to our GitHub repository.
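As an illustration of the kind of rule-guided prompting described in the abstract, here is a minimal sketch; the rule list, prompt wording, and question are hypothetical examples, not the authors' actual prompts.

```python
# Hypothetical sketch of rule-guided prompting for Easy German answers.
# The rules and wording below are illustrative assumptions.

EASY_GERMAN_RULES = [
    "Use short sentences with one idea per sentence.",
    "Use simple, common words and explain technical terms.",
    "Use active voice and address the reader directly.",
    "Avoid abbreviations, negations, and figurative language.",
]

def build_prompt(question: str) -> str:
    """Assemble a QA prompt that asks the model to answer in Easy German."""
    rules = "\n".join(f"- {r}" for r in EASY_GERMAN_RULES)
    return (
        "Answer the following health question in Easy German (Leichte Sprache).\n"
        f"Follow these simplification rules:\n{rules}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # The prompt would then be sent to an LLM (e.g. GPT-4 or Llama);
    # the API call itself is omitted here.
    print(build_prompt("Was ist Bluthochdruck und wie wird er behandelt?"))
```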

10:20
LLMs for Easy Language Translation: A Case Study on German Public Authorities Web Pages

ABSTRACT. This paper examines the use of Large Language Models (LLMs) for the intralingual translation of documents from standard German to German Easy Language (Leichte Sprache). We use open-source models from the Llama 3 family with fewer than ten billion parameters. Additionally, we employ parameter-efficient fine-tuning (QLoRA) to adapt the LLMs to the requirements of Easy Language. For this purpose, we introduce a new data set (ELGEPA), a parallel corpus of governmental documents in standard German and Easy Language with additional metadata, obtained from all German federal states and their capitals. In our experiments, a fine-tuned Llama 3.1-8B-Instruct model achieved a SARI score of 41 and a Flesch Reading Ease of 69, outperforming GPT-4o and indicating that this type of model can deliver promising text quality in Easy Language.
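The Flesch Reading Ease value reported above can be computed with the German (Amstad) variant of the formula; a minimal sketch, using a rough vowel-group syllable heuristic rather than an exact counter:

```python
# Minimal sketch of the Flesch Reading Ease score in its German (Amstad)
# variant. The syllable counter is a rough heuristic, not linguistically exact.
import re

VOWELS = "aeiouäöüy"

def count_syllables(word: str) -> int:
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease_de(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[\wäöüß]+", text.lower())
    asl = len(words) / max(1, len(sentences))                          # avg. sentence length
    asw = sum(count_syllables(w) for w in words) / max(1, len(words))  # avg. syllables per word
    return 180 - asl - 58.5 * asw                                      # Amstad formula

if __name__ == "__main__":
    sample = "Leichte Sprache hilft vielen Menschen. Die Sätze sind kurz."
    print(round(flesch_reading_ease_de(sample), 1))
```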

10:30-11:00 Coffee Break
11:00-12:30 Session 19: Machine Learning and its Applications

This session presents two papers on meta-features and on combining synthetic and real-world data, as well as papers on the application domains of passive acoustic monitoring, flood inundation, steel microstructures, and personal finance.

11:00
Enhancing Semi-Supervised Learning with a Meta-Feature Based Safeguard System

ABSTRACT. The selection of meta-features can strongly influence the performance and accuracy of semi-supervised learning methods. In this paper, we analyze the role of different meta-features in enhancing the performance of semi-supervised learning models. We present experiments on benchmark data sets that show which meta-features contribute most significantly to model accuracy and robustness.

We propose an enhanced safeguard system for semi-supervised learning that leverages meta-features to predict the potential benefits of pseudo-labeling, with a focus on simultaneously reducing computational resource consumption and improving the overall performance of semi-supervised learning models. By determining when to train a second predictor, our system optimizes computational efficiency, thereby minimizing energy usage and the associated carbon footprint. Thus, this approach emphasizes the importance of developing resource-conscious machine learning methodologies that contribute to the broader goal of sustainable technology.
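A minimal sketch of the gating idea, assuming hand-picked meta-features and a hypothetical decision rule in place of the trained safeguard described in the paper:

```python
# Illustrative sketch: simple dataset meta-features decide whether a second,
# pseudo-label-based predictor is worth training. The meta-features and the
# decision rule are assumptions chosen for illustration only.
import numpy as np

def meta_features(X: np.ndarray, y: np.ndarray) -> dict:
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return {
        "n_labeled": int(len(y)),
        "n_features": int(X.shape[1]),
        "class_entropy": float(-(p * np.log2(p)).sum()),
    }

def pseudo_labeling_likely_helps(mf: dict) -> bool:
    # Hypothetical rule; in practice a trained meta-model would predict
    # the expected gain from pseudo-labeling.
    return mf["n_labeled"] < 500 and mf["class_entropy"] > 0.5

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(200, 10)), rng.integers(0, 2, size=200)
    mf = meta_features(X, y)
    print(mf, "-> train second predictor:", pseudo_labeling_likely_helps(mf))
```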

11:20
Development of Hybrid Artificial Intelligence Training on Real and Synthetic Data
PRESENTER: Paul Wachter

ABSTRACT. Synthetic data has emerged as a cost-effective alternative to real-world data for training artificial neural networks. However, the disparity between synthetic and real-world data results in a domain gap, which can lead to poor performance and generalization of the trained artificial neural networks when applied to real-world scenarios. To bridge this gap, several strategies have been developed that combine synthetic with real-world data, known as mixed training using hybrid datasets. While these strategies have been shown to mitigate the domain gap, a systematic evaluation of their generalizability and robustness across a variety of tasks and architectures remains underexplored. To address this challenge, our study comprehensively analyzes two widely used mixing strategies on three prevalent architectures and three distinct hybrid datasets. From these datasets, we sample subsets with varying synthetic-to-real-world data proportions to investigate the impact of synthetic and real components. The findings of this paper provide valuable insights into optimizing the use of synthetic data in the training process of any artificial neural network, contributing to enhanced robustness and efficacy.
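A minimal sketch of assembling a hybrid training set with a chosen synthetic-to-real proportion; the dataset sizes and sampling interface are illustrative assumptions:

```python
# Sketch of sampling a hybrid dataset with a given synthetic-to-real ratio.
import numpy as np

def sample_hybrid_indices(n_real: int, n_synth: int, total: int,
                          synth_fraction: float, seed: int = 0):
    """Return index arrays selecting `total` samples with the requested
    fraction of synthetic data (the rest is real)."""
    rng = np.random.default_rng(seed)
    n_s = int(round(total * synth_fraction))
    n_r = total - n_s
    real_idx = rng.choice(n_real, size=n_r, replace=False)
    synth_idx = rng.choice(n_synth, size=n_s, replace=False)
    return real_idx, synth_idx

if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        r, s = sample_hybrid_indices(n_real=10_000, n_synth=50_000,
                                     total=4_000, synth_fraction=frac)
        print(f"synthetic fraction {frac:.2f}: {len(r)} real, {len(s)} synthetic")
```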

11:40
Intermediate-Task Transfer Learning for Bioacoustic Data
PRESENTER: Hannes Kath

ABSTRACT. Biodiversity loss is accelerating, necessitating up-to-date and reliable quantitative data for evidence-based biosphere management. Passive acoustic monitoring (PAM) has become a key technology for scalable wildlife monitoring. While PAM facilitates large-scale data collection, efficiently analyzing the vast amounts of recorded data remains a significant challenge. This work explores the use of transfer learning and systematically investigates state-of-the-art models, comparing fine-tuning these models with using frozen layer weights, and evaluating the application of intermediate-task transfer learning. In intermediate-task transfer learning, a model pre-trained on data less related to the target data is first re-trained on a larger dataset more closely related to the target, before being re-trained on the target data itself. Our results show that fine-tuning improves performance compared to using frozen model weights, and that intermediate-task transfer learning is only beneficial for models trained on data significantly different from PAM data. These findings pave the way for developing a real-world, efficient PAM data analysis tool.
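The two settings compared above, a frozen pre-trained backbone versus fine-tuning, can be sketched as follows; the backbone, head, and data are stand-ins rather than the models and PAM recordings used in the paper:

```python
# Sketch of frozen-backbone vs. fine-tuned transfer learning in PyTorch.
# The tiny backbone and random spectrogram batch are illustrative stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 10)  # e.g. 10 target species classes
model = nn.Sequential(backbone, head)

FINE_TUNE = False  # False: keep backbone weights frozen, train only the head
for p in backbone.parameters():
    p.requires_grad = FINE_TUNE

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# One illustrative training step on a random spectrogram batch.
x, y = torch.randn(8, 1, 64, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
print("trainable parameters:",
      sum(p.numel() for p in model.parameters() if p.requires_grad))
```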

12:00
Deep learning emulators for large-scale, high-resolution urban pluvial flood prediction

ABSTRACT. Flood inundation simulations can help in planning preventive measures against flood damage; however, conventional simulation methods can be computationally expensive and time-consuming. An alternative to speed up the calculations is to use data-driven emulators. In this work, we present two contributions: (1) the development of a large-scale, high-resolution flood dataset and (2) the development of deep learning (DL)-based emulators for flood prediction trained using our dataset. We show that, in comparison to previous works, our emulators are able to generalize to previously unseen test locations and achieve comparable performance metrics in terms of RMSE. In comparison to a GPU-accelerated simulator, an inference time speed-up of approximately 1000 times is achieved using these emulators. The dataset and code will be made available open-access.
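A minimal sketch of the RMSE metric used to compare emulator output with simulated water depths; grid sizes and values are illustrative:

```python
# Sketch of RMSE over a raster grid of water depths (e.g. in metres).
import numpy as np

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root mean squared error over all grid cells."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    simulated = rng.uniform(0, 2, size=(512, 512))           # reference depths
    emulated = simulated + rng.normal(0, 0.05, (512, 512))   # emulator output
    print(f"RMSE: {rmse(emulated, simulated):.3f} m")
```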

12:10
Comparing the visual quality of deep generative models for steel microstructures
PRESENTER: Marcel Wentzien

ABSTRACT. In the realm of AI-driven material sciences, generating highly realistic steel microstructure images represents an opportunity to address data scarcity and to advance data-driven diagnostic and predictive applications. While previous work achieved good performance values for generating scanning electron microscopy grayscale images of steel microstructures, research on colorful light optical microscopy (LOM) is limited. We investigate the capabilities of two deep generative models to generate highly realistic LOM steel microstructure images. We train a StyleGANv2 and the recently published FractalGen on a dataset of microstructural images from 30 different steels and conduct a small expert study to evaluate how realistic the generated images look. Because of visible grid-like patterns, the experts almost always correctly identified FractalGen-generated images as synthetic, while StyleGANv2-generated images were mostly indistinguishable from real images to them. We also used the StyleGANv2 discriminator to conduct the same task, which produced classification results that are not yet fully explainable. Our findings are a first step towards the generation of realistic synthetic pairs of microstructures and the steel's chemical composition and processing, which could reduce costs in future AI-driven material sciences research.

12:20
Learn, Optimize, Explain: A Neuro-Symbolic Advisor for Personal Finance
PRESENTER: Nourhan Ehab

ABSTRACT. Investment decision-making in personal finance requires both predictive accuracy and the ability to reason with domain-specific knowledge. Traditional robo-advisors and decision support systems often rely solely on statistical models, limiting their capacity to capture the nuanced behavior of financial markets and to incorporate expert insights. This paper proposes LOX, a neuro-symbolic decision support framework that combines the learning capabilities of neural networks with the interpretability of symbolic reasoning to generate personalized and explainable investment recommendations. The system integrates Long Short-Term Memory (LSTM) networks for market trend prediction with an ontology-based knowledge representation that encodes investment principles and asset-specific metrics. A key contribution lies in the fusion of these components into a unified decision-making pipeline that not only optimizes portfolios based on forecasted returns and risk, but also justifies the recommendations with symbolic reasoning grounded in expert knowledge. Experimental results demonstrate that this hybrid approach enhances both performance and trustworthiness, bridging the gap between data-driven learning and rule-based financial expertise.
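A minimal sketch of coupling a learned forecast with symbolic rules and textual justifications, in the spirit of the pipeline described above; the assets, metrics, and rule are hypothetical stand-ins for the LSTM forecaster and ontology:

```python
# Illustrative sketch: a predicted-return dictionary stands in for the LSTM
# forecast, and a single risk rule stands in for the ontology-based reasoning.
forecast = {"ACME": 0.08, "BETA": 0.15, "GAMMA": 0.03}   # hypothetical predicted returns
metrics = {
    "ACME": {"volatility": 0.12},
    "BETA": {"volatility": 0.45},
    "GAMMA": {"volatility": 0.08},
}

def recommend(risk_tolerance: float):
    """Keep assets satisfying the symbolic risk rule, rank by forecast,
    and attach a human-readable justification."""
    recs = []
    for asset, ret in sorted(forecast.items(), key=lambda kv: -kv[1]):
        vol = metrics[asset]["volatility"]
        if vol <= risk_tolerance:                        # symbolic constraint
            recs.append(f"{asset}: expected return {ret:.0%}, volatility "
                        f"{vol:.0%} within your tolerance of {risk_tolerance:.0%}.")
    return recs

for why in recommend(risk_tolerance=0.20):
    print(why)
```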

12:30-14:00 Lunch Break
14:00-15:30 Session 20: Large Language Models & Closing

This session focuses on large language models, specifically set encodings to improve LLMs and the use of LLMs for learning linear functions, for code completion, and in a human-AI framework for collaborative systematic literature reviews.

14:00
Positional Overload: Positional Debiasing and Context Window Extension for Large Language Models using Set Encoding
PRESENTER: Lukas Kinder

ABSTRACT. Large Language Models (LLMs) typically track the order of tokens using positional encoding, which causes the following problems: positional bias, where the model is influenced by an ordering within the prompt, and a fixed context window, as models struggle to generalize to positions beyond those encountered during training. To address these limitations, we developed a novel method called set encoding. This method allows multiple pieces of text to be encoded in the same position, thereby eliminating positional bias entirely. Another promising use case for set encoding is to increase the size of the input an LLM can handle. Our experiments demonstrate that set encoding allows an LLM to solve tasks with far more tokens than without set encoding. To our knowledge, set encoding is the first technique to effectively extend an LLM’s context window without requiring any additional training.
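A minimal sketch of the core idea: several segments receive the same position indices, so the model sees no ordering among them. Tokens and offsets are illustrative, and the accompanying attention-masking details of the method are omitted here:

```python
# Sketch of set encoding: parallel segments share the same position indices,
# so no positional bias can arise among them. The example tokens are made up,
# and the attention masking the full method also requires is not shown.
prefix = ["Rank", "the", "following", "reviews", ":"]
segments = [["Review", "A", "was", "great"],
            ["Review", "B", "was", "poor"],
            ["Review", "C", "was", "okay"]]

tokens, position_ids = list(prefix), list(range(len(prefix)))
set_start = len(prefix)
for seg in segments:
    tokens.extend(seg)
    # every segment restarts at the same position index
    position_ids.extend(range(set_start, set_start + len(seg)))

for tok, pos in zip(tokens, position_ids):
    print(f"{pos:3d}  {tok}")
```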

14:20
Re-examining learning linear functions in context
PRESENTER: Omar Naim

ABSTRACT. We explore in-context learning (ICL), a popular paradigm for inference with Large Language Models (LLMs), in a controlled experimental setup using synthetic training data. Using a range of small transformer models trained from scratch, we focus on a mathematical task with simple but precise prompts: learning a linear function f from a sequence of inputs x_i and function values f(x_i). Our findings challenge the prevailing narrative that transformers adopt algorithmic approaches like linear regression to in-context learn (ICL) a linear function. We observe that all models have "boundary values" that limit generalizability. While we can extend boundary values with training distributions over a wider range, we lose the precision of models trained on distributions with more restricted ranges. Thus, we see a dilemma for ICL, at least in some tasks: either models will lack generalizability or precision.
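A minimal sketch of this setup: prompts are sequences (x_1, f(x_1), ..., x_k, f(x_k), x_query) for a random linear function, with a least-squares fit as the algorithmic baseline; dimensions and ranges are illustrative assumptions:

```python
# Sketch of the ICL-of-linear-functions setup: generate in-context examples
# for a random linear function f(x) = w.x and compare against the
# least-squares baseline transformers are often claimed to emulate.
import numpy as np

def make_prompt(k: int = 10, dim: int = 5, scale: float = 1.0, seed: int = 0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                        # hidden linear function
    xs = rng.normal(0, scale, size=(k + 1, dim))    # k examples + 1 query
    ys = xs @ w
    return xs, ys, w

xs, ys, w = make_prompt()
context_x, context_y, x_query = xs[:-1], ys[:-1], xs[-1]

# Least-squares baseline: recover w from the in-context examples.
w_hat, *_ = np.linalg.lstsq(context_x, context_y, rcond=None)
print("true f(x_query):     ", float(x_query @ w))
print("baseline prediction: ", float(x_query @ w_hat))
```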

14:40
Exploiting Contexts of LLM-based Code-Completion

ABSTRACT. Code assistants based on Large Language Models (LLMs) are built on massive datasets, often sourced from untrusted GitHub repositories. Adversaries can poison these sources so that the resulting models suggest insecure code. The straightforward approach of publishing vulnerable code is typically insufficient, though, as datasets are commonly filtered for vulnerabilities using static analysis. However, recent attacks like TrojanPuzzle circumvent these filters by reusing tokens from distinctive patterns in a victim's context. TrojanPuzzle has a crucial limitation: the distinctive pattern must include at least one token that appears in the desired vulnerable suggestion (the bait). Our attacks [10] lift this restriction by implanting a learnable mapping function, specifically parameterized to transform any token into the required bait token.

15:00
Augmenting Systematic Literature Reviews in Information Systems: A Human-AI Collaborative Framework

ABSTRACT. While Systematic Literature Reviews (SLRs) are integral to research by synthesizing existing knowledge and guiding future inquiry, the exponential increase in academic publications presents significant challenges to traditional, manual review methods, notably regarding scalability, efficiency, and researcher workload. Recent advancements in Artificial Intelligence (AI), particularly Large Language Models (LLMs), offer promising avenues for augmenting the SLR process. Nonetheless, integrating AI into literature reviews introduces methodological complexities, including maintaining accuracy, minimizing biases, and preserving scholarly rigor. To address these challenges, this paper introduces a structured AI-augmented SLR framework, systematically integrating AI capabilities into Wolfswinkel et al.'s established Grounded Theory Literature Review Method. Our framework incorporates AI-driven relevance assessments, automated selection processes, and thematic content analysis, underpinned by rigorous human oversight to ensure reliability and interpretative validity. We empirically evaluate our framework through a comparative study, replicating and extending a previously published human SLR. The evaluation assesses AI performance using key metrics such as type I and type II error rates across varying confidence thresholds. Results demonstrate substantial efficiency gains and effective accuracy in AI-assisted selection, highlighting the importance of carefully calibrated thresholds and continued human oversight. Our study contributes practical guidelines for effectively balancing AI automation with human scholarly judgment, offering a replicable methodological approach for researchers seeking to leverage AI capabilities without compromising methodological quality or academic integrity.
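A minimal sketch of the threshold evaluation described above: given AI relevance scores and human include/exclude decisions, compute type I (false inclusion) and type II (false exclusion) rates per confidence threshold; scores and labels here are synthetic stand-ins:

```python
# Sketch of type I / type II error rates across confidence thresholds for
# AI-assisted study selection. Labels and scores are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
human_relevant = rng.random(500) < 0.2                                   # human decisions
scores = np.clip(human_relevant * 0.5 + rng.random(500) * 0.6, 0, 1)     # AI relevance scores

for threshold in (0.5, 0.6, 0.7, 0.8):
    included = scores >= threshold
    fp = np.sum(included & ~human_relevant)
    fn = np.sum(~included & human_relevant)
    type_1 = fp / np.sum(~human_relevant)   # false-inclusion rate
    type_2 = fn / np.sum(human_relevant)    # false-exclusion rate
    print(f"threshold {threshold:.1f}: type I {type_1:.2%}, type II {type_2:.2%}")
```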

15:20
KI 2025 - Closing
PRESENTER: Tanya Braun
15:30-16:30 Coffee Break