KI 2025: 48TH GERMAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
PROGRAM FOR THURSDAY, SEPTEMBER 18TH

09:00-10:30 Session 11: Vision, Explanations, and Privacy & Security

This session starts with vision and anonymisation, moves on to interpreting vision transformers and explanations for object detection, and then shifts into an explanation session with talks on explainable move selection for chess and a new attack technique against explanations, which circles back to the privacy and security topic of the first talk.

09:00
Towards Systematic Evaluation of Computer Vision Models under Data Anonymization
PRESENTER: Sarah Weiß

ABSTRACT. Camera-based systems in everyday applications require careful handling of privacy-sensitive image data. A promising approach to prevent data misuse is anonymization right after perception but before processing. However, traditional anonymization methods such as blurring and pixelation remove information essential for subsequent processing algorithms. Realistic anonymization, in contrast, can preserve vital information by generating naturalistic replacements. Nevertheless, these systems pose unresolved questions regarding information preservation and correctness. We develop a systematic approach to analyze anonymization methods and their effects on model training and performance. Through a quantitative analysis of various anonymization methods, we highlight the challenges and pitfalls introduced by anonymization. The lack of datasets for anonymization research forces the adaptation of unsuitable data. We use the state-of-the-art toolbox DeepPrivacy2 to generate realistic full-body anonymized data from COCO data and evaluate object classification using YOLOv10. Additionally, we demonstrate that appropriate evaluation of anonymization techniques requires specialized datasets. To address this gap, we introduce a handcrafted dataset, enabling us to prove the need for dedicated anonymization datasets. Based on systematically selected metrics, we assess the impact of anonymization on images or classes and present a range of experiments focusing on factors such as object size and co-occurrence frequency with the anonymized class. Furthermore, we present novel findings on the robustness of different model sizes and the processing of anonymized images. Our findings guide future directions in model adaptation to anonymized data, highlight improvements necessary in realistic anonymization generation, and underscore the importance of dedicated anonymization datasets.

09:20
Visualizing and Interpreting Neural Network Focus Regions: A Comparative Study of Vision Transformers on Synthetic and Real Data
PRESENTER: Waldemar Haag

ABSTRACT. The quality and diversity of training datasets significantly influence the performance of neural networks, particularly in object detection tasks that rely on annotated images to learn from marked regions. While synthetic image data reduces the need for extensive manual annotation, the domain gap between synthetic and real-world data remains a challenge, often leading to reduced model performance on real-world images. This study compares the image regions considered important by Transformer networks for object detection on synthetic and real image data, analyzing the size, quantity, and spatial distribution of regions of high attention using a feature visualization method. Additionally, the research explores the use of generative artificial intelligence techniques, specifically Stable Diffusion, to enhance the realism of synthetic images, aiming to improve feature transfer to real-world applications and potentially bridge the domain gap. Furthermore, an occlusion-based evaluation method is proposed to assess model behavior by selectively masking regions of high attention. Findings suggest a positive correlation between model performance and the number of large regions of high attention (specifically, those exceeding 1/16 of the bounding box area) used for object detection. Augmentation with Stable Diffusion appears to enhance the model's robustness to the occlusion of regions of high attention in the images.

09:30
ODExAI: A Comprehensive Object Detection Explainable AI Evaluation

ABSTRACT. Explainable Artificial Intelligence (XAI) techniques for interpreting object detection models remain in an early stage, with no established standards for systematic evaluation. This absence of consensus hinders both the comparative analysis of methods and the informed selection of suitable approaches. To address this gap, we introduce the Object Detection Explainable AI Evaluation (ODExAI), a comprehensive framework designed to assess XAI methods in object detection based on three core dimensions: localization accuracy, faithfulness to model behavior, and computational complexity. We benchmark a set of XAI methods across two widely used object detectors (YOLOX and Faster R-CNN) and standard datasets (MS-COCO and PASCAL VOC). Empirical results demonstrate that region-based methods (e.g., D-CLOSE) achieve strong localization (PG = 88.49%) and high model faithfulness (OA = 0.863), though with substantial computational overhead (Time = 71.42s). On the other hand, CAM-based methods (e.g., G-CAME) achieve superior localization (PG = 96.13%) and significantly lower runtime (Time = 0.54s), but at the expense of reduced faithfulness (OA = 0.549). These findings demonstrate critical trade-offs among existing XAI approaches and reinforce the need for task-specific evaluation when deploying them in object detection pipelines. Our implementation and evaluation benchmarks are publicly available at: https://github.com/Analytics-Everywhere-Lab/odexai.

09:50
Caïssa AI: A Neuro-Symbolic Chess Agent for Explainable Move Suggestion and Grounded Commentary
PRESENTER: Nourhan Ehab

ABSTRACT. Despite the impressive generative capabilities of large language models (LLMs), their lack of grounded reasoning and susceptibility to hallucinations limit their reliability in structured domains such as chess. We present Caïssa AI, a neuro-symbolic chess agent that augments LLM-generated move commentary with symbolic reasoning, knowledge graph integration, and verification modules. Caïssa AI combines a fine-tuned chess-specific LLM with a Prolog-based rule engine encoding chess tactics and rules, along with a dynamically constructed Neo4j knowledge graph representing the current board state. This hybrid architecture enables the system to generate not only accurate move suggestions but also coherent, strategically grounded commentary. A LangGraph-based verification module cross-checks LLM outputs against symbolic logic to ensure consistency and correctness, effectively mitigating hallucinations. By aligning data-driven generation with formal domain knowledge, Caïssa AI enhances both trustworthiness and explainability. Our results demonstrate that this tight neuro-symbolic integration produces verifiable, high-quality chess commentary that outperforms existing approaches.

10:10
Makrut Attacks Against Black-Box Explanations
PRESENTER: Achyut Hegde

ABSTRACT. Explanations have added great value to the field of Machine Learning (ML). However, existing methods for generating explanations are not without limitations. Multiple attack techniques exist that can manipulate explanations. Most of these attacks are designed to target white-box explanations and cannot be applied to black-box explanations out of the box. In a recent paper, we propose a novel attack technique, Makrut, that attacks the popular black-box explanation method LIME. Using Makrut, models can be manipulated to generate arbitrary explanations while maintaining other metrics such as accuracy. The feasibility of these attacks emphasizes the need for more trustworthy explanation methods.

10:30-11:00 Coffee Break
11:00-12:30 Session 12: Explanations and Robustness

This session focusses on robustness (and resilience) in various forms, for feature selection, object detection, image classification, and vision-language models.

11:00
Unsupervised Selection of Features by their Resilience to the Curse of Dimensionality
PRESENTER: Tom Hanika

ABSTRACT. Real-world datasets are often of high dimensionality and affected by the curse of dimensionality. This hinders their comprehensibility and interpretability. To reduce the complexity, feature selection aims to identify features that are crucial for learning from said data. While measures of relevance and pairwise similarities are commonly used, the curse of dimensionality is rarely incorporated into the process of selecting features. Here we step in with a novel unsupervised method that identifies the features that allow discriminating data subsets of different sizes. By adapting recent work on computing intrinsic dimensionalities, our method is able to select the features that can discriminate data and thus weaken the curse of dimensionality. Our experiments show that our method is competitive and commonly outperforms established unsupervised feature selection methods. Furthermore, we propose an approximation that allows our method to scale to datasets consisting of millions of data points. Our findings suggest that features that discriminate data and are connected to a low intrinsic dimensionality are meaningful for learning procedures.

11:20
XAIRob – An Explainable-AI-Based Relative Robustness Measure for Object Detection

ABSTRACT. The detection of objects is fundamentally important for automating a system, e.g., an autonomous car. Robust object recognition is needed to navigate safely and accident-free through unknown situations. Detection must remain reliable even under unfavourable conditions. It is therefore essential to expose detection systems to real-world challenges and evaluate their performance in unknown scenarios. In this paper, we evaluate the robustness of object detection algorithms for autonomous vehicles using cameras or Light Detection and Ranging (LiDAR). Leveraging explainable artificial intelligence (x-AI), we identify key regions within objects that are crucial for detection. By systematically removing critical pixels from images and points from point clouds, we investigate how much of an object can be removed before it becomes undetectable. The contribution of this work is a novel robustness evaluation method that allows a fair comparison of object recognition systems by integrating a faithful explainable artificial intelligence approach. This method provides a meaningful assessment of how resilient detection algorithms are when presented with missing or manipulated data, which is particularly relevant for ensuring safety in real-world autonomous applications. Our relative robustness scale works across different sensor modalities and reveals significant differences in the robustness of object detection algorithms. By focusing on the most critical parts of an object, we achieve a more precise comparison of detection performance than traditional approaches that rely mainly on random effects.

11:30
Re-Evaluating the Robustness and Interpretability of the Contrastive Explanation Method for Image Classification
PRESENTER: Luisa Schneider

ABSTRACT. Contrastive explanations are a way to make the predictive behavior of machine-learned classification models transparent and evaluable based on example features at the decision boundary. An established method for explaining image classifiers is the Contrastive Explanations Method (CEM). This explanation method determines the most similar image of the opposite class for an input image and generates explanations based on pixel changes that are either necessary for the target class or that lead to a class change. The extent to which the similarity between classes influences the interpretability and robustness of CEM has not yet been investigated quantitatively and in a hypothesis-test-driven manner. However, a quantitative evaluation of explanation methods is an important prerequisite for assessing the usefulness of an approach. This paper therefore uses statistical methods to re-evaluate the robustness and interpretability of CEM. It is furthermore tested whether CEM reproduces human-edited instances in a meaningful way. The results show that CEM performs well in terms of robustness and interpretability for similar classes. For dissimilar classes, CEM generates more noise. The results support the usefulness of CEM in cases where there is an increased risk of confusion between classes.

11:40
On the Domain Robustness of Contrastive Vision-Language Models

ABSTRACT. In real-world vision-language applications, practitioners increasingly rely on large, pretrained foundation models instead of custom-built solutions, despite limited transparency regarding their training processes and datasets. While these models achieve impressive performance on general benchmarks, their effectiveness can decline notably under specialized domain shifts, such as unique imaging conditions or environmental variations. In this work, we introduce DeepBench, a framework designed to assess domain-specific robustness of vision-language models (VLMs). DeepBench utilizes a large language model (LLM) to generate realistic, context-aware image corruptions tailored to specific deployment domains without requiring labeled data. Evaluating several prominent contrastive vision-language architectures and their variants across six real-world domains, we demonstrate substantial variability in robustness, underscoring the importance of targeted evaluations. DeepBench is released as open-source software to support further research into domain-aware robustness assessment.

12:00
Toward Short and Robust Contrastive Explanations for Image Classification by Leveraging Instance Similarity and Concept Relevance

ABSTRACT. This work explores concept-based contrastive explanations for image classification. The goal is to understand why a model prefers one class over another by leveraging human-understandable concepts, and to assess the robustness of these explanations. Using a fine-tuned deep learning model, concepts and their relevance scores are extracted and evaluated based on explanation length as a proxy for complexity. The dataset consists of images from the ILSVRC ImageNet subset, where instances are chosen for their semantic similarity. Two research questions are addressed: (1) whether explanation complexity varies across different relevance ranges, and (2) whether explanation complexity remains consistent under image augmentations such as rotation and noise. The results confirm that higher concept relevance leads to shorter, less complex explanations, while lower relevance results in longer, more diffuse explanations. Additionally, explanations show varying degrees of robustness. These findings offer valuable insights into the potential of contrastive concept-based approaches for building more robust and interpretable AI systems.

12:30-14:00 Lunch Break
14:00-14:45 Session 13: Invited Talk 2 (together with Informatik Festival 2025)

This session is part of the Informatik Festival main program and is thematically linked with the following panel.

14:00
Magic wands and digital zombies: some promises and risks of AI for digital legacy
14:45-15:30 Session 14: Panel Discussion (together with Informatik Festival 2025)

What Norms Do We Want for AI?

AI is increasingly permeating all areas of life. This panel discussion focusses on AI and ethics, touching on changes and risks, rules and regulations, as well as the human in all of it.

15:30-16:30 Coffee Break
16:30-17:00 Session 15: Special Session: On the Origins of AI in Germany

To celebrate the 50th anniversary of the first meeting that started the KI conference as it is known today, we present a special session on the origins of AI in Germany with a presentation by Rudolf Seising, Deutsches Museum Munich.

16:30
The origins of AI research in the Federal Republic of Germany

ABSTRACT. This lecture presents the results of the history of science project 'IGGI - Ingenieur-Geist und Geistes-Ingenieure: A History of Artificial Intelligence in the Federal Republic of Germany', which was funded by the BMBF from 2019 to 2023.

17:00-18:30 Session 16: GI-FBKI General Assembly

The Artificial Intelligence Chapter (FBKI, in German: Fachbereich Künstliche Intelligenz) of the Gesellschaft für Informatik invites its members to its annual general assembly (in German: Mitgliederversammlung).

18:00-23:59 Junge GI Festival Night

The festival night offers retro-gaming, live music, a science slam, and many opportunities to talk to new people.

More details in German here: https://informatik2025.gi.de/abendveranstaltungen.html

Location: Siggi