The doctoral consortium aims to offer early stage researchers from any AI subject area a unique opportunity to present their planned research, and to connect to fellow PhD students as well as senior researchers in AI.
AI-Driven Chest Radiography Report Generation: Integrating LLMs, CLIP, Tree-of-Thoughts, Multimodal Retrieval-Augmented Generation, Classification and Direct Preference Optimization
ABSTRACT. Manual chest radiography report creation is time-consuming and prone to variability, increasing radiologist workload and potentially affecting diagnostic consistency. This research introduces a novel, integrated AI system to automate chest radiology report generation. The proposed system leverages Large Language Models (LLMs) augmented by several key components. A trained classifier provides pathology probabilities from the input image to guide the LLMs, and a Contrastive Language-Image Pre-training (CLIP) model establishes a shared embedding space for efficient multimodal retrieval. A multimodal Retrieval-Augmented Generation (RAG) approach then retrieves relevant prior image-report pairs to improve factual grounding and contextual understanding. The Tree-of-Thoughts (ToT) framework enhances the diversity of report generation while maintaining clinical validity. Finally, Direct Preference Optimization (DPO) refines the LLM using automatically generated preference data based on clinical efficacy, including the CheXbert F1 score and cosine similarity between embeddings. This comprehensive approach aims to improve the clinical accuracy, coherence, and overall quality of generated radiology reports compared to existing methods, addressing limitations such as hallucinations and lack of specificity in current automated systems.
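As a rough sketch of the retrieval component described in this abstract, the following toy example ranks prior image-report pairs by cosine similarity in a shared embedding space. The `retrieve_prior_reports` helper, the three-dimensional embeddings, and the sample report texts are illustrative assumptions, not the actual CLIP encoders or data:

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_prior_reports(query_emb, corpus, k=2):
    """Return the k prior (embedding, report) pairs closest to the query
    image embedding in the shared CLIP-style space."""
    ranked = sorted(corpus, key=lambda pair: cosine(query_emb, pair[0]), reverse=True)
    return [report for _, report in ranked[:k]]

# toy shared-space embeddings (in practice these come from a CLIP encoder)
corpus = [
    ([1.0, 0.0, 0.1], "No acute cardiopulmonary findings."),
    ([0.9, 0.1, 0.0], "Mild cardiomegaly, no effusion."),
    ([0.0, 1.0, 0.2], "Right lower lobe consolidation."),
]
print(retrieve_prior_reports([0.95, 0.05, 0.05], corpus, k=2))
```

In the real system, the embeddings would come from the CLIP image and text encoders, and the retrieved pairs would be passed to the LLM as RAG context for grounding.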
Analyzing Deep Generative Models for Steel Microstructures
ABSTRACT. Deep generative models compose synthetic yet realistic images of different visual concepts by applying learned building rules. The set of all visual concepts and their relationships is of great importance for the materials sciences, as it makes it possible to automatically characterize and objectively quantify a material's microstructural images. A material's microstructure is believed to encode all information about the material's chemical composition and processing (i.e., heat treatment) and helps to predict the material's mechanical properties. Currently, microstructure characterization requires expert annotation, mainly done manually on a case-by-case basis. An objective and automatic characterization would thus improve the understanding of how a combination of visual concepts relates to the material's processing and mechanical properties. This research proposal hypothesizes that deep generative models for steel microstructure images learn visual concepts that correspond to the visual signatures of underlying physical processes. First, deep generative models were trained on real steel microstructure images and investigated by domain experts. The results of the expert study indicate that synthetic images generated by a StyleGANv2-Ada look realistic. Next, various approaches will be investigated for extracting the set of visual concepts and building rules from the StyleGANv2. In a final step, the visual concepts and building rules will be correlated with the material's physical processing.
Economically-Driven AI Process for Quality Assurance: Analysis in Optics Manufacturing
ABSTRACT. This research proposes the development of an economically driven, AI-based quality assurance (QA) process for optical manufacturing, with a specific focus on the detection and mitigation of subsurface damage (SSD) during multi-stage grinding. The approach combines optical coherence tomography (OCT) as a non-destructive imaging method, machine learning for defect detection, and explainable AI (XAI) to ensure transparency and trust. A volumetric segmentation model will be developed to enable automated SSD detection and classification from OCT data. Additionally, the project will integrate quality-relevant process parameters and economic evaluation models into a digital platform to support adaptive, cost-efficient manufacturing control. The goal is to improve efficiency, reduce costs, and enhance product quality, while advancing intelligent, explainable, and economically sustainable QA in high-precision manufacturing.
Explainable Artificial Intelligence for Multivariate Sensor Data: Towards Transparency and Correctness in Model Explanations
ABSTRACT. The continuous growth in sensor technologies has led to the generation of massive amounts of time series sensor data across various sectors such as healthcare, industry, transportation, and smart homes. Artificial intelligence (AI), particularly deep learning, provides a powerful tool to analyze this high-dimensional data automatically, thereby revealing the full potential of this valuable data source. However, AI models are often opaque in their decision-making. To enhance the transparency of AI, the research field of eXplainable AI (XAI) has gained significant attention in recent years. The majority of methods have been developed for image classification and do not consider time series properties such as seasonality and trend. In addition, the field lacks a common agreement on the quantitative evaluation of XAI explanations, e.g., regarding their correctness. The research question of this proposal is how XAI methods can enhance the transparency and correctness of model explanations in the analysis of multivariate sensor data. The aim of this research is threefold: First, we review XAI evaluation methods regarding their suitability for multivariate time series classification. Second, we systematically compare existing XAI methods using the resulting evaluation methods. This comparison will reveal their strengths and help to develop or improve XAI methods that are suitable for multivariate sensor data, which is our third step.
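As one concrete example of the kind of correctness evaluation this proposal surveys, a simple occlusion test checks whether removing the inputs an explanation marks as important actually changes the model output. The toy linear "model" and the hand-written attributions below are hypothetical placeholders, not any specific XAI method:

```python
def occlusion_faithfulness(model, x, attribution, k, baseline=0.0):
    """Correctness check for an attribution: occlude the k highest-attributed
    inputs and measure how much the model output drops. A faithful
    attribution should cause a larger drop than a misleading one."""
    top = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)[:k]
    x_occluded = [baseline if i in top else v for i, v in enumerate(x)]
    return model(x) - model(x_occluded)

# toy "model": weighted sum over a flattened multivariate sensor window
weights = [0.5, 0.1, 0.0, 0.9]
model = lambda x: sum(w * v for w, v in zip(weights, x))

x = [1.0, 1.0, 1.0, 1.0]
good_attr = weights               # attribution aligned with the true weights
bad_attr = [0.0, 0.9, 0.5, 0.1]  # misleading attribution

drop_good = occlusion_faithfulness(model, x, good_attr, k=2)
drop_bad = occlusion_faithfulness(model, x, bad_attr, k=2)
print(drop_good > drop_bad)  # a correct explanation yields the larger drop
```

Real evaluations for time series would additionally need occlusion baselines that respect temporal structure (e.g., seasonality-preserving replacements rather than zeros), which is exactly the gap the proposal points at.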
Learning Interpretable Disentangled Concepts for Neurosymbolic Integration
ABSTRACT. We propose a hierarchical Disentangled Representation Learning (DRL) framework that categorizes concepts into primitive (e.g., color) and higher-order relational types. Our first approach combines a pre-trained backbone with a specialized beta-VAE for concept disentanglement, enabling the separation of statistically independent concepts into interpretable latent factors. Building on this, we integrate Predicate Generation and Inductive Logic Programming (ILP) to map these factors into symbolic, human-understandable semantics. The ultimate goal of our framework is to bridge the gap between disentangled representations and human interpretability, aligning learned concepts with intuitive, semantic meanings to facilitate explainable AI. We validate our initial framework on the dSprites and CLEVR datasets, demonstrating its ability to hierarchically disentangle and symbolically ground concepts while advancing toward interpretable machine learning.
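For reference, the beta-VAE objective underlying the disentanglement step can be written down compactly. The sketch below assumes a diagonal-Gaussian posterior and is a generic textbook formulation, not the authors' exact implementation:

```python
import math

def beta_vae_loss(recon_error, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction error plus beta-weighted KL
    divergence between the diagonal-Gaussian posterior N(mu, sigma^2)
    and the standard-normal prior. Setting beta > 1 pressures the latent
    code toward statistically independent (disentangled) factors."""
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
    return recon_error + beta * kl

# at the prior (mu = 0, logvar = 0) the KL term vanishes entirely
print(beta_vae_loss(1.0, [0.0, 0.0], [0.0, 0.0]))  # 1.0
```

The per-dimension KL term is what makes individual latent factors inspectable: dimensions whose posterior collapses to the prior carry no information and can be pruned before the ILP grounding step.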
Promoting Flatness of Representation Manifolds to Improve Deep Network Training
ABSTRACT. Modern deep learning achieves remarkable results on various tasks. However, it requires large amounts of labeled data and significant electrical energy for training. We aim to enhance training efficiency by leveraging the manifold hypothesis. By explicitly penalizing the curviness of manifolds in neural network representations, we seek to accelerate convergence during training. This approach may also yield better objectives for unsupervised representation learning. By creating well-fitted foundation models and training small networks for downstream tasks, we could reduce the amount of labeled data needed for strong performance. These results could make deep learning more accessible.
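A minimal way to make the "curviness penalty" idea concrete is a discrete second-difference penalty along a path of representation vectors, which vanishes exactly when the sampled points lie equally spaced on a straight line. This is an illustrative sketch under that assumption, not the authors' actual regularizer:

```python
def curviness_penalty(path):
    """Discrete curvature penalty for a sequence of representation
    vectors sampled along a path: the sum of squared second differences.
    It is zero iff consecutive points are equally spaced on a straight
    line, so minimizing it flattens the represented manifold."""
    penalty = 0.0
    for prev, cur, nxt in zip(path, path[1:], path[2:]):
        penalty += sum((p - 2 * c + n) ** 2 for p, c, n in zip(prev, cur, nxt))
    return penalty

straight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # flat path: zero penalty
curved = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]    # bent path: positive penalty
print(curviness_penalty(straight), curviness_penalty(curved))
```

In training, such a term would be added to the task loss for paths interpolated between representations of nearby inputs, nudging the network toward flatter representation manifolds.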
Efficient Graph-Based Neural Architectures for Multimodal Learning
ABSTRACT. The growing ubiquity of multimodal data, ranging from time series and textual inputs to images, videos, and event logs, necessitates unified learning frameworks that can reason across heterogeneous modalities without relying on modality-specific architectures. Existing models often struggle to jointly represent structured and unstructured, static and temporal signals due to rigid assumptions about modality geometry, alignment, and connectivity. This research proposes a graph-centric framework for multimodal learning, wherein each data token is abstracted as a node within a dynamically constructed heterogeneous graph. Inter- and intra-modality relationships are encoded through learned graph topologies and adaptive attention mechanisms that operate across both semantic and temporal dimensions. By extending transformer architectures to operate over these sparse or dense graph structures, the approach seeks to retain cross-modal dependencies while scaling to large, asynchronous data streams. The proposed methodology explores core innovations in modality-aware tokenization and embedding strategies, attention mechanisms adapted to heterogeneous graph sparsity, and fusion techniques that unify token representations across diverse modalities. Special emphasis is placed on modeling temporal phenomena, such as dynamic correlations, lag effects, and alignment, within a graph-based transformer backbone, allowing for joint inference over signals with varying structure and temporal granularity. The research aims to answer fundamental questions around graph topology design, efficient cross-modal fusion, and scalable attention over multimodal data, with the ultimate goal of enabling generalizable, interpretable, and computationally efficient neural architectures for multimodal graphs.
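To make the notion of attention restricted by a learned graph topology concrete, the sketch below computes a softmax attention row per token but masks out non-neighbors. The adjacency matrix and score values are hypothetical toy inputs, not part of the proposed architecture:

```python
import math

def masked_attention(scores, adjacency):
    """Attention restricted to a graph: each token attends only to its
    neighbors (including itself), so sparsity in the learned topology
    directly bounds the cost of cross-modal attention."""
    out = []
    for i, row in enumerate(scores):
        # drop edges absent from the (hypothetical) learned topology
        masked = [s if adjacency[i][j] else float("-inf") for j, s in enumerate(row)]
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

# 3 tokens from different modalities; token 0 is not connected to token 2
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
scores = [[0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
weights = masked_attention(scores, adj)
print(weights[0])  # the masked edge 0->2 receives zero weight
```

The same masking idea extends to temporal edges (lagged connections between time series tokens), which is where the proposal's emphasis on dynamic correlations and lag effects would enter.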
Model Efficiency Techniques in Multimodal Learning
ABSTRACT. In the increasingly complex domain of deep learning architectures, multimodal models encounter significant efficiency constraints arising from their substantial parameter counts and computational requirements. This ongoing doctoral research investigates a fundamental limitation: the quadratic computational complexity of attention mechanisms in cross-modal contexts. We aim to develop a comprehensive theoretical framework addressing efficiency paradigms through strategic initialization techniques, sparse attention factorization, and progressive capacity scaling methodologies. Our investigative approach will integrate principles from the Lottery Ticket Hypothesis to identify optimal substructures, employ knowledge distillation to transfer capabilities to more compact architectures, and implement model-centric curriculum learning to balance computational efficiency with representational power. By focusing on multimodal applications with image, time series, video-audio, and textual data, this research seeks to contribute both to the theoretical understanding of efficiency-performance tradeoffs and to practical methodologies for deploying sophisticated models in resource-constrained environments. The work intends to establish principled foundations for accessible multimodal learning while maintaining competitive performance benchmarks.
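As a minimal illustration of Lottery-Ticket-style substructure identification, one magnitude-pruning step can be sketched as follows; the weight vector and sparsity level are toy assumptions, and a real pipeline would iterate pruning with rewinding to the original initialization:

```python
def magnitude_prune(weights, sparsity):
    """One magnitude-pruning step: zero out the given fraction of weights
    with the smallest magnitude and return both the pruned weights and
    the binary mask identifying the surviving substructure."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned_w, mask = magnitude_prune(w, sparsity=0.5)
print(mask)  # the three largest-magnitude weights survive
```

The surviving mask is the candidate "winning ticket": under the hypothesis, retraining only the masked subnetwork from its original initialization should approach full-model performance at a fraction of the cost.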
ABSTRACT. The training of current large neural networks requires huge amounts of data, which is an immense burden in terms of time, money, and computational resources. In multimodal use cases, the amount and complexity of data are often increased even further. This doctoral consortium proposal outlines the planned research to develop data-efficient multimodal training strategies as part of the research project "Enhancing Data and Model Efficiency in Multimodal Learning". By developing training strategies that reduce the amount of data and computational power required for model training, we aim to overcome the hurdle of limited available resources and make the exceptional capabilities of current large networks more accessible. We will investigate coreset subset selection strategies and data-centric curriculum learning as approaches to reduce the amount of data and computational power required while still maintaining high model quality. We will also examine methods suitable for extracting meaningful representations from multimodal data in scenarios where labeled data is scarce, as labeling large amounts of data is often very costly and complex. The methods we intend to develop will be tested on a variety of public datasets to ensure usability for a broad range of use cases.
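One standard coreset subset selection strategy of the kind this proposal will investigate is k-center greedy selection, sketched below on toy 2-D points; the seed point, data, and `k_center_greedy` helper are illustrative assumptions rather than the planned method:

```python
import math

def k_center_greedy(points, k):
    """k-center greedy coreset selection: repeatedly pick the point
    farthest from the current coreset, so that a small subset covers
    the data distribution well for training."""
    selected = [0]  # seed with the first point (arbitrary choice)
    while len(selected) < k:
        best_i, best_d = None, -1.0
        for i, p in enumerate(points):
            if i in selected:
                continue
            # distance from candidate to its nearest already-selected point
            d = min(math.dist(p, points[j]) for j in selected)
            if d > best_d:
                best_i, best_d = i, d
        selected.append(best_i)
    return selected

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
print(k_center_greedy(pts, 3))
```

Note how the two near-duplicate pairs each contribute only one representative: this redundancy removal is precisely what makes coreset training cheaper at comparable quality. In practice the distances would be computed in a learned embedding space rather than raw input space.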
Privacy Risk Assessment in Federated Learning: Extracting and Protecting Sensitive Information from Vision Language Models in Manufacturing Applications
ABSTRACT. Federated learning enables collaborative machine learning while preserving data privacy by keeping data local. However, shared model parameters still pose privacy risks, notably through the memorization effect, where models may unintentionally expose sensitive training data. This research evaluates such risks for manufacturing data, particularly technical drawings used in visual manufacturability assessments. It compares the performance of public foundation models with those fine-tuned on federated data and explores attacks targeting model weights to extract private information. Mitigation strategies will be assessed for their effectiveness and impact on model performance. Ultimately, the project aims to develop a decision-making framework that balances model utility and privacy in federated learning.
ABSTRACT. Formal verification of neural networks is essential for their safe deployment in critical domains such as autonomous driving. However, current verification techniques struggle to scale to deep networks and rely on symbolic inputs, which makes their use in computer vision challenging. In parallel, explainable AI (XAI) aims to understand the decision-making of neural networks, especially in image recognition. This research proposal describes how explainability techniques can be exploited to formally verify neural networks against symbolic constraints. The core contribution is the formulation of the concept-based verification problem, and several approaches to solve it within the scope of a PhD thesis are proposed. These include local searches for counterexamples using adversarial attacks as well as global verification strategies. Furthermore, realism constraints on images are explored to reduce the verification search space in computer vision contexts.
Research Proposal: Runtime Verification on Spatial Objects
ABSTRACT. Runtime verification with linear temporal logic (LTL) is an established technique for verifying systems against temporal specifications. However, the world often involves physical objects in addition to time. This research proposal outlines my ideas for performing runtime verification on combined temporal and spatial logics in which objects are modeled as continuous sweeps through space. Formulas should allow both LTL semantics and operators for spatial intersection and distance. Although expressive spatio-temporal logics can easily lead to undecidable model checking, potential mitigating approaches could involve discounting or adding growing spatial uncertainty to prevent unlimited knowledge of the future. Further extensions could include timed events, which allow for the processing of unsynchronized measurements of object positions, as well as quantitative or multi-valued semantics. To be useful in practice, it is important to determine which acceleration structures are necessary for efficient query answering and what form a practical implementation of such a system could take. Interesting applications involve moving objects for which there is limited knowledge, yet continuous monitoring is required.
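As a minimal illustration of runtime monitoring over spatial objects, the sketch below checks a globally quantified minimum-distance property, G(dist(a, b) >= d_min), over a finite trace of paired object positions. The trace and the `monitor_globally_min_distance` helper are hypothetical simplifications of the proposed logic, which would handle continuous sweeps rather than sampled points:

```python
import math

def monitor_globally_min_distance(trace, d_min):
    """Runtime monitor for the spatio-temporal property
    G (dist(a, b) >= d_min): scan a trace of paired object positions
    and report the first step at which the separation requirement is
    violated, or None if the trace satisfies the property so far."""
    for step, (pos_a, pos_b) in enumerate(trace):
        if math.dist(pos_a, pos_b) < d_min:
            return step  # earliest violation, as an online monitor would flag it
    return None

trace = [
    ((0.0, 0.0), (5.0, 0.0)),
    ((1.0, 0.0), (4.0, 0.0)),
    ((2.0, 0.0), (2.5, 0.0)),  # the objects come within 0.5 of each other
]
print(monitor_globally_min_distance(trace, d_min=1.0))
```

Modeling objects as continuous sweeps would replace the per-step point distance with a distance between swept volumes over an interval, which is where the acceleration structures mentioned in the proposal become necessary.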
Together with the co-located conferences, the day ends with the welcome reception, starting at 6 pm in the rooms of the Hasso-Plattner-Institute. The evening includes the award ceremony for the GI junior fellows and the Balzert prize.