IEEE CBMS 2026: THE 39TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS
PROGRAM FOR FRIDAY, JUNE 5TH

09:00-10:30 Session 13A: Interpretable and Multimodal AI for Clinical Imaging and Predictive Medicine
Location: Panorama
09:00
FedTwin‑XAI: A Patient‑Owned Federated Digital Twin Framework with Explainable Mobile Medical Imaging and Differentially Private Synthetic Augmentation

ABSTRACT. Medical imaging applications are increasingly deployed on mobile devices, yet large‑scale learning is constrained by privacy regulation, data silos, and class imbalance in real‑world collections. In parallel, patient‑centric digital twin architectures aim to enable personalized “what‑if” simulations but are rarely integrated with image‑based screening pipelines and explainability mechanisms. This paper introduces FedTwin‑XAI, a unified patient‑owned framework that integrates: (i) mobile imaging‑based screening with post‑hoc explainability, (ii) decentralized personal data pods for data sovereignty, (iii) federated learning for collaborative training without centralizing raw data, (iv) differential privacy‑aware synthetic augmentation to mitigate scarcity and imbalance, and (v) per‑patient digital twins for longitudinal monitoring and scenario simulation. The framework is positioned for privacy-by-design deployment and aligns with federated and explainable machine vision. We instantiate the imaging component using a VGG16‑based scalp disease classifier with SHAP explanations and report component-level quantitative performance on a 10‑class dataset (13,196 images), achieving 99.84% accuracy on a labeled validation set on the aggregator node. We then provide a protocol to evaluate the end‑to‑end federated twin pipeline under non‑IID client partitions, including communication–accuracy trade‑offs, calibration, and privacy accounting. The result is an actionable blueprint for privacy‑preserving, explainable medical imaging systems that can be embedded into patient‑centric digital twins for intelligent healthcare.
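
As a hedged illustration of the imaging component's explainability step (not the authors' code), a VGG16-style classifier can be paired with gradient-based SHAP attributions; the untrained weights, input shape, and 10-class head below are placeholder assumptions:

```python
# Minimal sketch: post-hoc SHAP attributions for a VGG16-style classifier.
# Weights, shapes, and the 10-class head are illustrative, not the paper's.
import numpy as np
import shap
import tensorflow as tf

model = tf.keras.applications.VGG16(weights=None, classes=10)
background = np.random.rand(8, 224, 224, 3).astype("float32")  # reference batch
to_explain = np.random.rand(2, 224, 224, 3).astype("float32")  # images to explain

explainer = shap.GradientExplainer(model, background)  # gradient-based SHAP estimate
shap_values = explainer.shap_values(to_explain)        # per-class attribution maps
```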

09:10
Explaining Visual-Language Foundation Models for Histopathology: a Patch-Level Approach

ABSTRACT. Visual–language foundation models have recently become the state of the art in computational histopathology, enabling zero-shot classification and region-level interpretation via text–image similarity. However, it remains unclear whether these models rely on features that are semantically meaningful to human experts at the tile/patch level. In this work, we assess the alignment between model-derived saliency maps and specialist annotations for three visual-language foundation models: CONCH, PathGen, and MUSK. Using the model-agnostic P-IBISA method, we generate attribution maps for histopathology patches from the WSSS4LUAD and BCSS datasets and compare them to ground-truth semantic segmentation masks. Faithfulness is measured using the Confidence Increase metric, while spatial correspondence is evaluated via the DICE score. Results show that P-IBISA saliencies consistently achieve higher faithfulness than ground-truth annotations, indicating that the highlighted regions are more influential to the models’ predictions than human-labeled regions. Additionally, localization analysis reveals low overlap between saliency maps and expert annotations, suggesting that the models rely on features that do not fully correspond to human-interpretable tissue regions. These findings highlight a gap between model reasoning and human understanding, motivating future work toward integrating segmentation-aware regularization into multimodal foundation models for histopathology.
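
For reference, the spatial-correspondence metric reported here (Dice) is simple to compute; this sketch binarizes a saliency map against an expert mask, with the threshold as our illustrative choice:

```python
# Dice overlap between a binarized saliency map and a ground-truth mask.
import numpy as np

def dice_score(saliency: np.ndarray, mask: np.ndarray, thr: float = 0.5) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) over binarized maps."""
    a = saliency >= thr
    b = mask.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```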

09:20
Specializing Large Language Models for Hierarchy-Aware ICD-10 Mapping of Portuguese Cardiology Diagnoses

ABSTRACT. Automated ICD-10 coding is a high-impact yet expertise-intensive task requiring precise hierarchical reasoning. We evaluate whether a specialized LLM can approach cardiology specialist-level performance when mapping short Portuguese diagnoses to ICD-10 codes. We introduce (i) a double-specialist benchmark of 381 diagnoses from 89 clinical texts and (ii) a 14,685-pair diagnosis supervision corpus generated by a teacher LLM with structural validation and hierarchical normalization. Across paradigms, Block+Category accuracy improves from 0.5350 (retrieval baseline) to 0.7366 (expanded retrieval), 0.7815 (open frontier model), and 0.9019 (proprietary frontier model). Most residual errors reflect hierarchical near-misses rather than semantic misclassification. Supervised fine-tuning of mid-scale open models achieves 0.8582 accuracy on a stratified test set, approaching frontier performance. Results indicate that ICD-10 coding depends more on hierarchical calibration and domain-specific supervision than on model scale alone, supporting compact, clinically deployable assistants for Portuguese cardiology.
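
As a hedged sketch of how hierarchy-aware agreement of this kind can be scored, the snippet below counts category-level matches, assuming the category is the three-character ICD-10 prefix (e.g., I50 for I50.0); a full Block+Category scorer would additionally look the category up in a block table:

```python
# Hypothetical category-level scorer for ICD-10 agreement; the 3-character
# prefix rule is a standard ICD-10 convention, not the paper's exact metric.
def icd10_category(code: str) -> str:
    return code.strip().upper()[:3]

def category_accuracy(preds: list[str], golds: list[str]) -> float:
    hits = sum(icd10_category(p) == icd10_category(g) for p, g in zip(preds, golds))
    return hits / len(golds)

print(category_accuracy(["I50.0", "I21.4"], ["I50.9", "I25.1"]))  # -> 0.5
```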

09:30
Understanding Chest X-ray Vision Representations with DeViL

ABSTRACT. Explainability is essential for the reliable deployment of deep learning models in clinical settings, with most research focusing on post-hoc visual saliency methods. However, these rely on task-specific classifiers, limiting their applicability for analysing vision encoders independently of downstream tasks. Moreover, these methods can only explain the predefined set of classes for which the underlying model was trained. In this work, we investigate how DeViL, a framework that translates visual features into natural language, can be leveraged for understanding representations learned by chest X-ray vision models. DeViL requires no task-specific heads, since it only uses the frozen vision encoder, and it is able to generate open-vocabulary saliency maps because it uses a language model. We adapt DeViL to align visual features with clinically meaningful concepts by training on structured radiology reports and using a radiology-specialised language model. We conduct experiments on structured radiology report generation, saliency generation, and open-vocabulary saliency-text grounding on different types of chest X-ray vision encoders: convolutional, self-supervised Vision Transformer, and vision–language Transformer. Results show competitive performance with large end-to-end report generation models and demonstrate that DeViL's open-vocabulary saliency maps outperform those produced by a specialised saliency generation method for vision-language encoders.

09:40
Interpretable Hybrid Modeling for Breast Cancer Risk Stratification from Structured Radiology Reports

ABSTRACT. Breast cancer risk stratification based on radiological reports is challenging, especially in non-English-speaking clinical settings. This study proposes a two-stage hybrid framework that combines automatic information extraction using a large language model (LLM) with structured predictive modeling to estimate malignancy risk. A private set of 40,394 anonymized reports in Spanish (2014–2019) was used. The extraction module was based on an LLM fine-tuned with Low-Rank Adaptation (LoRA) to transform unstructured clinical text into structured representations compatible with a hierarchical BI-RADS scheme. The model achieved an Exact Match of 0.997 and micro-F1 values greater than 0.98 in clinically critical fields. The extracted entities were transformed into tabular variables for subsequent modeling. The binary risk classifier achieved a ROC-AUC of 0.953. In the high-risk subset, the cancer prediction model achieved a ROC-AUC of 0.802. The proposed modular architecture preserves interpretability and clinical traceability, demonstrating the feasibility of integrating LLMs into real-world risk-stratification workflows.

09:50
Multimodal Multitask Neural ODEs for Continuous-Time Alzheimer's Disease Progression Forecasting

ABSTRACT. Alzheimer's Disease (AD) represents a growing global health crisis characterized by progressive cognitive decline, memory impairment, and irreversible brain atrophy. The heterogeneous and irregular nature of longitudinal clinical data presents significant challenges for accurate disease progression modeling, with existing methods struggling with irregular sampling, effective multimodal integration, and simultaneous prediction of multiple clinical outcomes. To address these limitations, we propose an enhanced Neural Ordinary Differential Equations (Neural ODEs) framework that leverages continuous-time dynamics with fourth-order Runge-Kutta (RK4) integration to model AD progression from irregular longitudinal observations. Our approach incorporates a dual-attention multimodal fusion mechanism for task-specific feature weighting and an adaptive multi-task learning strategy with dynamic task balancing. Evaluated on the OASIS-2 dataset, a comprehensive ablation study across six modality-task configurations shows that our multimodal multi-task framework achieves the best overall performance, with the complete model obtaining a mean diagnosis AUC of 0.842, accuracy of 0.744, MMSE R‑squared of 0.575, CDR R‑squared of 0.138, and atrophy R‑squared of 0.510. Our framework outperforms single-modality baselines with robust convergence and interpretable features, advancing integrated AD progression modeling.
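
The RK4 integration named in the abstract is the classical fourth-order scheme; a generic step for latent dynamics dz/dt = f(z, t) looks like the sketch below, where f would be a small neural network in a Neural ODE (our illustration, not the authors' model):

```python
# Classical fourth-order Runge-Kutta step for dz/dt = f(z, t).
import numpy as np

def rk4_step(f, z: np.ndarray, t: float, dt: float) -> np.ndarray:
    k1 = f(z, t)
    k2 = f(z + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(z + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(z + dt * k3, t + dt)
    return z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

Because the step size dt is free, the same trajectory can be evaluated at the irregular observation times that make longitudinal clinical data hard for discrete-time models.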

10:00
Deep Reinforcement Learning training for exoskeleton Exo-H3 using ROS

ABSTRACT. Lower limb exoskeletons show great potential for rehabilitation of patients with different motor impairments. However, robot neurorehabilitation is still far from widespread use in clinical practice given human-robot interaction limitations. The rise of Reinforcement Learning therefore offers an opportunity to tackle the complex problem of human-robot interaction, often avoided by classical position control with a trajectory reference. However, transferring the agent from the native programming environment where it was trained to the real world for inference presents several limitations. This paper proposes a method for externalizing the training environment through ROS, confronting the limitations and challenges of a decoupled environment from the outset. To prove this concept, a hyper-realistic simulator of the Exo-H3 exoskeleton is controlled by the Reinforcement Learning agent. All experiments were conducted using the Exo-H3 Gazebo simulator connected via ROS. Real hardware validation remains future work.
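
A hypothetical gym-style wrapper illustrates the externalization idea: the agent sees a standard step() interface while states and commands travel over ROS topics. Topic names and message types below are placeholders, not the Exo-H3 interface:

```python
# Sketch of an RL environment backed by ROS topics (rospy / ROS 1).
import rospy
from std_msgs.msg import Float64MultiArray

class RosExoEnv:
    def __init__(self):
        rospy.init_node("rl_agent")
        self.cmd_pub = rospy.Publisher("/exo/joint_commands", Float64MultiArray, queue_size=1)
        self.state = None
        rospy.Subscriber("/exo/joint_states", Float64MultiArray, self._on_state)

    def _on_state(self, msg):
        self.state = list(msg.data)

    def step(self, action):
        self.cmd_pub.publish(Float64MultiArray(data=list(action)))
        rospy.sleep(0.01)               # let the simulator advance
        reward, done = 0.0, False       # task-specific reward omitted in this sketch
        return self.state, reward, done, {}
```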

10:10
Incomplete information retrieval without imputation: exploiting the correlation among deep features

ABSTRACT. Similarity-based retrieval over complex data often relies on deep embeddings derived from images, signals, or textual reports. However, in real-world datasets many records contain missing attributes, which makes similarity comparisons difficult. Traditional solutions either discard incomplete records or impute values, both of which may distort the latent representation space and introduce artificial information. In this work, we propose CURIE for retrieving similar records without imputing or deleting data. CURIE models each attribute as a distance space induced by deep embeddings, and estimates correlations between these spaces. During similarity computation, the contribution of missing attributes is redistributed to correlated ones using a weighting mechanism. Experiments across three image-based datasets using multiple deep feature extractors show that CURIE consistently achieves higher retrieval quality than competitors as the amount of missing data increases. The results indicate that exploiting correlations among latent distance spaces is an effective strategy for similarity retrieval over incomplete data.
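
A rough sketch of the redistribution idea (our reading, not CURIE's exact formulation): when an attribute is missing, its weight is reallocated to observed attributes in proportion to their estimated correlation with it:

```python
# Illustrative weighted distance with missing-attribute redistribution.
import numpy as np

def weighted_distance(dists: np.ndarray, present: np.ndarray, corr: np.ndarray) -> float:
    """dists: per-attribute distances (NaN where missing); present: boolean
    mask of observed attributes; corr: attribute-correlation matrix."""
    w = np.ones(len(dists))
    for j in np.where(~present)[0]:
        c = corr[j] * present           # correlation with observed attributes only
        if c.sum() > 0:
            w += w[j] * c / c.sum()     # redistribute j's weight proportionally
        w[j] = 0.0
    return np.nansum(w * dists) / w.sum()
```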

10:20
Applying Machine Learning to Predicting Malaria Prevalence: Spatial Analysis Results and Significance

ABSTRACT. Malaria remains a significant public health challenge in Nigeria, where climatic conditions favor transmission. Despite national declines, regional disparities persist, reflecting spatial and environmental heterogeneity. This study leverages a probabilistic learning framework, hierarchical Bayesian spatial modelling, to predict malaria prevalence among children aged 2–10 years and generate climate-sensitive risk maps for six southwestern states. Using malaria prevalence survey data from the Nigeria Malaria Indicator Surveys (NMIS) and climatic covariates from the Demographic Health Survey (DHS) spatial repository, we implemented the model within the Integrated Nested Laplace Approximation (INLA) framework, incorporating structured spatial effects via an intrinsic conditional autoregressive prior (ICAR) and unstructured random effects to capture non-spatial variability. Results reveal significant spatial heterogeneity, with Osun recording the highest prevalence (47%), followed by Oyo (45%), Ekiti (44%), Ondo (38%), Ogun (31%), and Lagos (12%). Climatic factors had a marginal influence, with aridity inversely related to prevalence, temperature positively associated, and rainfall exhibiting a non-linear effect. The results indicate that while climate plays a role, local environmental and socioeconomic determinants may also influence malaria prevalence. By integrating spatial dependencies and uncertainty quantification, this approach demonstrates how Bayesian learning can support predictive analytics and data-driven malaria intervention strategies, bridging statistical modelling and machine learning for public health policy.
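
The model class described (a logit-linked prevalence model with structured ICAR and unstructured random effects) is commonly written as below; the notation is the generic Besag form, not copied from the paper:

```latex
\mathrm{logit}(p_i) = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i + v_i, \qquad
u_i \mid u_{-i} \sim \mathcal{N}\!\Big(\frac{1}{n_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^2}{n_i}\Big), \qquad
v_i \sim \mathcal{N}(0, \sigma_v^2),
```

where $j \sim i$ ranges over the $n_i$ neighbours of area $i$, $u_i$ carries the spatial structure, and $v_i$ the non-spatial variability.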

09:00-10:30 Session 13B: Computational Pathology
Location: Atrium A
09:00
Deep learning classifies Helicobacter pylori infection states from multi-channel immunofluorescence imagery

ABSTRACT. Chronic infections with Helicobacter pylori (Hp) are the main risk factor for stomach cancer, which is one of the most prevalent causes of cancer-related deaths worldwide. Depending on bacterial virulence factors, Hp profoundly remodels the actin cytoskeleton of host epithelial cells, yet translating these phenotypes into scalable, quantitative readouts remains challenging. In this study, we employed deep learning to classify infection directly from immunofluorescence images and thereby provide a label-efficient proxy for virulence phenotyping. Epithelial monolayers were stained with DAPI (nuclei), phalloidin (F-actin), and a cellular protein marker and imaged under standardized conditions for three classes: non-infected, wild-type–infected, and mutant-infected cells. Evaluations of single-channel and multi-channel models confirm that classifiers reliably distinguish all three states from images. These findings establish an image-based, AI-enabled readout of virulence phenotypes suitable for scalable screening of pathogen-host interactions.

09:15
Towards Computational Sub-Diffraction Single Vesicle Detection in Fluorescence Microscopy

ABSTRACT. Accurate detection of single extracellular vesicles (EVs) in fluorescence microscopy is hindered by resolution limits and blurring artifacts caused by overlapping point spread functions. This study aims to push the boundaries set by physics by employing synthetic data to train a segmentation model as a proof of concept. To achieve this, we first assess uncertainty in manual EV labeling by comparing annotations from four human annotators across two annotation sessions. Motivated by the high variability in annotations, we develop a physics-inspired strategy to generate artificial data. To demonstrate how synthetic data can be used to recover single-EV information from distorted real images, we train a simple segmentation network to map simulated images back to their ground truth. The trained network is then confronted with real data. Qualitative assessment suggests reasonable results even for crowded scenes. Outputs are further validated using a plausibility check as a surrogate for the lack of real ground truth.

09:30
From crown delineation to CEJ–alveolar ridge measurement: a framework for Multi-Task panoramic radiographic segmentation

ABSTRACT. The creation of large-scale datasets curated by domain experts is essential for advancing scientific applications, ensuring data quality, consistency, and reproducibility, particularly in domains requiring nuanced human interpretation. However, despite their clear importance, the manual annotation of these datasets by domain experts is a time-consuming and costly process, which presents a significant barrier to creating large-scale repositories. This significant investment in financial resources and specialized effort necessitates semi-automatic methodologies to optimize the annotation workflow and enable scalable dataset creation. In dentistry, for example, annotated datasets of dental crown contours are fundamental for developing and evaluating computer-aided diagnostic tools, particularly for algorithms that must distinguish natural dental crowns, restored dental crowns, and prosthetic dental crowns. Such datasets support critical clinical applications, including restoration quality assessment and prosthesis fit evaluation. In this work, we present the development of an expert-annotated panoramic radiograph dataset of dental crowns using Artificial Intelligence. A team of two radiologists labeled a set of 769 panoramic radiographs using a customized Picture Archive and Communication System (PACS) interface. Each annotation, stored as a JSON file, delimited crown regions of interest. We demonstrate the dataset's utility by training a YOLOv11x-Seg instance-segmentation model, chosen for its practical accuracy-speed trade-off for systematic use and for producing object-level masks that facilitate downstream post-processing and geometric measurements. The model achieved a Mean Average Precision (mAP@.5:.95) of 70.5% for crown masks and 81.87% for the alveolar ridge. Crucially, the dataset includes both alveolar ridge and CEJ annotations, enabling dental bone loss estimation from the distance between these two lines, which is the core diagnostic signal. Furthermore, we developed an end-to-end application for automated dental bone loss quantification. In a perceptual study with three board-certified radiologists, our system's outputs received a favorable accuracy rating (perceived accuracy >70%) in 95.1% of the evaluated cases.

09:45
Agreement Aware Hybrid YOLOv8-Keypoint R-CNN Framework for Automated Tooth Length Estimation from Dental X-Ray Images

ABSTRACT. Accurate tooth length measurement from dental radiographs is essential for diagnosis, treatment planning, and longitudinal monitoring. Although Deep Learning methods have demonstrated strong potential in medical image analysis, their application to clinically reliable morphometric estimation in routine dental radiography remains insufficiently explored, particularly with respect to measurement agreement. This study addresses this gap by developing a hybrid deep learning framework in which two architectures are employed: YOLOv8 for tooth detection and Keypoint R-CNN for anatomical landmark localization. The proposed pipeline enables automated tooth length estimation through sequential object detection and keypoint regression. To the best of our knowledge, this study is among the first to enable automated tooth length estimation for all teeth in a panoramic radiograph using a unified detection–localization framework, unlike previous studies that focus on individual teeth or limited regions. The YOLOv8 model achieved a detection accuracy of 91.26%, while the Keypoint R-CNN attained a localization precision of 94%. Bland–Altman analysis revealed mean bias ranging from -39.82 to +27.65 pixels with narrow limits of agreement and no evidence of proportional bias, indicating stable geometric consistency across varying root lengths. These findings support the feasibility of integrating hybrid detection–localization approaches into routine dental imaging workflows for clinically consistent automated tooth morphometry.
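
The agreement analysis reported (Bland–Altman bias and limits of agreement) reduces to a few lines; the arrays would hold automated versus reference tooth lengths:

```python
# Bland-Altman summary: mean bias and 95% limits of agreement (in pixels here).
import numpy as np

def bland_altman(auto: np.ndarray, ref: np.ndarray):
    diff = auto - ref
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```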

10:00
In-silico Validation of VesselVision: A Hybrid DL-based Arterial Wall-Recognition from A-mode Ultrasound Scans

ABSTRACT. Early Vascular Aging (EVA) is gaining prominence in preventive cardiovascular care; however, its large-scale adoption remains constrained by the lack of affordable and fully automated vascular assessment tools. While ultrasound provides a practical window into vascular health, most automated vessel analysis methods are designed for B-mode imaging and rely on specialized hardware and expert operation, limiting their deployment in community and resource-constrained settings. A-mode ultrasound offers a favorable frugal and edge-deployable alternative, but its single scan-line nature and reduced spatial redundancy pose significant challenges for robust artery detection. We propose a hybrid deep learning and signal processing framework for automated arterial wall detection from continuous A-mode acquisitions. The method stacks finite RF frames to generate motion-mode representations, recasting the problem as a pattern-recognition task. A lightweight YOLO-11n model was used to demonstrate feasibility. Trained on a human dataset of 82,585 images (57% positive), the model achieved 83.8% precision, 93% recall, and 92.9% mAP@0.5. Extensive in-silico validation using a purpose-built simulation testbed—spanning diverse noise and motion conditions. The method showed accuracy exceeding 97% for SNR > 10 dB. Accuracy remained above 90% across heart rates of 30–120 bpm, with the lower values observed at the lower heart rates. Performance was similarly consistent and high across all waveform types. Precision, recall and accuracy values were in similar ranges, indicating the methods ability to discern both artery’s presence or absence. These results establish the practical viability of the proposed hybrid A-mode pipeline for robust artery detection.

10:15
Continuous Coordinate-Space Diffusion for Probabilistic Cephalometric Landmark Localization

ABSTRACT. The detection of cephalometric landmarks is pivotal to craniofacial analysis, requiring models that capture structured spatial dependencies arising from radiographic variability. In this paper, we propose a diffusion-based framework that formulates landmark prediction as probabilistic inference in continuous anatomical coordinate space, enabling iterative estimation instead of one-shot predictions. Reverse diffusion is performed directly on landmark coordinates conditioned on image features, avoiding discretized heatmap representations while remaining computationally efficient. Modeling coordinates as a distribution rather than point estimates promotes stable performance under dataset variability. Experiments on the ISBI 2015 and ISBI 2023 benchmarks demonstrate state-of-the-art performance, achieving 0.82 mm MRE on ISBI 2015 and 0.66 mm MRE on ISBI 2023, respectively.

09:00-10:30 Session 13C: EHDS, Federated Analytics and Synthetic Health Data
Location: Atrium B
09:00
A Secure Processing Platform supporting the European Health Data Space infrastructure

ABSTRACT. The European Health Data Space (EHDS) Regulation (EU) 2025/327 requires that secondary use of electronic health data take place under formal data permits, in Secure Processing Environments (SPEs), and with appropriate pseudonymization safeguards. However, no open, permit-driven architecture currently converts governance artifacts into machine-enforceable processing controls. We present a 12-module governance-to-execution pipeline that takes a YAML-encoded data permit and a structured variable inventory and deterministically compiles them into runtime controls that govern every transformation performed on the data. The system generates pseudonymized (HMAC-SHA256), minimized datasets that are ready for researcher analysis, as well as a sealed post-analysis, audit-ready evidence pack. Evaluated on 425,087 MIMIC-IV-ED emergency-department records, the pipeline completes the full 12-module execution in about two minutes and generates over 40 auditable artifacts per run, while providing comprehensive coverage against a TEHDAS2-aligned governance framework. We also identified five areas for future extension: re-pseudonymization, cross-project linkage prevention, key lifecycle management, SPE-native access enforcement, and re-identification governance. The pipeline's novelty stems from its architectural design: a declarative, permit-first system in which every processing step can be traced back to a governance decision and verified using cryptographically sealed evidence outputs.
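
The pseudonymization primitive named in the abstract (HMAC-SHA256) maps an identifier to a stable, non-reversible pseudonym under a project secret; key handling and digest truncation below are illustrative choices, not the system's:

```python
# Keyed pseudonymization sketch using Python's standard library.
import hashlib
import hmac

def pseudonymize(identifier: str, project_key: bytes) -> str:
    mac = hmac.new(project_key, identifier.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]  # truncated digest used as the pseudonym

print(pseudonymize("patient-12345", b"per-permit-secret-key"))
```

A per-permit key would give the same patient different pseudonyms across projects, one possible route to the cross-project linkage prevention listed as future work.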

09:15
Health Data Space Nodes for Privacy-Preserving Federated Learning and Analysis

ABSTRACT. With an ever-increasing volume of health data produced, large-scale medical studies frequently rely on machine learning and advanced analytics. Yet, in many healthcare systems, clinical data is typically distributed across institutions and constrained by privacy and governance requirements. We present a deployable prototype that supports both federated learning and federated analysis without transferring patient-level records. The system connects five previously developed Health Data Space nodes that act as private, harmonised data providers, communicating analysis data exclusively through authenticated REST endpoints. Using open-source data, we trained a logistic regression model as a proof-of-concept containerised architecture and implemented a monitoring tool based on Prometheus and Grafana for traceability. The resulting federated model reached 0.727 accuracy and 0.456 Matthews Correlation Coefficient, compared with a centralised baseline of 0.711 accuracy and 0.422 MCC. Federated training yielded higher specificity (0.788 vs. 0.745) while slightly reducing sensitivity, illustrating a clinically relevant trade-off. In addition, federated analysis was used to compute demographic indicators (mean age and gender ratio) across nodes by aggregating local summaries rather than exposing individual data. Overall, the results indicate that the proposed node solution provides a practical pathway towards privacy-preserving secondary use of health data.
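
Both aggregation patterns described (federated model averaging and federated analysis of summary statistics) share the same shape: nodes send small payloads, never records. A minimal sketch, with hypothetical payloads:

```python
# Sample-weighted averaging of locally trained model weights (FedAvg-style),
# plus a federated mean computed from per-node (sum, count) summaries.
import numpy as np

def fedavg(weights: list[np.ndarray], n_samples: list[int]) -> np.ndarray:
    n = np.asarray(n_samples, dtype=float)
    return np.average(np.stack(weights), axis=0, weights=n)

def federated_mean(sums: list[float], counts: list[int]) -> float:
    return sum(sums) / sum(counts)  # nodes expose only aggregates

global_w = fedavg([np.array([0.2, 1.1]), np.array([0.4, 0.9])], [120, 80])
mean_age = federated_mean([4250.0, 3900.0], [100, 90])
```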

09:30
From Imbalanced Cohorts to Virtual Populations: Leakage-Aware Synthetic Data Augmentation For Heart Failure Diagnosis

ABSTRACT. Early heart failure (HF) diagnosis using real-world clinical data is hindered by class imbalance, heterogeneous documentation, and differences between primary and secondary care pathways. This study presents a leakage-aware, AI-enabled workflow for synthetic data augmentation to support HF classification across care settings. The framework integrates an IEEE 2801-2022–aligned data quality assessment, synthetic minority-class balancing, virtual population generation, and downstream utility evaluation under cross-validation with strictly real-only validation. Multiple synthetic generation approaches, including oversampling, probabilistic mixture models, and deep generative methods, are systematically compared using distributional fidelity metrics and offline augmentation with XGBoost as a strong tabular baseline. Results demonstrate that moderate synthetic augmentation (40–60%) improves balanced accuracy and sensitivity in both care settings, whereas excessive augmentation degrades generalization due to synthetic-to-real mismatch. By explicitly separating care pathways and enforcing leakage-free evaluation, this work provides practical guidance for the responsible use of synthetic data in real-world HF decision support.
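
The leakage-aware pattern described can be sketched as follows: synthesize minority-class rows from the training fold only, then evaluate strictly on real data; SMOTE stands in here for the generative models the paper compares, and all names are illustrative:

```python
# Train-fold-only augmentation with real-only evaluation (illustrative).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_aug, y_aug = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X_tr, y_tr)
model = XGBClassifier(eval_metric="logloss").fit(X_aug, y_aug)
print(model.score(X_te, y_te))  # the test fold contains real records only
```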

09:45
Synthetic Data Generation and Multi-Dimensional Evaluation in Fondazione Italiana Linfomi (FIL) Diffuse Large B-Cell Lymphoma Clinical Cohort

ABSTRACT. Synthetic patient data offer a promising solution to the privacy and accessibility constraints that hinder data-driven innovation in healthcare, particularly for clinical trial research. This paper presents a systematic study of synthetic data generation and evaluation using a real-world clinical cohort from the Fondazione Italiana Linfomi. We define a standardized, model-agnostic workflow to benchmark diverse generative paradigms across the critical dimensions of fidelity, privacy and utility. Our findings reveal that no single model excels across all metrics, emphasizing the need to evaluate models on a case-by-case basis. By proposing a structured, literature-informed evaluation suite, this work facilitates context-aware model selection to support reproducible clinical research in the context of clinical trials.

10:00
Mapping Actions and Resources for Innovation in Rare Diseases (MARI-RD): Tracking Collaboration Networks in Portuguese-Speaking Countries

ABSTRACT. Rare disease ecosystems are fragmented and under-mapped across Portuguese-speaking countries. We present a collaboration network tracking framework that integrates REDCap survey data, deduplication, standardized categories, and temporal network metrics to monitor cross-border partnerships and identify hub institutions. This study is part of MARI-RD (Mapping Actions and Resources for Innovation in Rare Diseases), a WP1 activity of the CPLP project focused on mapping actions and resources for innovation. The approach supports longitudinal tracking of referrals, institutional ties, and service coverage, producing reproducible indicators for policy and capacity building.

10:15
VIEWER: A population health management platform for supporting electronic audit and feedback in mental health care

ABSTRACT. Given the burden and negative impacts of psychotic disorders, treatment alone is insufficient to close the gap in psychosis care, highlighting the need for greater investment in prevention strategies targeted at an entire population to detect risk factors early and prevent adverse health outcomes at scale. This work evaluates the technical feasibility of implementing electronic audit and feedback (eA&F) to support population health management in mental health care by facilitating the visualisation of clinical health records to provide relevant summaries for eA&F.

A Participatory Design approach was adopted as an integral framework throughout the design and development of the eA&F platform, VIEWER. The platform's technical feasibility was tested with clinical teams in the context of three different clinical use cases for managing the psychosis patient population. VIEWER was used successfully for eA&F to support the identification of unmet needs, variations in care processes, and inequalities between groups of patients within the psychosis patient population.

It is technically feasible to develop and deploy an eA&F platform that supports population health management in secondary mental healthcare. Using advanced clinical analytics to integrate population-level data with routine clinical practice raises exciting possibilities of building more intelligent services that are adaptive to the needs of the population and adopt a more proactive, preventative approach. The innovative eA&F platform gives clinicians the flexibility to look at clinical data that is relevant to them at the level of abstraction that is appropriate for the decision being made. By sharing the methods and steps taken in this work, we hope to provide useful insights to other researchers and practitioners who are developing similar digital health interventions.

09:00-10:30 Session 13D: Trustworthy AI, Privacy and Human-Centred Health Analytics
Location: Atrium C
09:00
ConfiMed: A Parallel English–Arabic Benchmark for Fine-Grained Confidentiality Classification in Healthcare Correspondence

ABSTRACT. Confidentiality assessment in healthcare correspondence is critical yet underexplored, largely due to the scarcity of publicly available datasets constrained by privacy regulations (e.g., HIPAA, GDPR). Existing benchmarks primarily focus on binary PHI detection (i.e., presence versus absence) in clinical notes and fail to capture the graded sensitivity of administrative communication. This limitation is further compounded in multilingual settings, where the lack of parallel data restricts systematic evaluation of cross-lingual confidentiality modeling.

In this paper, we introduce ConfiMed, a multilingual benchmark for fine-grained confidentiality classification. ConfiMed comprises 2,000 parallel English–Arabic email pairs synthetically generated to reflect the linguistic and contextual complexity of organizational healthcare communication. Each instance is annotated using a five-level ordinal confidentiality schema and validated through a multi-stage human annotation process, achieving strong inter-annotator agreement (Krippendorff’s α = 0.899; QWK = 0.798).

We benchmark lexical (TF–IDF) and multilingual transformer-based (XLM-R) models across monolingual and cross-lingual settings. Monolingual fine-tuning achieves 60.2% accuracy (QWK = 0.748) in English and 56.7% accuracy (QWK = 0.736) in Arabic. Zero-shot transfer remains competitive yet challenging, reaching 54.0% accuracy (QWK = 0.666) for EN→AR transfer. These findings suggest that while core confidentiality patterns are shared across languages, precise sensitivity assessment requires modeling language-specific nuances. ConfiMed therefore provides a challenging and realistic benchmark for multilingual confidentiality modeling in healthcare administration.
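
The ordinal agreement statistic used above (quadratic-weighted kappa) is available off the shelf; the five-level labels below are toy values:

```python
# Quadratic-weighted Cohen's kappa for ordinal confidentiality levels.
from sklearn.metrics import cohen_kappa_score

gold = [0, 1, 2, 3, 4, 2, 1]
pred = [0, 1, 2, 4, 4, 1, 1]
print(cohen_kappa_score(gold, pred, weights="quadratic"))
```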

09:15
DataPrivScore: A Framework for Privacy Assessment in Healthcare Datasets

ABSTRACT. The widespread adoption of Electronic Health Records (EHRs) and large-scale clinical databases has significantly advanced medical research, yet it introduces substantial privacy risks when sharing sensitive patient information. While various privacy-preserving techniques and models exist, evaluating the actual level of protection achieved in complex healthcare datasets, such as those following the OMOP CDM, remains a significant challenge. This paper proposes DataPrivScore, a serverless web application designed to quantify privacy risks through a comprehensive metric called the Privacy Index. The framework locally processes datasets within the user's browser to ensure data residency and security. It utilizes a tiered, automated attribute classification system, supported by a manual override to classify attributes. Experimental results using synthetic datasets in standard formats demonstrate the tool's effectiveness in distinguishing between varying degrees of data protection and identifying critical vulnerabilities in healthcare data. The tool is open-source, and the code is publicly available at https://github.com/ieeta-mith/DataPrivScore.

09:30
Between Aspiration and Reality – Datasets for Network Security in Healthcare
PRESENTER: Jordi Doménech

ABSTRACT. The reliability of network Intrusion Detection Systems (IDS) research in healthcare mainly depends on the availability and realism of network security datasets. However, due to strict privacy regulations and the sensitivity of patient data, real hospital traffic is rarely accessible. For that reason, current researchers rely heavily on publicly available datasets, which are typically synthetically generated or based on simulated devices in laboratory environments. This study presents an analysis of real network traffic captured from a hospital environment and a network feature-based comparison between the real environment and two publicly available datasets. The results indicate that several window-based features exhibit major shifts between real and synthetic datasets. In contrast, flow-based features present greater stability across publicly available datasets and the real hospital network. These findings reveal that while public datasets capture some general properties of hospital traffic, they fail to reproduce its full heterogeneity and temporal dynamics. The study highlights the need for more realistic data generation and validation methods to improve the reliability and transferability of IDS solutions in healthcare environments.

09:45
Validation of a tool to help clinicians decide upon the trustworthiness of Patient Generated Health Data

ABSTRACT. The increasing adoption of Patient-Generated Health Data (PGHD) from wearables, apps, and patient-reported outcomes presents both opportunities and challenges for clinical decision-making. While PGHD enhances personalized care, clinicians struggle to systematically assess its trustworthiness. This study validates the PGHD trust canvas, a structured tool designed to help clinicians evaluate PGHD through seven key domains: Purpose, Clinical Quality, Risk, Governance, Information Origins, Necessity, and Prior Usage.

Using a qualitative approach, we conducted semi-structured interviews with 11 clinicians across diverse specialties to examine the tool's perceived usefulness and usability. Results demonstrate that the canvas effectively supports PGHD evaluation, with clinicians praising its ability to structure complex trustworthiness assessments (8/11 participants), document decision rationale (7/11), and serve educational purposes (7/11). Prompt questions and the Information Origins section were particularly valued for guiding clinical reasoning. Validation confirmed that the canvas' primary utility resides in non-acute, deliberative contexts—such as management planning and guiding principles provision—rather than in time-sensitive acute care.

The PGHD trust canvas addresses a critical gap in digital health integration by providing a standardized yet flexible framework for trust assessment. Its dual utility for carefully considered decision support and documentation meets growing needs in quality improvement and interdisciplinary care coordination. While particularly suited to chronic disease management and training environments, the tool's time burden may limit use in emergency contexts. These findings position the canvas as a practical solution for safer PGHD integration while highlighting pathways for future refinement and implementation research.

10:00
Operationalizing Entrustable Professional Activities: A Data-Centric Framework for Longitudinal Competency Analytics in Medical Education

ABSTRACT. Entrustable Professional Activities (EPAs) are increasingly adopted in competency-based medical education to structure clinical training around observable, assessable tasks. However, in many institutions, EPA evaluation remains operationally fragmented, frequently relying on paper-based workflows or loosely integrated digital tools that fail to support longitudinal competency analytics. This work presents a data-centric digital infrastructure designed to operationalize EPA-based assessment at institutional scale. We formalize the EPA model into a structured computational framework that integrates Micro-Entrustable Competencies (MECs), autonomy levels (N1–N3), tutor feedback loops, and cross-curricular progression into a unified persistence model. The system supports multi-year academic continuity, structured reevaluation mechanisms, and role-based evaluation workflows while enabling longitudinal tracking of competency acquisition. A unified data model links evaluations to competency dimensions, enabling fine-grained analytics such as autonomy evolution curves, feedback density metrics, and cross-cohort performance comparison. The platform was deployed within a newly established medical program, replacing a paper-based evaluation notebook. Usability validation and load testing demonstrate feasibility, scalability, and high user acceptance in real academic conditions. Beyond system implementation, this work contributes a formalized digital representation of entrustment logic that enables measurable, data-driven competency monitoring. Our results suggest that structured EPA digitalization can transform competency-based medical education from episodic assessment toward continuous, analytics-enabled professional development. The proposed framework provides a scalable foundation for institutional and cross-institutional EPA data integration.
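
As a hypothetical illustration of the unified persistence idea (our sketch, not the system's schema), a single evaluation record can link a student, an EPA, its micro-entrustable competencies, an autonomy level, and tutor feedback:

```python
# Illustrative EPA evaluation record; field names are assumptions.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Evaluation:
    student_id: str
    epa_code: str               # e.g. "EPA-03"
    mec_scores: dict[str, int]  # micro-entrustable competency -> rating
    autonomy: str               # one of "N1", "N2", "N3"
    tutor_feedback: str = ""
    when: date = field(default_factory=date.today)
```

Longitudinal analytics such as autonomy evolution curves then reduce to queries over these records grouped by student and date.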

10:15
Identifying Performance and Engagement Profiles in Game-Based Handwriting Training in Primary Schools: Toward Early and Personalized Interventions

ABSTRACT. This study proposes an unsupervised framework to identify performance and engagement profiles during game-based handwriting training in primary school children. Sixty-six first-grade students completed a three-month intervention using three serious games targeting handwriting prerequisites. Engagement features were aggregated across the training period, while performance features were analyzed longitudinally to detect time-evolving clusters. Two games yielded early and stable performance profiles after approximately 30 sessions (around three days of training), meaningfully differentiating between Fast, Accurate, and Struggling children. Simplifying game-specific clusters into macro-profiles improved interpretability while preserving clinical relevance. Profiles showed significant associations with clinical risk: the Struggling profile concentrated 58.8% of at-risk children (lift: 2.43), while no at-risk children were found in the Accurate profile. Engagement profiles were also related to risk, with Low Engagers showing a clinical risk prevalence of 52.4% (lift: 2.10), compared to only 7.1% among High Achievers. Results suggest that meaningful student archetypes can be identified after a limited observation period, supporting early and data-driven personalization of handwriting interventions in educational settings.

10:30-11:00 Coffee Break
11:00-12:30 Session 15A: Deep Learning for Clinical Decision Support
Location: Panorama
11:00
Transfer Learning for ECG Classification: Effects of Label Overlap, Granularity, and Fine-Tuning in Small Clinical Datasets

ABSTRACT. Accurate electrocardiogram (ECG) classification using deep learning requires large annotated datasets. Typically, hospitals lack sufficient amounts of labeled data for model training. Transfer learning, which adapts models pretrained on large, open datasets (source domain) to local settings (target domain), offers a promising solution for addressing this issue. However, its effectiveness can greatly vary depending on how well the data distributions and the scope and specificity of diagnostic labels match between the source and target domains. In this study, we systematically evaluate transfer learning strategies for ECG classification in settings with limited annotated target data (2000 records or fewer), focusing on how the amount of shared diagnostic labels and the level of detail in the label categories differ between source and target data. Using a ResNet architecture, we conducted three experiments: (1) assessing target model performance in terms of overlap between source and target labels, (2) analyzing the effect of source label specificity (coarser vs. fine-grained categories) on the model's performance in the target domain, and (3) evaluating the effect of fine-tuning depth. The results showed transfer learning to be ineffective without shared diagnostic labels, and even a small degree of overlap led to improved performance, with benefits increasing as more labels were shared. Pretraining with medium-level label granularity delivered the best results when label overlap was low and data was scarce, while fine-grained pretraining excelled with greater overlap. Deeper fine-tuning improved model performance and enabled the use of coarser source labels. Altogether, these findings provide guidance on when transfer learning is likely to offer better performance than models trained solely on local hospital data.
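
The "fine-tuning depth" variable studied here can be expressed as how many of the pretrained backbone's final blocks are unfrozen; the sketch below uses a torchvision 2D ResNet purely for illustration (the paper's model is a ResNet for ECG signals):

```python
# Unfreeze only the last k residual stages plus the classification head.
import torch.nn as nn
from torchvision.models import resnet18

def set_finetune_depth(model: nn.Module, k_blocks: int) -> nn.Module:
    stages = [model.layer1, model.layer2, model.layer3, model.layer4]
    for p in model.parameters():
        p.requires_grad = False
    for stage in stages[len(stages) - k_blocks:]:
        for p in stage.parameters():
            p.requires_grad = True
    for p in model.fc.parameters():  # always train the target-domain head
        p.requires_grad = True
    return model

model = set_finetune_depth(resnet18(weights=None), k_blocks=2)
```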

11:15
Integrating the Clarke Error Grid into Deep Learning to Enhance Clinical Performance in Glucose Prediction

ABSTRACT. Accurate blood glucose level (BGL) prediction based on Continuous Glucose Monitoring (CGM) data is a cornerstone in modern diabetes mellitus management. BGL prediction models based on Artificial Neural Networks or deep learning are typically optimized using global error metrics, such as Mean Squared Error (MSE). However, these metrics often fail to account for the clinical significance of prediction errors, which varies drastically across different BGL ranges. This work introduces a glucose-range-specific cost function designed to prioritize clinical safety by emphasizing performance in critical regions, specifically hypoglycemia and hyperglycemia. The proposed cost function is derived from the Clarke Error Grid (CEG), a clinical standard performance metric used in BGL prediction assessment that weights errors based on their potential risk to the patient. This novel cost function is evaluated using the T1DiabetesGranada dataset and a Long Short-Term Memory (LSTM) architecture to predict BGL values at a 60-minute prediction horizon. Experimental results demonstrate that the glucose-range-specific cost function significantly enhances performance in critical BGL ranges. Notably, the model achieved up to a 75% improvement in prediction performance within the hypoglycemic range compared to standard MSE-based BGL prediction models, without compromising global prediction performance. These findings suggest that integrating clinical risk into the learning process produces models that are both robust and more safely aligned with the requirements of real-world T1D care.
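
A glucose-range-specific cost of the kind described can be sketched as a risk-weighted squared error; the thresholds and weights below are illustrative stand-ins, not the paper's CEG-derived values:

```python
# Squared error scaled by a clinical-risk weight per glucose range (mg/dL).
import torch

def range_weighted_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    w = torch.ones_like(target)
    w = torch.where(target < 70.0, torch.full_like(w, 4.0), w)   # hypoglycemia
    w = torch.where(target > 180.0, torch.full_like(w, 2.0), w)  # hyperglycemia
    return (w * (pred - target) ** 2).mean()
```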

11:30
Assessing Concept and Virtual Drift in a Deep Learning Model for Antibiotic Resistance Prediction

ABSTRACT. Antimicrobial resistance represents a critical global health threat, exacerbated by inappropriate empirical antibiotic prescribing. To address this, we developed a Deep Learning-based Clinical Decision Support System trained on over 200,000 historical antibiogram records from three French hospitals to predict antimicrobial resistance and assist clinicians in therapy selection at the bedside, while clinical specimen culture and antimicrobial susceptibility testing are processed at the laboratory. Our model achieves high performance (AUROC of 0.92) and reliable uncertainty calibration (AUSE of 0.03) through Bayesian inference. But clinical deployment faces the persistent challenge of concept drift, as resistance patterns evolve over time. To address this, our present contribution applies the ADWIN algorithm to monitor both error rates and uncertainty signals, enabling detection of concept and virtual drift. This dual-level approach allows proactive identification of emerging resistant strains and ensures the long-term safety and reliability of the system in dynamic clinical settings.
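
The monitoring step can be reproduced in outline with the river library's ADWIN detector, fed a stream of per-case error indicators (and, analogously, uncertainty scores); the synthetic stream below just illustrates the mechanics:

```python
# ADWIN drift detection over a stream of 0/1 prediction errors.
from river import drift

detector = drift.ADWIN()
stream = [0] * 500 + [1] * 100  # error rate jumps, mimicking concept drift
for i, err in enumerate(stream):
    detector.update(err)
    if detector.drift_detected:
        print(f"drift detected at index {i}")
        break
```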

11:45
A Hierarchical Deep Learning Framework for Rapid Eye Movement Behavior Disorder Detection

ABSTRACT. Rapid Eye Movement (REM) Behavior Disorder (RBD) is a parasomnia strongly associated with future neurodegenerative diseases, yet its diagnosis depends on labor-intensive polysomnography (PSG) analysis and analytical indices such as the REM Atonia Index (RAI). Automated detection is challenging due to the temporal sparsity and variability of RBD-related events.

This work presents SOMNUS-RBD, a hierarchical deep learning framework for patient-level RBD prediction from REM sleep PSG. REM segments are encoded into embeddings, then processed with channel-level attention pooling and temporal LogSumExp aggregation to capture sparse pathological activations. Five data configurations were evaluated on a 49-patient cohort using 10-repeated 5-fold cross-validation: (1) chin electromyography (EMG) embeddings alone, (2) multichannel EMG with independent per-channel embeddings, (3) multichannel EMG with joint multi-channel embeddings, (4) multimodal EMG, electroencephalography (EEG) and electrooculography (EOG) with independent per-channel embeddings, and (5) multimodal EMG, EEG, and EOG with unified per-modality embeddings.

Using only chin EMG, the proposed framework showed improvements compared to the analytical baseline (RAI), as accuracy increased from 0.735 to 0.786 and recall from 0.577 to 0.781, while maintaining competitive precision and AUC. Independent multichannel EMG achieved the best performance with an AUC of 0.832 and a recall of 0.827. In contrast, grouping channels prior to embedding extraction reduced discriminative performance. Attention weights highlighted EMG as the dominant modality compared to EEG and EOG, and temporal importance scores aligned with physiologically meaningful segments of loss of atonia. These findings suggest that SOMNUS-RBD surpasses traditional analytical indices while providing intrinsic channel- and timestep-relevant measures for RBD detection.
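
The two aggregation stages named above can be sketched in a few lines; dimensions and the scoring layer are our illustrative assumptions:

```python
# Channel-level attention pooling followed by temporal LogSumExp pooling,
# which emphasizes sparse high activations across a recording.
import torch
import torch.nn as nn

class AttnLSEPool(nn.Module):
    def __init__(self, d: int, tau: float = 1.0):
        super().__init__()
        self.score = nn.Linear(d, 1)  # per-channel attention scores
        self.tau = tau

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, d) segment embeddings
        a = torch.softmax(self.score(x), dim=2)                      # over channels
        pooled = (a * x).sum(dim=2)                                  # (batch, time, d)
        return self.tau * torch.logsumexp(pooled / self.tau, dim=1)  # over time
```

The temperature tau controls how sharply the temporal pooling focuses on peak activations (small tau approaches max pooling), matching the motivation of capturing temporally sparse pathological events.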

12:00
Group therapy for elderly depression: Deep Learning Based on Large Models of Music Affective Computing

ABSTRACT. Music therapy offers a non-invasive alternative; however, its effectiveness depends heavily on individualized emotional matching. To address this limitation, this study proposes an Intelligent Music Therapy System that integrates Music Affective Computing Models with Internet of Things–based wearable sensing for personalized emotional intervention. The system incorporates a deep learning–based Music Affective Computing Model to recommend music and an IoT framework to continuously acquire physiological signals, including heart rate variability, from elderly users. Six representative Music Affective Computing Model architectures were systematically evaluated, among which a hybrid Fractal Convolution Neural Network–Long Short-Term Memory–Transformer model demonstrated the highest classification accuracy and generalization stability in Chinese classical and ethnic music emotion recognition. To validate clinical applicability, an intelligent music therapy system was deployed in a randomized controlled group music therapy trial for elderly individuals. Experimental results indicated significant improvements in heart rate variability indices and depressive mood scores compared with the control condition; therefore, the proposed system can effectively support emotion-aware music intervention in nursing home environments.

12:15
From Concepts to Evidence: Literature-Grounded Skin Lesion Diagnosis with Vision–Language and Retrieval-Augmented Models

ABSTRACT. Artificial intelligence has shown strong potential for dermatological diagnosis, but clinical adoption requires not only accurate predictions but also transparent and verifiable reasoning. Recent concept-based approaches improve interpretability by exposing clinically meaningful dermoscopic features predicted from images; however, the diagnostic explanations produced by large language models (LLMs) remain internally generated and are not explicitly grounded in external biomedical evidence. We propose a literature-grounded diagnostic framework that integrates dermoscopic concept extraction with retrieval-augmented generation (RAG) over a dermatology-focused PubMed corpus. The framework combines vision–language models for lesion concept prediction, dense biomedical literature retrieval, multi-stage reranking strategies, and LLM-based reasoning to generate diagnoses accompanied by explanations supported by retrieved scientific evidence. Beyond introducing this architecture, we conduct a systematic analysis of retrieval design choices—including diversity-based retrieval, cross-encoder reranking, and natural language inference filtering—and examine how these strategies influence both diagnostic performance and explanation grounding. Experiments on Derm7pt, HAM10000, and PH² under the same binary setting used in prior concept-based work (nevus vs. melanoma) achieve balanced accuracy up to 0.792, 0.735, and 0.831, respectively. Our results reveal a trade-off between retrieval diversity and semantic precision: diversity-oriented strategies improve diagnostic performance, while precision-oriented reranking yields explanations more strongly supported by biomedical evidence. By explicitly linking predictions to retrievable scientific literature and enabling claim-level grounding analysis, the proposed framework supports auditable and evidence-grounded dermatological AI systems.
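
A minimal sketch of the reranking stage (one of the design choices analyzed), using a cross-encoder to re-score densely retrieved passages; the checkpoint name and snippets are placeholders:

```python
# Cross-encoder reranking of retrieved literature snippets for a query.
from sentence_transformers import CrossEncoder

query = "dermoscopic features distinguishing melanoma from nevus"
candidates = [
    "Atypical pigment network is associated with melanoma ...",
    "Regular globules are typically observed in benign nevi ...",
]

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative model
scores = reranker.predict([(query, c) for c in candidates])
ranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]
```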

11:00-12:30 Session 15B: Medical Imaging, Segmentation, and Computational Pathology
Location: Atrium A
11:00
Bridging 3D Deep Learning and Curation for Analysis and High-Quality Segmentation in Practice

ABSTRACT. Accurate 3D microscopy image segmentation is essential for quantitative bioimage analysis, yet state-of-the-art foundation models frequently produce error-prone results that necessitate manual proofreading. Manual curation remains the bottleneck for generating high-quality training data and ensuring biological downstream accuracy. We present VessQC, an open-source tool for uncertainty-guided curation of volumetric segmentations. VessQC integrates uncertainty maps to prioritize user attention on regions with high error probability, optimizing the human-in-the-loop workflow. In a study of 3D light-sheet microscopy volumes of murine brain vasculature, uncertainty-guided correction improved error detection recall significantly compared to conventional curation, without increasing total processing time. VessQC thus enables efficient, human-in-the-loop refinement of volumetric segmentations and bridges a key gap in real-world applications between uncertainty estimation and practical human-computer interaction. The software is freely available at github.com/MMV-Lab/VessQC.

11:10
Enhanced Medical Image Segmentation using Hybrid Diffusion Model with Mamba

ABSTRACT. Computed tomography (CT) and magnetic resonance imaging (MRI) are essential modalities for clinical diagnosis of neurological and musculoskeletal diseases. However, medical images are often degraded by noise, motion artifacts, and low contrast, which substantially impair segmentation accuracy and downstream analysis. To address these challenges, we propose an accurate and robust MRI segmentation framework that tightly integrates a diffusion-based image enhancement module with a Mamba-based segmentation network. The enhancement module restores contrast and structural fidelity in severely degraded scans, producing high-quality representations that facilitate precise segmentation. The segmentation backbone adopts a UNet-shaped Mamba architecture, which leverages selective state space modeling to effectively capture long-range dependencies and multi-scale contextual features. Furthermore, we introduce an efficient knowledge distillation strategy that removes nearly half of the VSS blocks from the teacher network to derive a compact student model, achieving a 25% inference speedup and reduced model complexity while maintaining or exceeding teacher-level segmentation performance. Extensive experiments demonstrate that the proposed framework consistently outperforms conventional enhancement–segmentation pipelines, GAN-based enhancement methods, and existing knowledge distillation approaches, highlighting its effectiveness for accurate, reliable, and efficient medical image analysis.
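
The distillation objective behind such teacher-student compression is typically a temperature-softened KL term blended with the task loss; T and alpha below are illustrative, and this generic form is our sketch rather than the paper's exact loss:

```python
# Generic knowledge-distillation loss: softened KL + hard-label cross-entropy.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, target, T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale to keep gradient magnitudes comparable across T
    hard = F.cross_entropy(student_logits, target)
    return alpha * soft + (1.0 - alpha) * hard
```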

11:20
BEA-Net: Boundary-Aware 3D Attention Network for MRI Knee Cartilage Segmentation

ABSTRACT. Segmenting knee cartilage from magnetic resonance images is vital to understanding the pathogenesis and progression of knee osteoarthritis. However, it presents several challenges due to the complex morphology and thin structure of knee cartilage. Dedicated boundary learning has been shown to improve segmentation predictions from deep learning models when segmenting tissues with complex or unclear boundaries, but the application of dedicated boundary learning for 3D segmentation of knee cartilage from MRI has been limited. In this work, we introduce BEA-Net, a multi-task boundary-aware attention network for 3D segmentation of knee cartilage from MRI. BEA-Net uses a dual-branch architecture with an auxiliary decoder dedicated to learning boundary information. A novel boundary-enhancement attention module is used to amplify and focus on salient boundary features to refine boundary predictions. Learnt boundary features are fused with primary-decoder features to enhance segmentation predictions, and the network is optimised using a combination loss that encourages inter-decoder consistency. BEA-Net was evaluated on an MRI dataset from the Osteoarthritis Initiative, outperforming other state-of-the-art models when segmenting four types of knee cartilage and achieving Dice scores of 89.52%, 88.46%, 85.87%, and 87.11% when segmenting the femoral, tibial, and patellar cartilage and the meniscus, respectively.

11:30
Cross-Dataset Generalization in Breast MRI Tumor Classification via Class-Wise Dataset Mixing

ABSTRACT. Breast MRI is highly sensitive for detecting breast tumors, but exam volumes are large and time-consuming to interpret. Although deep learning has shown strong performance on internal test splits, models often fail to generalize across institutions because of domain shift and dataset-origin bias. We study this failure mode and propose a simple training strategy to improve cross-dataset generalization for breast tumor classification.

We train EfficientNet-B3 and WaveViT-Small using Duke Breast Cancer MRI and fastMRI, and we evaluate exclusively on the independent multi-center MAMA-MIA cohort. A controlled confounded setup, where the label is perfectly correlated with dataset origin, leads to near-chance external accuracy (0.5048–0.5265) despite high internal performance, indicating shortcut learning. We then break this correlation by mixing both datasets within each class, while using patient-level splitting, augmentation, and strict leakage controls. On MAMA-MIA, dataset mixing improves accuracy/F1 to 0.8463/0.8625 (WaveViT-Small) and 0.8884/0.8994 (EfficientNet-B3), with EfficientNet-B3 reaching 0.9973 recall. These results show that controlling dataset-origin bias and enforcing strict external validation are key to developing more reliable breast MRI classifiers.
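
The mixing-and-splitting recipe can be sketched in a few lines of pandas/scikit-learn. The snippet below pools the two datasets, enforces a patient-level split, and checks that every class contains cases from every origin; the column names and split ratio are illustrative assumptions, not the paper's exact protocol.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

def mix_classwise(duke: pd.DataFrame, fastmri: pd.DataFrame, seed: int = 0):
    """Pool both datasets so each tumor class contains cases from each
    origin, then split at the patient level so no patient spans splits.
    Column names (patient_id, label) are illustrative, not the paper's."""
    pool = pd.concat([duke.assign(origin="duke"),
                      fastmri.assign(origin="fastmri")], ignore_index=True)
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    tr_idx, va_idx = next(splitter.split(pool, groups=pool["patient_id"]))
    train, val = pool.iloc[tr_idx], pool.iloc[va_idx]
    # Every (label, origin) cell must be populated, otherwise the label is
    # still predictable from dataset origin alone (the shortcut above).
    assert (pd.crosstab(train["label"], train["origin"]) > 0).all().all()
    return train, val
```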

11:40
Multimodal AI for clinically significant prostate cancer detection: A comparative study of structured clinical-based, imaging and fusion models

ABSTRACT. Accurate detection of clinically significant prostate cancer (csPCa) remains a major clinical challenge due to the limited cancer specificity of prostate-specific antigen screening, the invasiveness of biopsy procedures, and the inter-reader variability in magnetic resonance imaging interpretation. In this setting, artificial intelligence provides promising data-driven strategies to improve diagnostic precision and support clinical decision-making. This study presents a comparative analysis of modeling strategies for csPCa detection, examining approaches based on structured clinical variables, magnetic resonance imaging data alone, and their multimodal integration. The apparent diffusion coefficient, derived from diffusion-weighted imaging, is included to quantify its incremental predictive value. Results show that clinically based models achieve strong and stable performance, whereas image-only approaches are mainly limited by data scarcity. Multimodal integration provides competitive results and improves sensitivity in clinically relevant operating regions, although its gains remain influenced by the representational capacity of the imaging branch. Overall, the findings provide practical guidance on selecting appropriate modeling strategies for csPCa prediction under different data availability constraints, clarifying the trade-offs between clinical, imaging-based, and multimodal approaches.

11:50
A Semi-Supervised Multiclass Pixel-Domain Classification Approach for Breast Cancer Microscopy Images Based on Nonlinear Metrics

ABSTRACT. Breast cancer is one of the most prevalent and deadly diseases worldwide, representing a major public health challenge that affects millions of individuals each year. Histopathological analysis, more specifically the immunohistochemistry (IHC) technique, plays a fundamental role in diagnosis by enabling the identification and quantification of positive and negative cellular markers. However, the analysis of microscopy images is time-consuming, subjective, prone to human error, and may present variability. With recent computational advancements, many automated methods for image analysis and decision support have been developed for breast cancer diagnosis based on IHC microscopy images. However, when based on linear metrics, the objective metric used to assess the similarity or dissimilarity of biomarker colors is not expressive enough to characterize the complexity of the multi-biomarker scenarios inherent to tissue characterization by IHC. This limitation represents a significant drawback in properly modeling nonlinear patterns and morphological variations across different cellular classes. To address these challenges, this paper proposes a multiclass classification approach based on nonlinear metrics, specifically designed for breast cancer microscopy images in IHC. The proposed method extends beyond linear classifiers by employing a nonlinear model based on polynomial feature expansion of the Mahalanobis distance, enabling the capture of complex, nonlinear relationships among cellular patterns in IHC images. Experimental results obtained on breast cancer microscopy datasets demonstrate that the proposed approach achieves promising performance, with a micro-averaged F1-score of 0.76, overall precision of 0.77, and specificity of 96.3% for positive nuclei detection, indicating robustness to false-positive classifications.
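
One way to read "polynomial feature expansion of the Mahalanobis distance" is sketched below: each sample is expanded into powers of its Mahalanobis distance to every class's color distribution, and a linear classifier is trained on the expanded features, yielding a nonlinear decision surface. This is a hedged reconstruction, not the paper's implementation; the degree, regularization, and classifier choice are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class PolyMahalanobisFeatures:
    """Expand samples into powers of their Mahalanobis distance to every
    class distribution; a linear model on these features is nonlinear in
    the original pixel space. Degree and smoothing are illustrative."""

    def __init__(self, degree=3):
        self.degree = degree

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.stats_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.stats_[c] = (mu, np.linalg.inv(cov))
        return self

    def transform(self, X):
        cols = []
        for c in self.classes_:
            mu, icov = self.stats_[c]
            d = X - mu
            m = np.sqrt(np.einsum("ij,jk,ik->i", d, icov, d))
            cols += [m ** p for p in range(1, self.degree + 1)]
        return np.stack(cols, axis=1)

# Usage: nonlinear decision surface from a linear model on expanded features.
# feats = PolyMahalanobisFeatures().fit(X_train, y_train)
# clf = LogisticRegression(max_iter=1000).fit(feats.transform(X_train), y_train)
```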

12:00
A Detection-Driven Pipeline for Nuclei Classification in Nasal Cytology

ABSTRACT. Nasal cytology is a minimally invasive diagnostic technique used to assess inflammatory and allergic conditions of the upper airways through microscopic examination of cellular specimens. Despite its clinical relevance, computational studies in this domain remain limited, and the impact of nuclei localization quality on downstream classification performance has not been systematically investigated. In clinical practice, nuclei must first be localized before morphological interpretation can occur, introducing a sequential dependency between detection and classification. In this work, we introduce and evaluate a detection-driven pipeline for instance-level nuclei classification in nasal cytology using the Nasal Mucosa Cell Dataset (NMCD), the first publicly available dataset providing structured nucleus-level annotations in this field. The proposed pipeline mirrors the sequential clinical workflow by explicitly separating nuclei localization and classification, enabling a controlled analysis of the interaction between these two stages. To investigate the role of localization quality, we compare classification performance under two complementary configurations: an ideal scenario based on reference nucleus annotations and a detection-driven scenario operating exclusively on automatically localized nuclei. Experimental results show that end-to-end classification accuracy decreases by 7.7% under detection-driven conditions compared to ideal localization, highlighting the impact of localization errors on downstream morphological classification.

12:10
GTDiagnosis: Intelligent Pathological Diagnosis of Gestational Trophoblastic Diseases via Visual-Language Deep Learning Model

ABSTRACT. The pathological diagnosis of gestational trophoblastic disease (GTD) is time-consuming, relies heavily on the experience of pathologists, and suffers from low consistency at initial diagnosis, which seriously threatens maternal health and reproductive outcomes. We developed an expert model for GTD pathological diagnosis, named GTDoctor. GTDoctor employs our innovative multi-scale adaptive attention mechanism for pixel-level lesion segmentation and builds a decision model through structured and unstructured feature extraction, combined with a large language model to provide personalized pathological analysis results. We developed a software system, GTDiagnosis, based on this technology and conducted clinical trials. The retrospective results demonstrated that GTDiagnosis achieved a mean precision of over 0.91 for lesion detection in pathological slides. In prospective studies, pathologists using GTDiagnosis attained a Positive Predictive Value of 95.59%. The tool reduced the average diagnostic time from 56 to 16 seconds per case. GTDoctor and GTDiagnosis offer a novel solution for GTD pathological diagnosis, enhancing diagnostic performance and efficiency while maintaining clinical interpretability.

12:20
Performance and Feature Analysis in Cataract Detection Using Artificial Intelligence Models

ABSTRACT. Cataract remains a leading cause of visual impairment worldwide, motivating the development of automated screening tools based on retinal fundus photography. This study examines how explicit feature extraction affects artificial intelligence performance for binary cataract classification (normal vs. cataract) and discusses the resulting implications for interpretability and explainability. We compare end-to-end deep convolutional models (ResNet-50 and an InceptionV3+ResNet50V2 configuration) with a traditional machine-learning baseline (support vector machine, SVM) under three representation settings: raw images (no feature extraction), local binary patterns (LBP), and histogram of oriented gradients (HOG). Model performance is evaluated using accuracy, recall, F1-score, and precision–recall area under the curve (PR-AUC). Overall, end-to-end deep learning achieved the strongest discrimination without explicit feature engineering; notably, ResNet-50 operating directly on raw images attained the best F1-score (86.32 ± 0.94%). In contrast, the classical SVM benefited substantially from engineered descriptors, improving from poor performance on raw pixels to an F1-score of 80.45 ± 2.18% when combined with HOG. These findings indicate that feature extraction can enhance classical pipelines and provide intrinsically interpretable representations, whereas deep networks largely benefit from learning task-specific features directly from fundus images. The results highlight a practical performance–interpretability trade-off relevant to cataract screening systems, particularly in resource-constrained deployment scenarios.
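
As a concrete picture of the classical branch of this comparison, the sketch below extracts HOG descriptors and feeds them to an SVM with scikit-image and scikit-learn. The descriptor parameters, image size, and SVM kernel are illustrative placeholders rather than the study's tuned settings.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def hog_features(images, size=(224, 224)):
    """Extract HOG descriptors from grayscale fundus images; parameters are
    illustrative, not those used in the paper."""
    feats = []
    for img in images:
        img = resize(img, size, anti_aliasing=True)
        feats.append(hog(img, orientations=9, pixels_per_cell=(16, 16),
                         cells_per_block=(2, 2)))
    return np.asarray(feats)

# SVM on engineered descriptors, mirroring the classical baseline setting.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# clf.fit(hog_features(train_images), train_labels)
```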

11:00-12:30 Session 15C: Big Data Predictive Analytics
Location: Atrium B
11:00
Deep Learning–Based Hospital Admission Prediction from Spanish Psychiatric Electronic Health Records

ABSTRACT. The automatic identification of patients requiring psychiatric hospitalization is critical for ensuring clinical safety and optimizing resource allocation in emergency mental health services. This study presents a deep learning–based approach for binary hospitalization prediction from unstructured Spanish psychiatric clinical notes. A dataset of 500 emergency psychiatric evaluations collected from CAULE was curated and anonymized, resulting in 409 validated records after preprocessing and outcome filtering. Several Transformer-based architectures were evaluated, including general-purpose English models (BERT), Spanish-specific pretrained models (BETO), and a clinically oriented English model (ClinicalBERT). Models were fine-tuned under stratified 70–15–15 splits and further optimized through systematic hyperparameter search. Results show that Spanish-pretrained models achieve the best overall performance, with BETO-cased reaching 0.952 in both Accuracy and F1-score on the independent test set. Hyperparameter optimization significantly improves performance across architectures, particularly for language-aligned models. Findings highlight the importance of linguistic compatibility and optimized training configurations when applying Transformer models to psychiatric clinical text. This work contributes to advancing clinical NLP research in Spanish and supports the development of AI-driven decision-support tools for mental health care.
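
The fine-tuning setup described maps naturally onto the Hugging Face transformers API. A minimal sketch follows, assuming the public BETO checkpoint dccuchile/bert-base-spanish-wwm-cased and placeholder hyperparameters; the paper's tuned values and exact preprocessing are not reproduced here.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# BETO checkpoint; binary hospitalization label. Hyperparameters below are
# placeholders, not the values found by the study's hyperparameter search.
name = "dccuchile/bert-base-spanish-wwm-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

def encode(batch):
    # Clinical notes can exceed BERT's window; truncation is one simple choice.
    return tokenizer(batch["text"], truncation=True, max_length=512,
                     padding="max_length")

args = TrainingArguments(output_dir="beto-hospitalization",
                         learning_rate=2e-5, num_train_epochs=3,
                         per_device_train_batch_size=16)
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds.map(encode, batched=True),
#                   eval_dataset=val_ds.map(encode, batched=True))
# trainer.train()
```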

11:15
Unified Vascular Score: A Data-Driven Framework for Personalized Cardiovascular Risk Stratification Using Arterial Stiffness

ABSTRACT. Traditional risk scores (TRS) such as Framingham, WHO, and Globorisk guide preventive decision-making by predicting future cardiovascular events using conventional risk factors. However, their reliance on population-based models limits their ability to capture underlying subclinical vascular damage, thereby restricting granular risk differentiation and effective personalized prevention. To address this limitation, we propose a Unified Vascular Score (UVS) that integrates arterial stiffness-based vascular health markers. In a study of 205 participants, we validated UVS against TRS and investigated its ability to further capture vascular heterogeneity across age–hypertension phenotypes within and across TRS risk strata. Inter-score comparison showed moderate agreement among TRS, with Globorisk aligning most closely with the consensus stratification (87%). When compared with TRS frameworks, UVS demonstrated a consistent monotonic increase across low-to-high risk strata. While TRS grouped all four physiological subgroups (Young Normal, Young Hypertensive, Old Normal, Old Hypertensive) within the same low-risk category, the proposed UVS revealed clear separation among them. This preserved discrimination highlights underlying vascular heterogeneity and demonstrates the ability of UVS to capture biologically meaningful differences that remain masked within conventional categorical risk stratification. These findings suggest that UVS complements traditional risk models while providing finer resolution within each risk category, enabling improved detection of subclinical vascular differences among individuals with similar conventional risk profiles.

11:30
Multimodal ECG Abnormalities Classification Approach Based on Anamnesis Patient Data and Signal Integration

ABSTRACT. Automated interpretation of 12-lead electrocardiogram (ECG) images offers critical value as a clinical decision support tool, enabling rapid and accurate patient triage in fast-paced emergency environments. However, existing classification models often rely solely on visual data, ignoring the essential patient history utilized by human cardiologists. This paper proposes a novel multimodal deep learning architecture that integrates raw static ECG images with baseline cardiovascular risk factors (e.g., age, sex, blood pressure) to optimize diagnostic precision. The model employs a Contrastive Language-Image Pre-training (CLIP) backbone, utilizing free-text cardiologist reports as an auxiliary supervisory signal during training to extract complex morphological features without requiring manual annotations. During inference, patient clinical metadata generates an attention mask that dynamically scales the extracted visual embeddings. This early-fusion gating mechanism balances information across modalities, enabling the model to adjust its visual processing based on each patient’s risk profile. Evaluated on ten highly imbalanced cardiovascular abnormalities from the MIMIC-IV dataset using Focal Loss, the proposed model achieves a weighted average F1-score of 77.4%, overall AUC of 95.83%, Accuracy of 93.82%, Precision of 77.08% and Recall of 77.85%, establishing a highly competitive benchmark against current state-of-the-art multi-label classifiers. Additionally, the integration of clinical context significantly improved the predictive confidence for critical ischemic events, boosting the detection rate for Acute Myocardial Infarction (AMI) by over 61% compared to an image-only baseline. These results demonstrate that patient clinical context is an indispensable prior, effectively transitioning theoretical ECG classifiers into robust, safety-first automated triage tools.
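
The metadata-driven gating mechanism described can be illustrated with a small PyTorch module: clinical variables pass through an MLP whose sigmoid output rescales the visual embedding element-wise before classification. The dimensions, depth, and ten-label head are assumptions for the sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MetadataGate(nn.Module):
    """Sketch of metadata-conditioned gating: clinical variables produce a
    sigmoid mask that rescales the CLIP image embedding element-wise before
    multi-label classification. All sizes are illustrative assumptions."""

    def __init__(self, meta_dim, embed_dim=512, n_labels=10):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(meta_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim), nn.Sigmoid(),  # mask in (0, 1)
        )
        self.head = nn.Linear(embed_dim, n_labels)

    def forward(self, image_embed, metadata):
        # image_embed: (B, embed_dim) from a frozen CLIP image encoder.
        # metadata:    (B, meta_dim), e.g. age, sex, blood pressure.
        return self.head(image_embed * self.gate(metadata))
```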

11:45
High risk and preventable harm groups identified in clustering of older patients on features associated with adverse drug reactions

ABSTRACT. Background: Adverse drug reactions (ADR) leading to hospitalization cause considerable physical and emotional harm to patients and have been estimated to cost the UK NHS £2.21Bn per year. Structured Medication Reviews (SMRs) aim to prevent such harm through comprehensive review and revision of medications. Challenges remain in the objective selection of patients for SMR, considering both risk of harm and potential for medicine optimization. We present subgroups of older patients with distinct characteristics and use of medications previously reported to associate with preventable ADRs.

Methods: Published ADR event codes were used to classify ADR hospitalizations in 634k electronic healthcare records from the CPRD AURUM dataset, filtered for patients defined as older (65+ years) on 01/04/2019. Time to ADR hospitalization was monitored from this date to 31/03/2020. Patients were split into training (90%) and testing (10%) partitions, stratified for equal proportions of ADR hospitalization. LASSO Cox regression extracted features associated with ADR hospitalization risk from 1,014 features describing medication, patient demographics and clinical characteristics. Finally, semi-supervised clustering was performed on the extracted features to group patients by ADR risk.

Results: LASSO Cox regression extracted 74 features associated with ADR hospitalization, including scaled age (hazard ratio / HR: 2.48), alcohol liver disease (HR: 1.75) and unrecorded ethnicity (HR: 1.44). Patients clustered into two older groups with high ADR hospitalization, two disease-specific groups and a healthy ageing group.

Conclusion: This work identified 5 patient subgroups with characteristic features and ADR risk. Future work will investigate scope for medicines optimization within groups, for prevention of ADR harm.
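
As an illustration of the feature-extraction step, an L1-penalized Cox model can be fitted with the lifelines library, after which the non-zero coefficients give the retained features and their hazard ratios. The snippet is a sketch under assumed column names and penalty strength; the authors' exact implementation may differ.

```python
import pandas as pd
from lifelines import CoxPHFitter

# LASSO-style Cox regression: lifelines' penalizer with l1_ratio=1.0 gives a
# pure L1 penalty. File and column names below are hypothetical.
df = pd.read_csv("cohort.csv")  # candidate features + follow-up time + event
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time_to_adr", event_col="adr_hospitalization")

# Non-zero coefficients are the retained features; exp(coef) is the hazard
# ratio reported per feature.
selected = cph.summary.loc[cph.summary["coef"].abs() > 1e-6, "exp(coef)"]
print(selected.sort_values(ascending=False))
```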

12:00
Subgroup Analysis for Risk of Fall Correlation Using the UK Biobank dataset

ABSTRACT. Falls are a major public health problem, with serious implications for the functionality and quality of life of adults. Although various machine learning approaches have been proposed for predicting fall risk, most are based on uniform models for heterogeneous populations, ignoring the substantial differences between disease categories. This study proposes a machine learning framework based on subgroup analysis to predict the risk of falls in different disease categories using data from the UK Biobank. Participants were grouped into clinically relevant disease categories, and multiple machine learning models were developed and evaluated for each subgroup. Model performance was evaluated using accuracy, sensitivity, specificity, ROC-AUC, and F1-score, while the calibration of probabilistic predictions was examined using the Brier score. In addition, explainable artificial intelligence techniques were applied through SHAP to interpret predictions. These results indicate that the performance of the models differs greatly across the disease types, resulting in moderate to high values of ROC-AUC, up to a maximum of 0.95 in some subgroups. Overall health was identified as the most important factor in most subgroups, whereas the importance of the activity factor was higher in subgroups of hematological diseases. In summary, these findings highlight the potential of subgroup-based, interpretable machine learning models to support more personalized and clinically actionable fall risk assessment in populations with chronic diseases.

12:15
Correlation-Based Validation of Multidomain Geriatric Assessment Tools for Older Adults Monitoring

ABSTRACT. Multidomain geriatric assessment batteries are increasingly integrated into digital health platforms for older adults monitoring, yet their internal consistency and cross-domain behavior under real-world conditions remain insufficiently examined. This study presents a correlation-based evaluation of a multidomain geriatric assessment dataset collected at two time points from 616 community-dwelling older adults (i.e., above 65 years old) in Greece. Pairwise Pearson correlation analysis examined test-retest stability, construct-level associations, and cross-domain coherence across balance, frailty, mental health, sleep quality, quality of life, and lifestyle measures. Strong correlations (r > 0.9) were observed for test-retest measurements, anthropometric clusters, and frailty-balance-fear constructs, while moderate correlations (0.4 < r < 0.7) reproduced known relationships between depression, sleep quality, functional independence, and quality of life. Composite multidomain screening subscores exhibited weak internal coherence and limited test-retest stability, highlighting implementation-sensitive limitations. These findings support correlation analysis as a transparent validation step prior to integrating geriatric assessment batteries into digital health and decision-support platforms.

11:00-12:30 Session 15D: Privacy, AI and Digital Health Platforms
Location: Atrium C
11:00
IndoClinNER: Overcoming Medical Prior Bias in Clinical De-identification via Adversarial Surname Injection

ABSTRACT. Nursing shift handovers automated via Retrieval-Augmented Generation (RAG) promise significant reductions in administrative burden. Our prior work demonstrated a local-first RAG framework deployable on consumer CPU hardware, achieving a 43.2% reduction in handover time while maintaining zero patient identifier leakage through deterministic regex privacy controls. However, regex-based de-identification triggers false positives when common Bengali and Hindi names (Joy, Deep, Anal) overlap with English vocabulary and medical terminology, risking desensitization to genuine privacy warnings over time—a precursor to alert fatigue. Conversely, Western-trained Named Entity Recognition (NER) models exhibit what we term Medical Prior Bias, systematically failing to detect these homonymous names in clinical contexts. We present IndoClinNER, a hybrid privacy architecture combining deterministic regex, contextual NER, and Adversarial Surname Injection (ASI)—a novel inference-time technique that exploits learned bigram dependencies by synthetically injecting surnames to force syntactic disambiguation. To address the scarcity of annotated code-mixed clinical data, we developed a Dual-Path augmentation strategy: injecting realistic Bengali names—constructed from 438 unique first names and 86 surnames extracted from West Bengal voter lists—into authentic MIMIC-III nursing notes via a constrained LLM pipeline. On 450 synthetic adversarial sentences across three independent runs, ASI achieved 99.44% recall with 99.44% precision. On 97 expert-generated clinical notes, the system achieved 84.4% recall, with failure analysis confirming that errors occurred predominantly in telegraphic syntax lacking grammatical markers. The system operates at 20 ms mean inference latency on a 33M-parameter model running on consumer CPU hardware, suitable for resource-constrained settings without GPU infrastructure.
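
The core ASI idea, appending a synthetic surname so the NER model's learned "FirstName Surname" bigram prior forces a PERSON reading, can be sketched in a few lines. The name lists, example sentence, and mapping convention below are illustrative stand-ins, not the paper's curated resources.

```python
# Names that collide with English vocabulary, plus surnames to inject; both
# lists are illustrative stand-ins for the paper's voter-list resources.
AMBIGUOUS = {"Joy", "Deep", "Anal"}
SURNAMES = ["Das", "Roy", "Sen"]

def inject_surnames(tokens):
    """Return the augmented token list plus the indices (in the augmented
    list) of the original ambiguous tokens, so PERSON predictions on the
    injected bigram can be mapped back to the source token."""
    out, flagged = [], []
    for tok in tokens:
        out.append(tok)
        if tok in AMBIGUOUS:
            flagged.append(len(out) - 1)
            out.append(SURNAMES[len(flagged) % len(SURNAMES)])
    return out, flagged

aug, flagged = inject_surnames("Patient Joy complained of chest pain".split())
# aug -> ['Patient', 'Joy', 'Roy', 'complained', 'of', 'chest', 'pain']
# Run the NER model on ' '.join(aug); a PERSON span covering index 1 is then
# taken as evidence that 'Joy' is a name and must be de-identified.
```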

11:10
PrivFusion: A Privacy-preserving Multi-Agent Framework for Harmonizing Distributed Datasets

ABSTRACT. The growing availability of clinical data has increased the use of machine learning, yet centralized data aggregation is often infeasible for sensitive health information. Federated Learning (FL) offers a distributed alternative, but its adoption is limited by substantial heterogeneity across institutional datasets, making harmonization a critical but frequently overlooked prerequisite for multi-site analytics. We introduce PrivFusion, a privacy-preserving multi-agent framework that automates the harmonization of structured datasets prior to federated training. PrivFusion uses agents to analyze local data, cluster semantically similar features across sites, and provide iterative transformation recommendations until alignment is achieved. Evaluation across four heterogeneous COVID-19 datasets demonstrates that PrivFusion effectively and efficiently harmonizes multi-site data while substantially reducing manual effort.

11:20
Vulnerability Audits for Connected Medical Devices

ABSTRACT. Connected medical and wellness devices increasingly act as front-ends for sensitive physiological data and, in some cases, as inputs that may influence health-related decisions. Their typical architecture, comprising embedded sensors, a companion mobile application, and cloud services, expands the attack surface beyond the device itself and means that security failures can become clinically relevant through privacy loss, data integrity issues, and reduced availability. This paper presents a reproducible black-box audit of nine commercially available connected health devices (glucometers, a blood pressure monitor, thermometers, a pulse oximeter, and wearables). We apply a structured evaluation framework derived from ETSI consumer IoT baseline requirements and conformance assessment, adapted to the connected-health context by emphasising data protection, secure onboarding, and update trust. The audit covers 31 test cases organised into eight operational vectors (network exposure, firmware/OS, update mechanisms, communications security, configuration portals, mobile app security, authentication/account security, and physical/auxiliary surface), and uses conservative evidence criteria (no exploitation, minimal reproducible indicators) to support repeatability and responsible disclosure. Across 150 applicable test outcomes, 58.0% were favourable, 23.3% were unfavourable, and 18.7% were inconclusive due to limited technical transparency or insufficient artefacts. We conclude with actionable recommendations for manufacturers and procurement, focused on secure-by-design onboarding, verifiable update pipelines, and measurable privacy controls.

11:30
Informed, Empowered, and Heard: AI and Retrieval-Augmented Generation as Tools for Equitable Patient Information in Oncology

ABSTRACT. Cancer diagnosis confronts patients with an immediate and urgent need for reliable, comprehensible, and personalised information. Yet across Europe, and acutely in Greece, access to high-quality oncology information remains profoundly unequal, shaped by geography, language, socioeconomic status, digital literacy, and the time constraints of overstretched healthcare systems. This paper examines how artificial intelligence (AI), and specifically Retrieval-Augmented Generation (RAG) architectures, can address this inequity by delivering dynamically personalised, evidence-grounded patient information at scale - built by patients and for patient needs, not for hospital workflows. Beyond the technical dimension, the paper argues that the true transformative potential of AI in oncology communication lies not in replacing the human relationship between patients, informal carers, and medical professionals, but in strengthening it to create synergies of collaboration. When patients arrive at clinical encounters better informed, their capacity for shared decision-making is enhanced, their autonomy is respected, and the therapeutic alliance is deepened. The paper additionally draws on neuroscientific evidence on the cognitive burden imposed by cancer diagnosis, arguing that information systems must be designed for the compromised cognitive state of their recipients, not for the idealised attentive reader. Drawing on the Greek oncology context as a specific case within the broader European landscape, and grounding the analysis in established frameworks for patient rights, health literacy, and co-design, this paper presents a principled framework for deploying AI-RAG systems in cancer care in ways that prioritise equity, empowerment, and trust.

11:40
SlideLTI: A lightweight tool for LMS-integrated virtual microscopy

ABSTRACT. Histology is an integral element of medical education. While traditionally this subject has employed optical microscopes, a shift towards virtual microscopes (VM) has been observed over the last 2 decades. The integration of VMs in Learning Management Systems (LMS) has been realized through different approaches, yet technical overhead, technological obsolescence, commercial dependencies or pedagogical disadvantages are some of the challenges institutions, teachers and students might face. In this paper, we present SlideLTI, a lightweight tool for virtual microscopy that can be easily integrated in LMS via the LTI (Learning Tools Interoperability) technical standard. The proposed solution comes with minimal technical overhead, ideal for institutions with limited IT support. The solution is available as free and open source (FOSS) software.

12:30-14:00 Lunch Break
15:30-16:00 Coffee Break
16:00-17:30 Session 18A: AI Frameworks for Clinical Decision Support
Location: Panorama
16:00
Transformative Approaches in AI and Machine Learning for Personalized Healthcare: From Diffusion Models to Patient-Specific Therapies

ABSTRACT. The field of medicine is transitioning from a "one-size-fits-all" model to patient-specific therapies, a revolution enabled by advanced AI and machine learning models that can analyze vast, multi-modal healthcare data. This article explores transformative AI approaches, with a specific focus on diffusion models and their role in creating personalized healthcare solutions. Traditional models like GANs and VAEs face challenges such as training instability, mode collapse, and blurry output. Diffusion models overcome these limitations by generating high-fidelity, diverse, and realistic data through an iterative denoising process, making them exceptionally well-suited for applications in medicine. Their capabilities are being applied to generate realistic synthetic biomedical data for rare diseases, design novel molecular structures for personalized drugs, and enhance medical imaging for improved diagnostics. These innovations facilitate the creation of "digital twins" or virtual patient simulations to test treatments in a risk-free environment, accelerate drug discovery pipelines, and optimize personalized dose regimens with reinforcement learning. Despite their potential, the widespread adoption of these technologies faces significant challenges. The "black box" nature of complex generative models hinders clinical trust and transparency. Synthetic data, while promising for privacy, can inadvertently memorize sensitive patient information, raising critical privacy concerns. Furthermore, the rapid evolution of adaptive AI models poses a regulatory hurdle for bodies like the FDA, which are working on new frameworks to keep pace. The future of AI in healthcare will focus on integrating these models into real-time clinical workflows, creating hybrid AI systems, and establishing global federated networks for continuous learning and equitable access to care.

16:15
SAFER-Bench: A Comprehensive Benchmarking Framework for Evaluating Medical Federated Retrieval-Augmented Generation (RAG) Systems

ABSTRACT. Retrieval-Augmented Generation (RAG) systems are crucial for building reliable AI applications in privacy-sensitive domains such as healthcare. However, existing RAG benchmarks assume centralized data access, while federated learning benchmarks focus primarily on model training rather than retrieval-based workflows. We introduce SAFER-Bench, a comprehensive framework for evaluating Federated RAG systems with approval-based privacy controls in medical settings. SAFER-Bench enables systematic evaluation of retrieval algorithms, merging strategies, and language models under realistic federated constraints where Data Owners maintain complete control over their private medical corpora. We evaluate the framework across five federation configurations (1-4 Data Owners) using five language models spanning three size categories: SmolLM2-1.7B-Instruct, BioMistral-7B and Mistral-7B-Instruct-v0.3, and Llama-3.3-70B-Instruct and OpenBioLLM-Llama3-70B. Our experiments on 200 medical questions from PubMedQA and BioASQ reveal that domain specialization, not model size, determines federation benefits: medical-specialized models benefit from federation at both 7B (+3--5%) and 70B (+5.2%) scales, while general-purpose models consistently perform worse with federation regardless of size (7B: -1 to -2.5%, 70B: -2 to -6%). These findings provide practical, evidence-based guidance for deploying privacy-preserving RAG systems in collaborative healthcare networks.

16:30
TIME-HF: Transformer for Integrated Medical Embeddings in EHR-based Heart Failure Prediction

ABSTRACT. Heart Failure (HF) is a complex clinical condition affecting over 64 million people worldwide (Chen et al., 2025). The prevalence of HF is increasing and it continues to be a major cause of unplanned hospitalisations (Savarese et al., 2023). Despite advances in clinical interventions to anticipate and improve post-diagnosis outcomes, the potential incidence of HF has been particularly challenging to predict before onset. Over the last few years, electronic health records have been increasingly used for HF screening. While reporting promising evaluation metrics, such screening tools are rarely implemented in routine clinical practice because they suffer from significant limitations, e.g.: i. reliance on both primary and secondary care data where linkage is non-trivial, ii. patient representations using simplistic, summarised features over the health timeline, iii. raw diagnosis, procedure and medication codes lacking clinical context used as features, and iv. patient visits being weighted identically in model design. In addition, the non-uniformity of temporal gaps between consecutive patient visits is often ignored. Here, we develop TIME-HF, a transformer-based model on semantically rich patient representations to predict incident HF 6 months ahead of time. Empirical results demonstrate remarkable performance using predictors only from primary care. Our method, trained on 18k patients, shows high sample efficiency, reaching AUC, precision and recall of 0.82, 0.73 and 0.74 respectively. TIME-HF outperforms traditional methods that use statistical summaries of patient records. Our work serves as a proof-of-concept emphasizing the importance of incorporating temporal features for HF prediction and evidencing the richness and adequacy of primary care data for early identification of at-risk individuals.

16:45
Reinforcement Learning Framework for Patient Allocation Between Complex and Regional Oncology Centres in Czechia

ABSTRACT. Managing stochastic oncological demand in the Czech Republic challenges planning for Complex Oncology Centres (COC) and Regional Oncology Centres (ROC). Herein, we present a reinforcement-learning framework that learns dynamic patient-allocation policies to maximize treatment efficiency while respecting capacity and other relevant operational constraints. A discrete-time simulator captured key operational features of lung cancer oncology care, including daily arrivals, multi-day treatments, patient deterioration, and COC capacity limits. The agent observed a five-dimensional state vector (severity, waiting time, length of stay, current COC occupancy, and total assigned patients) and output probabilities for assigning each patient to the COC versus the ROC. Across several configurations, the deep Q-network (DQN) policy aligned closely with the desired COC capacity, maintained the buffer constraint, and achieved higher cumulative reward than non-learning baseline policies. The framework might act as a lightweight digital twin, supporting scenario planning, adaptive allocation, and pilot deployment on real data.
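
A skeleton of such an agent is easy to write down. The abstract describes assignment probabilities; a plain DQN instead scores the two actions with Q-values, so the sketch below uses epsilon-greedy selection over Q-values as one common instantiation. Layer sizes, epsilon, and the state encoding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AllocationDQN(nn.Module):
    """Q-network over the five-dimensional state named in the abstract
    (severity, waiting time, length of stay, COC occupancy, total assigned
    patients) and two actions: assign to COC or to ROC. Sizes illustrative."""

    def __init__(self, state_dim=5, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

def act(q_net, state, eps=0.1):
    """Epsilon-greedy action selection during training."""
    if torch.rand(1).item() < eps:
        return torch.randint(0, 2, (1,)).item()
    with torch.no_grad():
        return q_net(state).argmax().item()
```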

17:00
ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation

ABSTRACT. Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model's actual decision-making process. In this work, we propose the ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, the ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, as the decision logic is mathematically transparent and the generated weights (W) serve as exact, high-resolution feature attribution maps. We introduce a transition decoder that effectively maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both time and lead dimensions. We evaluate our approach on the PTB-XL dataset for classification tasks, demonstrating that the ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward "white-box" cardiac diagnostics.
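
The hypernetwork pattern, a backbone that emits the weights of a per-sample linear model which is then applied directly to the input, can be sketched compactly in PyTorch. The backbone, dimensions, and "transition decoder" below are stand-ins chosen for brevity, not the ECG-IMN architecture itself.

```python
import torch
import torch.nn as nn

class MesomorphicECG(nn.Module):
    """Sketch of the hypernetwork pattern: a small CNN emits, per input, the
    weights of a strictly linear model applied to the raw ECG; the generated
    weights double as an exact attribution map. All sizes illustrative."""

    def __init__(self, leads=12, length=1000, n_classes=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv1d(leads, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        # "Transition decoder": latent vector -> per-sample weights and bias.
        self.w_head = nn.Linear(64, n_classes * leads * length)
        self.b_head = nn.Linear(64, n_classes)
        self.shape = (n_classes, leads, length)

    def forward(self, x):                       # x: (B, leads, length)
        z = self.backbone(x)                    # (B, 64)
        W = self.w_head(z).view(-1, *self.shape)
        b = self.b_head(z)
        # Strictly linear decision on the input; W is the attribution map.
        logits = (W * x.unsqueeze(1)).sum(dim=(2, 3)) + b
        return logits, W
```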

17:15
Time-Domain GAN Compression of Intracranial EEG with Latent Quantization

ABSTRACT. Intracranial electroencephalographic (iEEG) signals are typically captured using matrices with a high density of electrodes. This makes compressing these signals a valid objective for reducing memory usage in storage and transmission. Among the available compression methods, generative adversarial networks (GANs) have not been used. GANs are primarily associated with image generation and image-related applications. They can capture the probability distribution of variables in a dataset. Compression algorithms have minimally explored this characteristic, especially when applied to time series. We present an algorithm for multichannel signal compression based on a GAN, and we prove its efficacy on a dataset of intracranial electroencephalography (iEEG) signals. The compression method modifies the Backpropagation GAN (BPGAN) algorithm to work with signals instead of images. To work around the difficulties introduced by the peculiarities of this specific application, we propose an algorithm that separates the channels of the iEEG signal into groups with high correlation and synchronization. The algorithm compresses the groups separately by selecting a latent representation for each signal window from a finite set of possible representations. To validate the results, we compared the outcome of an epilepsy seizure recognition experiment on the original and reconstructed signals. This approach presents encouraging results regarding compression ratio, despite the introduction of some artifacts in the reconstruction. This could open the way for future improvements in applying GANs to signal compression by increasing reconstruction quality while keeping the high compression ratio.

16:00-17:30 Session 18B: Ultrasound Imaging and Neonatal Respiratory Analysis
Location: Atrium A
16:00
A Calibrated Quantitative Framework for Diaphragmatic Analysis: Clinical Validation on Real B-Mode Neonates Ultrasound Videos

ABSTRACT. Accurate assessment of diaphragmatic function is critical in the neonatal intensive care unit (ICU), where respiratory failure remains a major cause of morbidity. Ultrasound video may provide a safe bedside modality for diaphragmatic evaluation and follow-up. However, current clinical practice relies predominantly on manual (M) measurements that are highly operator-dependent and lack standardized quantitative workflows. In this study, we propose and clinically validate a calibrated computational framework for the quantitative analysis of neonatal diaphragm ultrasound acquired in routine clinical practice. The proposed framework integrates ultrasound video selection, spatial resolution standardization, intensity normalization, M-mode generation, respiratory phase identification, and automated (A) extraction of clinically established evaluation metrics (EM). These include end-expiratory diaphragmatic thickness (DTE), end-inspiratory diaphragmatic thickness (DTI), diaphragmatic thickening fraction (DTF), and diaphragmatic excursion (DE). Validation was performed on 10 neonatal ultrasound videos (N=10) using the following statistical analyses: (i) agreement between M vs A measurements, and (ii) discrimination analysis between normal (NN = 5) vs abnormal (NA = 5) neonatal cases. The proposed framework demonstrated mean absolute differences (MAD) of 0.055 mm and good statistical concordance with clinical reference measurements (at p<0.05). This is confirmed by the high intraclass correlation coefficient (ICC=0.98–0.99) between M vs A measurements reported for all EM investigated in this study. Furthermore, the above EM were shown to differentiate between NN vs NA cases. The results demonstrate that the proposed framework enables reproducible and clinically meaningful quantitative diaphragmatic assessment, supporting the transition from qualitative visual interpretation towards standardized computational analysis in neonatal respiratory ICU care.
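
The arithmetic behind the reported metrics is simple once calibrated traces are available. Assuming per-frame thickness and excursion traces in millimetres plus indices of the end-inspiratory and end-expiratory frames, a sketch of the computation (using the standard definition DTF = (DTI - DTE)/DTE x 100) could look as follows; the framework's actual extraction pipeline is considerably richer.

```python
import numpy as np

def diaphragm_metrics(thickness_mm, excursion_mm, insp_idx, exp_idx):
    """Compute DTI, DTE, DTF and DE from calibrated per-frame traces (mm).

    insp_idx / exp_idx: frame indices of end-inspiration / end-expiration.
    The excursion definition here (mean inspiratory minus mean expiratory
    position) is one plausible convention, flagged as an assumption.
    """
    dti = float(np.mean(thickness_mm[insp_idx]))   # end-inspiratory thickness
    dte = float(np.mean(thickness_mm[exp_idx]))    # end-expiratory thickness
    dtf = 100.0 * (dti - dte) / dte                # thickening fraction, %
    de = float(np.mean(excursion_mm[insp_idx]) -
               np.mean(excursion_mm[exp_idx]))     # excursion, mm
    return {"DTI_mm": dti, "DTE_mm": dte, "DTF_pct": dtf, "DE_mm": de}
```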

16:15
Semi-automated Diaphragmatic Motion Analysis in Neonates B-mode Ultrasound Videos

ABSTRACT. This paper presents a semi-automated (SA) integrated system for analyzing and evaluating normal and abnormal diaphragmatic motion in ultrasound videos of neonates. The proposed system was tested on 10 ultrasound B-mode videos (5 normal (NN) and 5 abnormal (NAB)) of the neonate’s diaphragm and does not require dedicated ultrasound M-mode acquisition. A neonatal specialist provided manual (M) measurements of the motion parameters from all videos to enable comparisons with the SA results generated by the proposed system. The system starts with intensity normalization and spatial transformation. A deep learning (DL) U-Net segmentation model is used to identify the diaphragm region in each video frame. From the resulting segmentation masks, a diaphragmatic central axis is extracted to define the sampling geometry. Based on this axis, 10 different M-mode images are generated along 10 perpendicular sampling lines evenly spaced along the central axis, and a motion curve is extracted from each M-mode image. From the extracted motion curves, M and SA evaluation metrics (EM) were calculated for all NN/NAB subjects investigated, as follows: diaphragmatic excursion (DE), contraction velocity (CV), inspiration time (Tinsp), expiration time (Texp), total breathing time (Ttot), respiratory rate (RR), diaphragmatic slope curve (DSC) and relaxation rate (RRrelax). No statistically significant differences were found between the median(±IQR) M vs SA DE measurements for the NN/NAB subjects investigated (DEM= 2.99(0.97)mm / 2.03(0.62)mm, DESA= 3.56(1.46)mm / 2.36(1.42)mm, p= (0.285/0.001)). Statistically significant differences were found between the NN vs NAB motion measurements (DE= (3.56(1.46) / 2.36(1.42)) mm) as well as for other EM. Although the proposed system has been evaluated on a small number of subjects, it shows potential for future clinical application to assist clinicians in the assessment of diaphragmatic motion in neonates.

16:30
A Semi-Automated Integrated System for the Segmentation of the Diaphragmatic Thickness in Ultrasound Videos of Neonates

ABSTRACT. Assessing diaphragmatic function and diaphragmatic thickness (DT) in neonates is important for understanding respiratory effort, planning ventilator weaning, and monitoring chronic lung disease. However, DT measurements in routine practice are typically manual (M), time-consuming, operator dependent, and not always feasible in a neonatal intensive care unit (NICU), particularly when image quality is variable. In this work, we developed and evaluated an integrated semi-automated (SA) system for segmentation and DT measurement from neonatal B-mode ultrasound videos. We analyzed 10 anonymized videos (5 normal (NN), 5 abnormal (NA)). M MicroDicom measurements outside the zone of apposition (OUT-OF-ZOA) were used as reference, since OUT-OF-ZOA and ZOA measurements are not interchangeable. For each video, the SA method produced 10 line-based estimates per evaluation metric (EM), each paired with a single M measurement. DT at end-expiration (DTexp), end-inspiration (DTinsp), and diaphragmatic thickening fraction (DTF) were computed. For OUT-OF-ZOA DTexp, M (NN/NA) was 1.52(0.55) / 1.58(0.50) [mm] and SA was 1.67(0.69) / 1.49(0.52) [mm]; for DTinsp, M was 2.25(0.48) / 1.90(0.71) [mm] and SA was 2.42(0.57) / 1.60(0.86) [mm]; and for DTF, M was 23(30.0) / 20.3(13.0) [%] and SA was 26.53(35.6) / 18.46(13.0) [%]. NN vs NA differences (M/SA) were not significantly different for DTexp (p=0.841/0.548) and DTF (p=0.421/0.548), with a trend for DTinsp (p=0.095/0.095); paired M vs SA p-values were 0.625 (DTexp), 0.556 (DTinsp), and 0.625 (DTF). The Spearman correlation coefficient was high overall (ρ(NN)=0.900 across all EM investigated; ρ(NA)=0.821–0.900 for DTexp/DTinsp and 0.700 for DTF). APE% was moderate for DTexp/DTinsp and higher for DTF. Bland–Altman analysis further assessed agreement. Despite the limited dataset, the results suggest that SA neonatal DT analysis is feasible and may reduce clinicians’ workload while improving measurement standardization in the NICU.

16:45
Diaphragmatic Normal and Abnormal Motion Analysis in Real Neonates Ultrasound Videos

ABSTRACT. Accurate functional ultrasound video evaluation of neonatal diaphragmatic motion is essential for effective respiratory management. However, standard diagnostic assessments predominantly depend on subjective manual measurements (MM), which hinders reproducibility and clinical scalability. To address these limitations, this study presents a semi-automated (SA) integrated system optimized for the quantitative assessment of diaphragmatic motion using real clinical ultrasound videos of neonates. The proposed system enables the SA extraction of a comprehensive suite of respiratory parameters, including diaphragmatic excursion (DE), contraction and relaxation velocities (CV, RV), respiratory rate (Rr), and inspiratory, expiratory and total respiratory times (Tinsp, Texp, Ttot). For validation, the system was tested on a clinical cohort of N=10 cases, comprising 5 normal (NN) and 5 abnormal (NA) cases. The SA measurements were rigorously compared against ground-truth MM provided by an experienced operator. The comparative analysis revealed substantial agreement between MM vs SA measurements. Specifically, no statistically significant differences were observed between the two measurement groups (p>0.05), which was further corroborated by a strong Spearman correlation coefficient (ρ=0.50–0.85, p > 0.05), a low mean absolute relative error (MARE=0.04-0.32), and a small standard error of the mean (SEM=2.44-7.27). Crucially, unlike previous simulation studies, the present analysis on real clinical data demonstrated that statistically significant differences (p=0.01) existed across all evaluated diaphragmatic parameters, both kinematic and temporal, when comparing the NN and NA groups. Consequently, the proposed system can effectively distinguish neonates with an abnormal respiratory pattern using motion parameters extracted from diaphragmatic videos. These findings indicate that the proposed system could be used in clinical practice as a robust, objective, and scalable auxiliary tool at the bedside for neonatal respiratory assessment and follow-up. However, further application on a larger multicenter population is required for further validation.

17:00
Diaphragmatic Normal and Abnormal Motion Analysis in Simulated Neonates Ultrasound Videos

ABSTRACT. The diaphragm serves as a principal component of neonatal respiratory mechanics, and objective assessment of its functional impairment is crucial for optimal clinical management. However, current evaluation methods largely rely on manual and subjective measurements, limiting reproducibility and scalability in clinical practice. In this study, we developed and validated an integrated semi-automated analysis system capable of generating normal and abnormal simulated videos and quantitatively evaluating neonatal diaphragmatic function. The proposed system supports automated extraction of the quantitative diaphragmatic parameters used for clinical assessment, including diaphragmatic excursion (DE), inspiratory and expiratory timings (Tinsp, Texp), total time (Ttot), relaxation velocity (RV), respiratory rate (Rr), and contraction velocity (CV) during the breathing cycle. Furthermore, it facilitates comparative generation and analysis of normal and abnormal diaphragmatic motions. We generated normal (Nn=10) and abnormal (Na=10) videos of neonates using the Field II software. An experienced doctor calculated manual diaphragmatic measurements for all videos, while automated measurements were performed by the proposed system. Results demonstrated strong concordance between expert manual and automated measurements for both Nn and Na cases and enabled the objective identification of clinically significant differences. Specifically, manual and semi-automated measurements showed no statistically significant differences across all evaluated parameters, with high Spearman correlations, low mean absolute relative error (MARE) and low standard error of the mean (SEM) for all Nn and Na cases. Yet statistically significant differences were found for CV, DE and RV at inspiration and expiration timings between the Nn and Na videos. Therefore, CV, DE and RV may be used to discriminate Nn from Na simulated videos. While the system shows great promise for clinical and educational applications, further validation on larger, real datasets is warranted. The presented approach provides a robust, scalable auxiliary tool for neonatal diaphragmatic assessment.

17:15
Automated Right Ventricular Segmentation from Apical Four-Chamber Echocardiographic Videos

ABSTRACT. Right ventricular (RV) systolic function is a key indicator of cardiovascular status, particularly in critically ill patients. However, quantitative RV assessment in echocardiography remains challenging due to the complex geometry of the right ventricle and variability in image quality. Automated segmentation methods could improve the reproducibility and efficiency of RV evaluation. In this study, we propose a deep learning framework for automated segmentation of the RV cavity at end-diastole (ED) and end-systole (ES) from apical four-chamber echocardiographic videos. The model is trained on the EchoNet-Dynamic dataset, originally developed for left ventricular analysis, and adapted to the RV domain using transfer learning and expert-annotated ED/ES frames. Using a DeepLabV3 architecture with a ResNet-50 backbone, the model achieved a mean Dice score of 0.8467 on the EchoNet-Dynamic dataset and 0.804 on an independent, private ICU clinical dataset, demonstrating consistent performance. Overall, the results indicate that transfer learning from large echocardiographic datasets can support reliable automated RV segmentation and provide a basis for future automated estimation of clinically relevant RV functional parameters.
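
A minimal version of this transfer-learning setup, assuming torchvision's pretrained DeepLabV3-ResNet50 with its head swapped to a single RV-cavity channel and a soft Dice term for fine-tuning, might look like the sketch below. The freezing strategy and loss weighting are illustrative choices, not the paper's recipe.

```python
import torch
import torchvision

# Pretrained DeepLabV3-ResNet50, head swapped to one output channel (RV mask).
model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.classifier[4] = torch.nn.Conv2d(256, 1, kernel_size=1)

def dice_loss(logits, target, eps=1.0):
    """Soft Dice on the sigmoid output; often paired with BCE when
    fine-tuning on sparse expert-annotated ED/ES frames."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1 - ((2 * inter + eps) / (denom + eps)).mean()

# The torchvision model returns a dict; the segmentation map is under "out":
# logits = model(frames)["out"]            # frames: (B, 3, H, W)
# loss = dice_loss(logits, masks)          # masks:  (B, 1, H, W)
```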

16:00-17:30 Session 18C: Clinical Data Intelligence
Location: Atrium B
16:00
A Data-Driven Approach to Support Clinical Renal Replacement Therapy

ABSTRACT. Objective: The study aimed to develop and evaluate the viability of a data-driven approach to predict membrane fouling in critically ill patients undergoing CRRT using machine learning algorithms. Moreover, Counterfactual Analysis is used to detect counterfactuals, i.e. the minimal modifications to the input of the machine learning model required to revert a membrane fouling prediction.

Design: The study utilizes time series from the Careggi University Hospital ICU. A subset of 16 specific features was recognized, following the recommendations of clinicians, as the most relevant indicators to train machine learning models for predicting membrane fouling dynamics. To keep the approach simple, interpretable, and amenable to detecting reliable counterfactuals, the study mainly focuses on a tabular data approach, not involving the time series’ interdependence inherent within each treatment. Since the number of membrane fouling cases is considerably smaller than the overall number of treatments, the ADASYN oversampling method was utilized as a preprocessing step for a more equitable representation of the minority classes. A Shapley-values-based Counterfactual Analysis is applied to the best prediction model, in order to detect counterfactuals whose quality is measured through a proper score.

Results: The specific methods adopted include Random Forest, XGBoost and LightGBM. For all these methods a rebalancing rate of 10% with respect to the majority class showed the most balanced performance, with a sensitivity of 0.776 and a specificity of 0.963. The performance obtained by all methods proved robust with respect to forecasting horizons of different lengths. The tabular data approach revealed not to be a limitation, as it outperformed Long Short-Term Memory recurrent neural networks, which intrinsically take temporal relationships into account. It is shown that by reducing the features to 5 via a feature selection method, we obtain simpler and more interpretable models without compromising accuracy too much. The adopted Counterfactual Analysis method is able to detect counterfactuals which seem promising according to the considered quality score.

Conclusions: The experimental study provides promising results concerning the adoption of data-driven machine learning methods to predict membrane fouling events during CRRT. The practical implications for clinicians and nurses managing CRRT are significant; additionally, the interpretability afforded by using a reduced set of features enhances the understanding of how the models arrive at their conclusions without sacrificing too much predictive power. The predictions of the model and the associated Counterfactual Analysis can inform therapeutic adjustments, leading to more effective patient care while minimizing the risk of membrane fouling.
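
The rebalancing-plus-ensemble step maps directly onto imbalanced-learn and scikit-learn. A sketch follows, with the 10% rebalancing rate taken from the abstract and the remaining hyperparameters as placeholders.

```python
from imblearn.over_sampling import ADASYN
from sklearn.ensemble import RandomForestClassifier

# Oversample fouling cases to 10% of the majority class (the rebalancing
# rate reported above), then fit a Random Forest on the tabular features.
# n_estimators and other settings are illustrative placeholders.
ada = ADASYN(sampling_strategy=0.10, random_state=42)
# X_res, y_res = ada.fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=500, random_state=42)
# clf.fit(X_res, y_res)
# fouling_risk = clf.predict_proba(X_test)[:, 1]
```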

16:15
A Temporally Aligned MIMIC-IV ED Pipeline for Multi-Outcome Prediction

ABSTRACT. Clinical deterioration among emergency department (ED) patients continues to be a leading cause of preventable morbidity and mortality. Although machine learning models for deterioration prediction are becoming more prevalent, inconsistent dataset construction and insufficient temporal alignment limit reproducibility and introduce temporal leakage. We present a modular and extendable end-to-end pipeline for converting raw MIMIC-IV data into analysis-ready datasets with explicit prediction times, harmonized multi-source event logs, and configurable outcome labels for eight clinical event types. The pipeline creates a cohort of 424,385 adult ED visits (202,080 patients) and generates feature sets for 1-hour (18 features), 6-hour (56 features), and 24-hour (61 features) observation windows, with the option to incorporate MIMIC-IV-ECG electrocardiogram measurements. Validation using logistic regression and gradient-boosted machines with 5-fold stratified cross-validation for benchmark comparison with an established MIMIC-IV-ED cohort shows high concordance (κ = 0.977) and performance parity. All code and SQL transformations are publicly available, allowing reproducible investigation of ED deterioration.

16:25
Temporal Drift in Action: Evaluating Strategies to Detect and Repair Drift using Real-World Data on Stroke and Myocardial Infarctions

ABSTRACT. Clinical prediction models are increasingly embedded within healthcare systems, yet their performance can deteriorate over time due to temporal drift. Such drift may arise from changes in population characteristics, data recording practices, or underlying clinical relationships, and can lead to miscalibrated risk estimates that compromise patient safety.

In this study we compare multiple strategies to detect (and potentially repair) temporal drift, including the monitoring of performance metrics, analysis of model residuals using statistical process control and Kullback–Leibler divergence, and assessment of input data stability through discrimination error. We apply these approaches to a real-world case study using data from Connected Bradford, evaluating a logistic regression model aligned with QRISK-2 for predicting 10-year risk of heart attack or stroke. Model behaviour was assessed monthly over a seven-year period (2008 to 2015), with recalibration triggered whenever predefined thresholds were exceeded.

Our findings show clear evidence of temporal drift, with degradation in calibration and increasing divergence in residual distributions over time. Approaches based on maintaining performance thresholds produced the most accurate and stable predictions, although methods using model residuals offered similar performance. Regular recalibration at fixed intervals demonstrated reasonable accuracy while offering operational advantages due to predictable resource requirements. Methods independent of model residuals, such as discrimination error, detected drift without requiring long term outcome data and may therefore be more viable in contexts with substantial delays before outcomes can be observed.

Overall, the results highlight the importance of systematic drift monitoring for clinical prediction models intended for deployment at scale. The software library released as part of this research provides practical tools for detecting and mitigating drift, with clear trade-offs between statistical performance, regulatory considerations, and real-world feasibility.
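
One of the monitoring signals described, Kullback–Leibler divergence between residual distributions, can be sketched with a simple histogram estimator. The bin count, smoothing, and trigger threshold below are illustrative choices, not those of the released library.

```python
import numpy as np

def kl_divergence(reference, current, bins=20):
    """Histogram-based KL divergence between reference-period residuals and
    the current monitoring window; one of the drift signals described above.
    Binning, smoothing and the trigger threshold are illustrative."""
    edges = np.histogram_bin_edges(np.concatenate([reference, current]), bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    p = (p + 1e-9) / (p + 1e-9).sum()   # Laplace-style smoothing of bins
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log(p / q)))

# Example trigger: recalibrate when divergence from the training-period
# residual distribution exceeds a predefined threshold (value hypothetical).
# if kl_divergence(train_residuals, month_residuals) > 0.1:
#     recalibrate_model()
```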

16:35
Transformer-Based Cardiovascular Event Prediction Using National Claims Data

ABSTRACT. Cardiovascular disease (CVD) remains a global health priority. While administrative claims data offer a longitudinal view of patient history, their high dimensionality and extreme class imbalance make traditional risk prediction difficult. This study evaluates transformer-based architectures (BERT, BioBERT, and ClinicalBERT) for one-year CVD event prediction using a nationwide health registry, the French National Health Data System (SNDS). Using a cohort of 10.7M individuals, we compared the performance of these models against a Random Forest (RF) baseline. While all models showed high accuracy (> 98%) due to low event prevalence (1.2%), domain-adapted transformers (BioBERT and ClinicalBERT) significantly outperformed RF in clinical utility, achieving an F1-score of 16.8% and a 15-fold increase in recall. These results show that although transformer models capture some longitudinal information, overall predictive performance is modest, likely due to the loss of information when structured claims data are converted into text format for transformer models.
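
The abstract attributes the modest overall performance to the conversion of structured claims into text. A hedged sketch of what such a serialization step might look like follows, with hypothetical visit records and diagnosis codes; the authors' actual preprocessing is not described here.

    from transformers import AutoTokenizer

    def claims_to_text(visits):
        """Flatten a patient's longitudinal claims into one token sequence,
        one 'sentence' of diagnosis codes per visit, in chronological order.
        Dates, costs, and other structured fields are dropped, which is the
        kind of information loss the abstract points to."""
        return " [SEP] ".join(" ".join(v["codes"]) for v in visits)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    history = [{"codes": ["I10", "E11.9"]}, {"codes": ["I25.10"]}]  # toy visits
    inputs = tokenizer(claims_to_text(history), truncation=True,
                       max_length=512, return_tensors="pt")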

16:45
Predicting Emergency Admissions Following Chemotherapy: A Workload Planning Approach

ABSTRACT. Patients undergoing cancer chemotherapy are often at high risk of unplanned hospital admissions because of their disease or treatment. These admissions are often to a specialist unit with limited capacity, so knowing when peaks and troughs in demand are likely to occur is valuable for planning resource allocation. In this study we have sought to produce a machine-learning-based model that assesses the risk of subsequent unplanned admission for patients starting chemotherapy, with a view to forecasting likely unplanned admission numbers in subsequent weeks. Our models perform as well as or better than previously reported ones, with an AUROC of 0.82 for the best model, but when the individual risks are aggregated into expected weekly admission counts, a substantial amount of variability in observed admission numbers remains unexplained by our predictions.
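
The aggregation step from individual risk scores to a weekly workload forecast can be sketched as the expectation (and variance) of a Poisson-binomial count; the column names below are assumptions, not the authors' implementation.

    import pandas as pd

    def weekly_admission_forecast(risks: pd.DataFrame) -> pd.DataFrame:
        """risks: one row per patient at risk, with columns 'week' and
        'p_admit' (predicted probability of unplanned admission that week).
        The expected count per week is the sum of the probabilities; the
        Poisson-binomial variance, sum of p*(1-p), gives a rough band
        against which the abstract's residual variability can be judged."""
        g = risks.groupby("week")["p_admit"]
        return pd.DataFrame({
            "expected": g.sum(),
            "variance": g.apply(lambda p: (p * (1 - p)).sum()),
        })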

16:55
Real-World Validation of a Predictive Model for Length of Stay in Geriatric Settings

ABSTRACT. Early identification of patients at risk of prolonged hospital length of stay (LOS) is crucial for optimizing resource allocation and improving care in geriatric settings. We previously developed an ensemble machine learning model to predict prolonged LOS using routinely collected clinical and care-intensity variables. In the present study, we performed a real-world validation of the model within the same institution. Validation was conducted on two macro-groups: (1) a temporally subsequent cohort meeting the original inclusion criteria and (2) a clinically distinct subgroup. Each macro-group was further stratified into three temporal windows (July–December 2023, full year 2024, January–June 2025). Model performance was evaluated across predefined clinical clusters using Accuracy, Positive Predictive Value (PPV), and Negative Predictive Value (NPV). Weighted Generalized Linear Regression models were applied to assess group-by-day interactions and temporal stability. Overall, Accuracy and NPV remained stable across most clusters and validation groups, with no significant interaction between day and group in the majority of analyses. In contrast, PPV demonstrated greater inter-group variability, with significant day-by-group interactions across clusters. Comparative analyses confirmed that differences in PPV were primarily attributable to variations in outcome prevalence and case-mix rather than systematic degradation of model performance. The model maintained robust predictive performance over time and across clinically distinct cohorts within the same institutional setting. While PPV was sensitive to contextual factors, overall accuracy and negative predictive capacity remained stable, supporting the model’s potential utility as a real-time decision support tool for identifying patients at risk of prolonged LOS in geriatric care.
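
The group-by-day interaction analysis could be carried out along the following lines with statsmodels; the synthetic table, variable names, and weighting scheme are placeholders for illustration only.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Synthetic stand-in for the validation table: one row per evaluated
    # prediction, with a binary correctness outcome, a day index, a cohort
    # label, and a weight (e.g. cluster size).
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "correct": rng.integers(0, 2, 400),
        "day": np.tile(np.arange(100), 4),
        "group": np.repeat(["temporal", "distinct"], 200),
        "weight": rng.integers(1, 10, 400),
    })

    model = smf.glm("correct ~ day * C(group)", data=df,
                    family=sm.families.Binomial(),
                    freq_weights=np.asarray(df["weight"]))
    result = model.fit()
    # A significant day:C(group) coefficient would indicate that performance
    # evolves differently over time across cohorts.
    print(result.summary())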

17:05
Clustering Type 1 Diabetes Patients based on Short-term Glycemic Dynamics

ABSTRACT. The analysis of Continuous Glucose Monitoring (CGM) data using pattern-based methods has emerged as a promising approach for characterizing short-term glycemic dynamics in Type 1 Diabetes. However, in large real-world longitudinal cohorts, highly unequal data contribution across patients introduces significant bias during unsupervised temporal pattern extraction. To address this, we introduce a robust patient-level subsampling strategy to balance data contribution while preserving patient variability. We apply this methodology to a large cohort of 643 individuals, encompassing over five million two-hour glucose windows. Dynamic Time Warping clustering on the balanced dataset yielded six reproducible temporal glucose patterns, while hierarchical clustering of patient profiles revealed four distinct structural variability phenotypes. These results indicate that mitigating contribution bias preserves the core morphology of temporal patterns while revealing variations in patient profile distributions, providing a scalable and reliable framework for large-scale CGM data analysis.
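
A minimal sketch of the two steps named in the abstract, capping each patient's contribution before pooling and then DTW clustering with tslearn, under illustrative settings: the cap, window length, and toy cohort are assumptions, though six clusters matches the reported result.

    import numpy as np
    from tslearn.clustering import TimeSeriesKMeans

    def balanced_pool(windows_by_patient, per_patient=100, seed=0):
        """Cap every patient's contribution at `per_patient` randomly drawn
        windows so heavy contributors cannot dominate the clustering."""
        rng = np.random.default_rng(seed)
        pooled = []
        for windows in windows_by_patient.values():
            idx = rng.choice(len(windows),
                             size=min(per_patient, len(windows)),
                             replace=False)
            pooled.extend(windows[idx])
        return np.asarray(pooled)

    # Toy stand-in: five patients with unequal numbers of 24-point
    # (two-hour, five-minute-sampled) glucose windows.
    rng = np.random.default_rng(1)
    windows_by_patient = {pid: rng.normal(7, 2, size=(n, 24))
                          for pid, n in enumerate([500, 50, 300, 80, 120])}

    X = balanced_pool(windows_by_patient)
    km = TimeSeriesKMeans(n_clusters=6, metric="dtw", random_state=0)
    labels = km.fit_predict(X)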

17:15
Exploratory GRU-Based Temporal Modeling of Longitudinal Multimodal Signals for Early Dropout in Psychotherapy

ABSTRACT. High dropout rates in psychotherapy, particularly in depression, remain a major clinical challenge and are further exacerbated in vulnerable populations such as hikikomori youth. Machine learning has emerged as a promising strategy to support treatment monitoring; however, applications in Internet-delivered Cognitive Behavioural Therapy (ICBT), especially for early dropout warning, remain limited.

In this context, we propose a longitudinal, speech-driven early-warning framework to monitor dropout risk among hikikomori patients undergoing ICBT in a real-world clinical setting. Instead of treating dropout as a static classification endpoint, disengagement is modeled as a discrete-time forecasting problem, where binary supervision is used only to estimate longitudinal risk trajectories across sessions. Our framework integrates multimodal information—self-supervised speech representations, interpretable acoustic descriptors, and behavioral engagement indicators—within a GRU-based temporal architecture.

Results indicate that multimodal modeling enhances sensitivity to early disengagement signals and enables clinically meaningful early-warning indicators, achieving a positive-class recall of 0.73 with specificity of 0.50, and detecting risk up to 2 sessions before dropout. These findings highlight the potential of longitudinal speech-based monitoring to support proactive clinical intervention and improve retention in digital psychotherapy.
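
A hedged PyTorch sketch of the discrete-time formulation described above: a GRU consumes per-session multimodal features and a shared head emits a risk estimate after every session, so risk trajectories can be read out before treatment ends. The feature dimensions and the fusion-by-concatenation choice are assumptions, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class SessionRiskGRU(nn.Module):
        """Per-session dropout-risk trajectory: the three modalities are
        concatenated per session, passed through a GRU, and a shared linear
        head maps every hidden state to a risk score, so risk can be read
        out after each session rather than only at the end of treatment."""
        def __init__(self, speech_dim=768, acoustic_dim=32, behav_dim=8,
                     hidden=64):
            super().__init__()
            self.gru = nn.GRU(speech_dim + acoustic_dim + behav_dim,
                              hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, speech, acoustic, behaviour):
            x = torch.cat([speech, acoustic, behaviour], dim=-1)
            h, _ = self.gru(x)                  # (batch, sessions, hidden)
            return torch.sigmoid(self.head(h))  # risk after every session

    model = SessionRiskGRU()
    # Toy batch: 4 patients, 6 sessions each.
    risk = model(torch.randn(4, 6, 768), torch.randn(4, 6, 32),
                 torch.randn(4, 6, 8))
    print(risk.shape)  # torch.Size([4, 6, 1])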