IEEE CBMS 2026: THE 39TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS
PROGRAM FOR WEDNESDAY, JUNE 3RD

09:00-10:30 Session 1A: Large Language Models and Conversational AI for Clinical Reasoning
Location: Panorama
09:00
Architecture, Evaluation Metrics, and Technical Feasibility of LLMs in Mental State Assessment: A Case Study in PTSD

ABSTRACT. Mental state evaluations for conditions such as anxiety, depression, and Post-Traumatic Stress Disorder (PTSD) are essential for effective treatment, yet traditional diagnostic methods often suffer from subjectivity and clinical bias. We propose 'MentState,' an architectural framework leveraging large language models (LLMs) to assess mental health severity from conversational data. This technology features a configurable prompt engine that supports single or conversational text while integrating specific clinical domain knowledge. MentState extracts qualitative mental biomarkers aligned with traditional diagnostic symptoms and enables model refinement through specialized mental health datasets. To evaluate the system, we defined performance metrics including compatibility with existing clinical scores, model self-confidence, and statistical reliability across multiple activations. We demonstrated feasibility using a cohort of 20 subjects exposed to traumatic events, analyzing conversational transcripts derived from real-life online interviews. Our evaluations were benchmarked against the IES-R self-report questionnaire. Results showed over 80% compatibility with the IES-R benchmark across various configurations. Furthermore, the model exhibited high reliability with a coefficient of variation in the range of 3%. This study pioneers a data-driven approach to enhance diagnostic precision and address limitations in clinical availability. By providing objective clinical decision support, MentState underscores the transformative potential of AI to improve patient outcomes and alleviate healthcare professional burnout.
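The statistical-reliability criterion mentioned in this abstract, a coefficient of variation across repeated model activations, is straightforward to compute; a minimal stdlib sketch with hypothetical scores (not the paper's data):

```python
import statistics

def coefficient_of_variation(scores):
    """CV = sample standard deviation / mean, reported as a percentage."""
    return statistics.stdev(scores) / statistics.fmean(scores) * 100.0

# Hypothetical severity scores from five repeated activations
# on the same transcript (illustrative values only).
runs = [21.0, 21.5, 20.8, 21.2, 21.0]
cv = coefficient_of_variation(runs)  # ~1.25
```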

09:15
Verbalized Uncertainty in Medical AI: Differential Diagnosis in Commercial LLMs

ABSTRACT. Large Language Models (LLMs) have revolutionized large-scale data processing in healthcare settings, including more efficient and readily available diagnostic models. Differential diagnoses are generated freely and introduced into the clinic by concerned patients. However, many biases are present, and little is known about the relationship between model correctness and the prediction's associated confidence. The current study analyzed three differently purposed LLMs in light of this relationship and visualized the calibration of medical LLMs. Sex-, age-, and pathology-stratified analyses were also performed separately to evaluate possible biases. Our results indicate that calibration moves from overconfidence to underconfidence when medical LLMs are prompted for a top-5 list of likely diagnoses instead of a single prediction. Moreover, we found no biases for sex or age groups, while a bias might exist for specific pathologies. We show that robust evaluation is key for trust in these medical LLMs and that more information is required before clinical adoption.

09:30
AyurAssist: Bridging Ayurvedic and Biomedical Clinical Knowledge Through Terminology-Grounded LLM Reasoning

ABSTRACT. Ayurveda is one of the world's oldest systematized medical traditions, yet standard biomedical ontologies such as SNOMED CT and ICD lack Ayurvedic diagnostic constructs, creating barriers to interoperability and clinical decision support. We present AyurAssist, a clinical decision support system that bridges this terminological gap through a vocabulary-grounded pipeline: biomedical named entity recognition (scispaCy) extracts clinical entities from free-text patient narratives, UMLS normalizes them to SNOMED CT and ICD codes, and fuzzy matching against the full 3,550-term WHO International Terminology for Ayurveda (ITA) constructs a structured Ayurvedic context for a large language model (Qwen3-32B), which performs three-pass clinical reasoning. We validate the system through three complementary experiments. First, benchmarking on BhashaBench-Ayur (14,963 questions) establishes Qwen3-32B (54.2%) as the top-performing model, outperforming the domain-specific AyurParam-2.9B (40.0%). Second, an ablation study over 80 clinician-annotated vignettes demonstrates that the terminology bridge yields a statistically reliable improvement in treatment quality (term-level F1: 0.219 vs. 0.156; near-disjoint 95% bootstrap CIs) and a directionally consistent gain in diagnostic accuracy (80.0% vs. 75.0%), while the bridge alone achieves only 5.0%, confirming that the LLM performs clinical reasoning while the bridge provides essential vocabulary grounding. Third, inter-rater reliability across four clinicians (two Ayurvedic, two allopathic) establishes ground-truth validity with substantial agreement for modern diagnosis (PABAK = 0.66) and moderate agreement for Ayurvedic diagnosis (PABAK = 0.56). The key insight is that domain-specific vocabulary grounding of a capable general-purpose LLM, rather than domain-specific model training, offers a practical and scalable path toward interoperable integrative medicine informatics.
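As an illustration of the fuzzy-matching step described above, Python's standard library can ground noisy extracted terms against a controlled vocabulary; the toy term list below is illustrative only, whereas AyurAssist matches against the full 3,550-term WHO ITA:

```python
from difflib import get_close_matches

# Toy stand-in for a terminology list; the actual system matches against
# the 3,550-term WHO International Terminology for Ayurveda.
ayurveda_terms = ["amlapitta", "prameha", "jvara", "kasa", "shwasa"]

def ground_term(extracted, vocabulary, cutoff=0.7):
    """Return the closest vocabulary entry, or None if nothing is close enough."""
    matches = get_close_matches(extracted.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(ground_term("Amlapita", ayurveda_terms))    # amlapitta
print(ground_term("penicillin", ayurveda_terms))  # None
```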

09:45
Multimodal Dual-Encoder Retrieval for Automated ICD Coding

ABSTRACT. Accurate International Classification of Diseases (ICD) coding is crucial for large-scale clinical research, documentation, and billing. Current ICD prediction methods face three primary problems: (1) they cannot exploit multimodal patient data, because they rely on either structured EHR data or unstructured clinical notes; (2) they struggle to scale to the large number of ICD codes (9K+ in ICD-9), as traditional classifiers need dense output layers and often do not generalize well to long-tail rare diseases; and (3) they lack transparency for clinical use. To address these challenges, this research proposes a two-stage framework that first retrieves ICD codes using a multimodal dual-encoder retrieval model, where structured and unstructured patient data are integrated through gated fusion. The second stage refines the top-k retrieved candidates with an LLM-based re-ranker that provides ranked codes with clinically relevant explanations. Our experiments show that the proposed approach improves Micro-F1 and Precision over a multimodal dual-fusion classifier baseline. These improvements demonstrate that combining a gated multimodal retrieval system with LLM-based re-ranking is a practical alternative to dense multi-label classification for automated ICD coding.
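The gated fusion mentioned above can be sketched numerically. This is one common element-wise formulation (a sigmoid gate blending two modality embeddings), not necessarily the paper's exact design; the vectors and gate weights are toy values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(struct_vec, text_vec, gate_weights):
    """Element-wise gate: fused_i = g_i * s_i + (1 - g_i) * t_i,
    with each g_i in (0, 1) derived from both modalities."""
    fused = []
    for s, t, w in zip(struct_vec, text_vec, gate_weights):
        g = sigmoid(w * (s + t))  # toy gate conditioned on both inputs
        fused.append(g * s + (1.0 - g) * t)
    return fused

structured = [0.2, -1.0, 0.5, 0.0]   # toy structured-EHR embedding
text       = [1.0,  0.3, -0.2, 0.4]  # toy clinical-note embedding
fused = gated_fusion(structured, text, gate_weights=[1.0, 1.0, 1.0, 1.0])
# Each fused value lies between the two modality values for that dimension.
```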

10:00
A Conversational Agent for Natural Language Access to Public Health Data

ABSTRACT. DATASUS, Brazil's national public health data repository, provides access to large volumes of epidemiological data. Among its systems, the Hospital Information System in Reduced Data format (SIH-RD) records millions of hospitalization procedures. Despite being one of the world's largest epidemiological repositories, SIH-RD microdata remains analytically inaccessible to non-technical practitioners: opaque clinical encodings, ambiguous join paths, and coded value mappings confound general-purpose language models, and no Portuguese natural-language interface for DATASUS currently exists. To the best of our knowledge, we present the first Text-to-SQL system for DATASUS SIH-RD, enabling queries over 18.7 million hospitalization records. To address this, we derive fifteen domain-specific SQL generation rules from systematic SIH-RD schema analysis and embed them in a 9-stage LangGraph pipeline with query routing, automatic table selection, chain-of-thought planning, SQL generation, static validation, and bounded self-repair, requiring no model fine-tuning. We also construct a benchmark of 120 Portuguese healthcare queries stratified into Easy, Medium, and Hard tiers (40 each) with gold-standard SQL over records from two Brazilian states (2008-2023), comprising the first formal Text-to-SQL evaluation on SIH-RD. The agent achieves 93.3% execution accuracy (112/120) with 100% pipeline completion; a controlled single-shot baseline sharing identical model, domain rules, and prompts achieves 90.0% (108/120), with the advantage concentrated in Hard queries (+10.0 pp), isolating the contribution of graph orchestration for complex multi-table reasoning.

10:15
Enhancing Medical Question Answering in Open LLMs via Inference-Time Ensembles

ABSTRACT. Large Language Models (LLMs) have demonstrated strong performance on medical examination benchmarks, particularly when augmented with structured prompting and ensemble-based decoding strategies. Methods such as Chain-of-Thought reasoning, dynamic example retrieval, and self-consistency suggest that meaningful gains may be achievable without parameter updates. However, the extent to which these inference-time strategies enhance clinical reasoning in medium-scale open-weight models remains insufficiently investigated. To examine this question, this work evaluates a MedPrompt pipeline using the Qwen3-8B model on the MedMCQA dataset and compares it against a strict deterministic zero-shot configuration. The approach combines correctness-filtered Chain-of-Thought, k-Nearest Neighbors retrieval of semantically similar examples, and a meta-ensemble that aggregates predictions across temperature scaling and choice shuffling. The resulting system improves accuracy from 58.6% to 65.6%, yielding a 7-percentage-point absolute gain without fine-tuning. These gains arise primarily from three mechanisms within MedPrompt: filtering Chain-of-Thought demonstrations to include only correct reasoning traces, dynamically retrieving semantically similar examples to guide the model, and aggregating predictions across multiple decoding configurations. This combination enhances reasoning guidance, contextual relevance, and prediction stability without modifying model parameters. These findings support structured inference-time prompting as a reproducible mechanism for improving medical multiple-choice reasoning in open-weight LLMs.
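The final aggregation step of such a meta-ensemble reduces, in its simplest form, to a majority vote across decoding runs; a minimal sketch with hypothetical answers:

```python
from collections import Counter

def ensemble_vote(predictions):
    """Majority vote over answers from multiple decoding runs
    (e.g. different temperatures and answer-choice orderings)."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical answers from five decoding configurations for one question.
print(ensemble_vote(["B", "B", "C", "B", "A"]))  # B
```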

09:00-10:30 Session 1B: Deep Learning Methods for Medical Imaging
Location: Atrium A
09:00
MicroClinic: An Ultra-Low-Parameter Neural Network for Medical Image Analysis

ABSTRACT. This paper introduces MicroClinic, an ultra-compact convolutional neural network designed for medical image analysis under extreme resource constraints. While state-of-the-art architectures typically rely on millions of parameters, MicroClinic operates in a regime of 0.4k to 1.1k trainable parameters, following a design philosophy where relational processing substitutes for parameter redundancy. The architecture integrates lightweight convolutional blocks with a Convolutional Multi-Head Attention (CMHA) mechanism to capture global spatial dependencies without increasing network depth. Benchmarked across twelve independent clinical datasets, MicroClinic achieves competitive performance, reaching 99.9% accuracy on MedicalMNIST and 95.7% on COVID-19 X-Ray classification, effectively matching models up to 29,000 times larger in parameter count. Beyond efficiency, the reduced scale limits the network's memorization capacity (its ability to attain zero training error), potentially improving data privacy, while also enabling structural analysis of the learned representations through metrics such as the Fisher discriminant ratio and mutual information. These results demonstrate that diagnostically meaningful accuracy can be achieved within a minimal parameter budget, enabling AI-assisted screening in underserved and latency-sensitive clinical environments or on hardware with limited computational resources.

09:15
Which Factors Influence Success of Unconstrained Interpolation for Augmentation in Multiple Instance Learning?

ABSTRACT. For classifying digital whole slide images in the absence of pixel-level annotations, multiple instance learning methods are applied. Since the number of samples is often low in this setting, data augmentation is important. Here, we investigate unconstrained (multi)linear interpolation between feature vectors, a data augmentation technique that has proven capable of improving the generalization performance of multiple instance learning models. Recent work has shown both high performance and high variability, but it remains unclear which factors influence this performance. To gain insights, we conducted a large study incorporating 9 different dataset configurations, two feature extraction approaches, stain normalization, and two multiple instance learning architectures. We identified consistent behavior when varying the feature extraction method and the classification model. However, we observed a strong dependence on the underlying image data.
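Unconstrained linear interpolation between feature vectors can be sketched as follows; the lambda range is illustrative, and because lambda may fall outside [0, 1], augmented samples can extrapolate beyond the segment between the two inputs:

```python
import random

def interpolate_features(x1, x2, lam=None, lam_range=(-0.3, 1.3)):
    """Unconstrained linear interpolation of two feature vectors: lambda
    is drawn from a range wider than [0, 1] (illustrative bounds), so
    augmented samples may extrapolate beyond the segment between x1, x2."""
    if lam is None:
        lam = random.uniform(*lam_range)
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]

print(interpolate_features([0.0, 0.0], [1.0, 2.0], lam=0.5))  # [0.5, 1.0]
print(interpolate_features([0.0, 0.0], [1.0, 2.0], lam=1.5))  # [-0.5, -1.0]
```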

09:30
Multimodal Deep Learning for Tumor Site Classification: Integrating Histopathology and Gene Mutation Status

ABSTRACT. Accurate identification of a tumor site of origin is particularly consequential for cancers of unknown primary (CUP), since therapy selection depends on the inferred origin. Whole-slide histopathology images (WSIs) provide rich morphological cues but are known to suffer from acquisition-driven domain shift. Genomic alteration profiles provide complementary molecular evidence about tumor lineage and biology, though they can also vary with sample processing, coverage, and variant-calling choices. Since the two analyses reflect different aspects of tumor biology and are subject to distinct sources of variation, combining them can reduce reliance on any single, potentially biased signal. In this work, we present a multimodal primary site classifier that integrates WSI representations with a compact mutation-status profile derived from a 92-gene panel. Starting from the TOAD framework for tumor origin assessment, we modified the model to (i) focus exclusively on tumor-site classification, removing the task of discriminating metastatic and primary tumor, and (ii) incorporate a binary vector describing the mutation status of 92 genes. Experiments on matched histopathology–genomics cases from TCGA demonstrate a strong interaction between modality utility and resolution domain shift: for a test set composed of digital slides with out-of-distribution microns-per-pixel (mpp) with respect to the training set, the genomics-only model is substantially more robust than the WSI-only model (top-1 accuracy 0.51 vs. 0.27), whereas in-distribution mpp favors histopathology, and multimodal fusion yields the best performance (0.90 top-1 accuracy).

09:45
Automated Ki-67 Proliferation Index Estimation for Deep Learning Applications in Histopathology

ABSTRACT. Accurate assessment of the Ki-67 proliferation index is essential in histopathology, yet manual counting of positively stained nuclei in immunohistochemistry slides is time-consuming and subject to inter-observer variability. This paper presents an automated method for Ki-67 index estimation based on morphological image processing and evolutionary optimization. The approach integrates color-based preprocessing, morphological filtering, and distance transform–based cell separation to segment and quantify stained nuclei. A genetic algorithm is used to optimize key parameters to improve segmentation robustness across heterogeneous tissue samples. Experimental results indicate that parameter optimization enhances consistency compared to non-optimized configurations. The proposed method provides an automated alternative to manual assessment and supports label generation for deep learning applications in digital pathology.
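For reference, the Ki-67 proliferation index itself is a simple ratio once positive and negative nuclei have been counted; the counts below are toy values, not data from the paper:

```python
def ki67_index(positive_nuclei, negative_nuclei):
    """Ki-67 proliferation index: percentage of counted tumor nuclei
    that stain positive."""
    total = positive_nuclei + negative_nuclei
    return 100.0 * positive_nuclei / total

# Toy counts for one region of interest (illustrative only).
print(ki67_index(142, 358))  # 28.4
```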

10:00
Deep Learning Architectures for Automated Classification of Fetal Liver Echotexture in Gestational Diabetes

ABSTRACT. Gestational diabetes mellitus (GDM) promotes fetal hyperinsulinemia, leading to fat accumulation in the fetal liver detectable via routine B-mode ultrasound. No published work has applied deep learning to classify fetal liver echotexture for automated GDM-related metabolic assessment. We present a comparative study of six convolutional neural network architectures (ResNet-18, ResNet-34, ResNet-50, EfficientNet-B0, EfficientNet-B4, and EfficientNet-B7) for binary classification of fetal abdominal circumference above the 75th percentile (CA > p75) from liver-only ultrasound images. A patient-stratified cohort of 232 patients (110 GDM, 122 controls) provided 1,733 matched liver-only images (from a full dataset of 2,047), with class imbalance addressed via minority oversampling and weighted focal loss. Models were selected by F-beta score on validation and evaluated on a held-out test set using AUC, sensitivity, specificity, and F1 with bootstrap 95% confidence intervals. EfficientNet-B0 achieved the highest AUC of 0.618 (95% CI: 0.537-0.694). All models achieved very high true sensitivity for the elevated-CA class (0.93-1.00) but very low true specificity (0.00-0.13), indicating over-prediction of the positive class driven by the combined oversampling and weighted focal loss strategy. These results establish the first baseline for automated deep-learning screening of fetal hepatic echotexture in gestational diabetes and motivate larger multi-centre validation.
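The bootstrap confidence intervals reported above can be reproduced for any metric with a percentile bootstrap; a generic stdlib sketch using toy labels and accuracy as the metric:

```python
import random

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

def accuracy(t, p):
    return sum(a == b for a, b in zip(t, p)) / len(t)

# Toy labels: the point estimate is 0.8, and (lo, hi) brackets it.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]
lo, hi = bootstrap_ci(y_true, y_pred, accuracy)
```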

10:15
A Novel Multi-Omics Driven Deep Learning Approach for Accurate Lung Cancer Detection

ABSTRACT. Lung adenocarcinoma (LUAD) remains a formidable challenge in clinical oncology, characterized by a complex molecular landscape that often renders traditional staging methods insufficient. This research introduces a sophisticated deep learning architecture engineered to integrate RNA sequencing (RNA-Seq), microRNA (miRNA) expression, and DNA methylation profiles into a single predictive framework. Our methodology emphasizes a stratified preprocessing pipeline, where each omics modality undergoes independent differential analysis before being synchronized through sample-wise alignment. The study utilizes data from The Cancer Genome Atlas (TCGA) repository to bridge the gap in computational synthesis of diverse molecular layers into a unified diagnostic signal.

09:00-10:30 Session 1C: ECG and Physiological Signal Processing
Location: Atrium B
09:00
Quality Over Quantity: The Impact of Diagnostic Certainty of Data in Deep Learning for ECG Analysis

ABSTRACT. Deep learning models for ECG-based myocardial infarction detection often prioritize discriminative accuracy over probability calibration, overlooking how the diagnostic certainty of training data impacts clinical reliability. To address this, we trained a CNN-LSTM and a Bidirectional-Mamba-2 architecture on the PTB-XL dataset using three distinct cohorts: exclusively certain cases, all inclusive cases, and a mixed cohort matched for size. When evaluated against a standardized test set of exclusively certain cases, models trained on high-certainty labels consistently yielded the most accurate calibration, achieving Expected Calibration Errors of 0.0184 and 0.0339 respectively. Conversely, substituting certain data with uncertain labels in the size-matched mixed cohort significantly increased calibration error and false positive rates. Furthermore, Subclass-aware Integrated Gradients analysis confirmed that models trained on certain data learned physiologically congruent lead importance patterns, demonstrating that label quality, rather than sheer data quantity, fundamentally drives both model trustworthiness and clinical interpretability in medical AI.
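The Expected Calibration Error used above has a standard binned formulation; a minimal sketch (equal-width confidence bins, gap weighted by bin size):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the bin-size-weighted mean
    absolute gap between average confidence and empirical accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    ece, n = 0.0, len(confidences)
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Fully overconfident toy model: always 100% confident, right half the time.
print(expected_calibration_error([1.0] * 4, [1, 1, 0, 0]))  # 0.5
```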

09:15
Safety-Oriented Interpretable ECG Denoising with Regression Tsetlin Machines

ABSTRACT. This paper proposes a safety-oriented and interpretable hybrid framework for ECG denoising that decouples artifact intensity estimation from waveform restoration. Three Regression Tsetlin Machines, trained with frequency-isolated features and hard negative mining, estimate normalized intensities of baseline wander, muscle artifact, and power line interference. Calibrated estimates modulate deterministic wavelet-based attenuation and adaptive notch filtering, enabling adaptive yet bounded suppression without direct waveform reconstruction. Evaluated on 1000 windows from 10 unseen patients under realistic mixed-noise conditions, the framework achieves a mean SNR gain of +8.50 dB (95% CI: [+7.90, +9.10]) and satisfies IEC 60601-2-25 amplitude tolerances in four of five noise categories. Integer-only inference requires 56 ms per 4-second window (71× real-time), while clause-level transparency supports feature auditing and physiological validation, enabling predictable behavior and suitability for safety-critical and embedded deployment.
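A fixed-depth IIR notch of the kind such pipelines modulate can be written in a few lines; this is a generic second-order notch using the RBJ audio-EQ-cookbook coefficients, not the paper's calibrated adaptive implementation:

```python
import math

def notch_filter(x, fs, f0, q=30.0):
    """Second-order IIR notch (RBJ audio-EQ-cookbook coefficients):
    attenuates a narrow band around f0 Hz, e.g. 50/60 Hz mains hum."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b1, b2 = 1.0, -2.0 * math.cos(w0), 1.0
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    # Direct-form-I difference equation.
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x1, x2, y1, y2 = xn, x1, yn, y1
    return y

# A 50 Hz sinusoid sampled at 500 Hz is suppressed once the filter
# settles, while components far from 50 Hz pass essentially unchanged.
fs = 500.0
hum = [math.sin(2.0 * math.pi * 50.0 * i / fs) for i in range(2000)]
cleaned = notch_filter(hum, fs, f0=50.0)
```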

09:30
Benchmarking Signal Reconstruction Pipelines Against Direct Visual Feature Extraction for Digitized ECGs

ABSTRACT. Although deep learning models demonstrate superior performance in interpreting 1D electrocardiogram (ECG) data, a vast number of clinical records are archived as static images, limiting the deployment of state-of-the-art models. This study investigates whether to address this by reconstructing the 1D signal from the image or by applying computer vision models directly to the image. Utilizing the PTB-XL dataset and ECG-Image-Kit, we benchmark a direct visual feature extraction approach (using EfficientNet-B0 and DINOv3) against a signal reconstruction pipeline (U-Net followed by a custom 1D ResNet). The models are evaluated on biological age regression, sex classification, and pathology detection. Results indicate that the 1D ResNet model operating on reconstructed signals consistently outperforms 2D vision-based models across all tasks, despite having significantly fewer parameters. For instance, the 1D model achieved an age regression mean absolute error (MAE) of 9.09 years compared to over 14.30 years for the 2D models. The findings suggest that 1D temporal representations of ECG data are more information-dense for diagnostics, and that targeted signal processing remains a more robust framework than direct image analysis using general-purpose foundation models.

09:45
Temporal Latent Priors for Sequential VAE Modeling of Full-Length 12-Lead ECGs

ABSTRACT. Variational autoencoders (VAEs) are increasingly used for electrocardiogram (ECG) representation learning, yet most prior work focuses on short segments or single-lead recordings and rarely evaluates reconstruction, classification, and latent-space behavior jointly on full 10-second 12-lead clinical ECGs. This paper presents a modular sequential VAE framework for PTB-XL that explicitly models time-indexed latent trajectories and compares independent versus temporally structured priors across encoder architectures and objective functions. We show that temporally structured latent priors significantly improve reconstruction fidelity and latent utilization without degrading multi-label diagnostic performance. Transformer encoders combined with InfoVAE-MMD objectives provide the best balance between reconstruction and representation quality. However, unconditional generation from the learned priors remains physiologically limited. The results highlight the importance of latent dynamics for long multilead ECG modeling and provide guidance for future generative cardiac models.

10:00
Beyond Single-Beat Classification: Quantifying Arrhythmia in Long-Term ECG via Prevalence Estimation

ABSTRACT. Long-term electrocardiogram (ECG) monitoring is essential for determining the arrhythmic burden, a critical clinical metric for diagnosing cardiovascular conditions. Traditionally, this burden is estimated using a Classify-and-Count (CC) approach, which labels individual heartbeats and aggregates results by counting predictions for each label. However, even state-of-the-art Deep Learning classifiers exhibit systematic biases that accumulate over long-term recordings, leading to significant diagnostic inaccuracies. This paper investigates the application of quantification techniques to estimate arrhythmia prevalence in long-term ECG signals from the MIT-BIH Arrhythmia Database. We compare several base classifiers paired with quantification algorithms against a high-performance Deep Learning baseline, LITETime, using the standard CC method. Our results demonstrate a quantification paradox: while LITETime achieves superior beat-by-beat accuracy, simpler classifiers equipped with quantification adjustment layers, particularly the Expectation-Maximization Quantifier (EMQ), significantly reduce the Mean Absolute Error (MAE) in prevalence estimation. By correcting the systematic bias caused by Prior Probability Shifts, our framework provides a more reliable diagnostic tool for long-term monitoring and wearable cardiac devices.
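The Expectation-Maximization Quantifier referenced above iteratively re-weights a classifier's posteriors to account for prior probability shift; a compact Saerens-style sketch with toy posteriors (not MIT-BIH data):

```python
def emq(posteriors, train_priors, n_iter=100):
    """Expectation-Maximization Quantifier: estimates test-set class
    prevalence by iteratively re-weighting the classifier's posteriors
    for the shift away from the training priors."""
    k = len(train_priors)
    priors = list(train_priors)
    for _ in range(n_iter):
        # E-step: rescale each posterior by the current/train prior ratio.
        adjusted = []
        for p in posteriors:
            w = [priors[c] / train_priors[c] * p[c] for c in range(k)]
            z = sum(w)
            adjusted.append([v / z for v in w])
        # M-step: prevalence estimate = mean adjusted posterior.
        priors = [sum(a[c] for a in adjusted) / len(adjusted) for c in range(k)]
    return priors

# Toy posteriors from a classifier trained with balanced priors.
posteriors = [[0.7, 0.3]] * 6 + [[0.3, 0.7]] * 4
prevalence = emq(posteriors, train_priors=[0.5, 0.5])  # ~[0.75, 0.25]
```

For this toy set, naive classify-and-count reports a class-0 prevalence of 0.6, while the EM fixed point lands at the maximum-likelihood estimate of 0.75.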

10:15
Cuffless Continuous Arterial Blood Pressure Waveform Estimation From PPG Using a CNN–LSTM Encoder–Decoder

ABSTRACT. Blood pressure is a key parameter used to assess a patient’s condition. It is typically measured at predefined intervals after a patient is admitted to the hospital or as part of routine home monitoring. However, periodic measurements may be insufficient when a patient’s condition is critical or unstable. Continuous blood pressure monitoring is challenging and sometimes requires invasive methods, even though it is a crucial vital sign during many surgical procedures. While some cuffless blood pressure monitors can provide noninvasive, continuous measurements, they require additional equipment and charging, increasing the logistical burden for hospitals. In this study, we propose an algorithm that estimates continuous blood pressure from a photoplethysmography (PPG) signal, which can be easily acquired using a finger pulse oximeter and analyzed using machine learning methods. Our algorithm achieves mean absolute errors (MAE) of 6.44 and 2.31 for systolic and diastolic pressure, respectively.

09:00-10:30 Session 1D: Intelligent Wearables, Assistive Technologies
Location: Atrium C
09:00
Emotion-Aware Assistive System with Wearable Haptic Feedback for Visual Impairment

ABSTRACT. Visual impairment limits access to nonverbal social cues and facial expressions, negatively impacting social participation and psychological well-being. Assistive technologies offer a strategy to enable interpersonal interaction. This work presents the development and validation of an integrated biomedical assistive system that performs real-time facial emotion recognition (FER) and delivers vibrotactile feedback to visually impaired users.

The proposed platform combines a convolutional neural network enhanced with Convolutional Block Attention Modules (CBAM-4CNN) and a wearable Bluetooth Low Energy (BLE) haptic device. To improve clinical reliability, a data-centric optimization strategy was implemented on the AffectNet dataset, addressing severe class imbalance and label noise through manual inspection and automated Confident Learning. Model accuracy improved from 58.46% to 79.18%, highlighting the importance of data quality in biomedical AI applications. The FER model was deployed on a Raspberry Pi 5 for local inference, enabling real-time processing without cloud dependency. Emotion outputs are wirelessly transmitted to a custom nRF52840-based wearable module that encodes emotional states into distinct vibration patterns. Qualitative validation was conducted with visually impaired users undergoing rehabilitation, who confirmed the system's responsiveness and practical feasibility. The proposed solution demonstrates the potential of multimodal AI-driven assistive systems to support social inclusion and partial functional independence in individuals with visual impairments.

09:15
An Energy-Efficient Wearable System for AF Detection: LLM-NAS Driven Lightweight Neural Network and Embedded Deployment

ABSTRACT. Atrial fibrillation (AF) detection is pivotal for stroke prevention, yet the deployment of robust deep learning models on resource-constrained wearable devices remains a formidable challenge due to excessive computational demands. This paper presents an automated, hardware-aware design pipeline for ultra-lightweight AF classification, driven by Large Language Model-based Neural Architecture Search (LLM-NAS). By translating hardware constraints into structured linguistic priors, we leverage the reasoning capabilities of LLMs to discover an optimized convolutional neural network architecture that synergizes time-frequency dual-branch feature extraction, depthwise separable convolutions, and channel attention mechanisms. To further bridge the gap between algorithmic complexity and embedded efficiency, the discovered model undergoes a two-stage compression suite involving structured pruning and quantization-aware training. Experimental results on the CPSC2021 dataset demonstrate that the resulting model achieves a high F1-score of 0.9674 with only 7.93K parameters and a minimal memory footprint of 7.7 KB—a significant reduction compared to existing state-of-the-art models. Furthermore, we implemented a complete prototype system on an STM32F767 microcontroller, achieving a single-inference latency of 306.20 ms and an incremental power consumption of 0.18 W. This end-to-end validation confirms the feasibility of our LLM-driven methodology for real-time, high-fidelity AF monitoring in next-generation medical IoT devices.
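The parameter savings from the depthwise separable convolutions mentioned above follow from a simple count; a sketch comparing the two layer types (bias terms omitted, channel sizes illustrative, not the paper's discovered architecture):

```python
def standard_conv_params(c_in, c_out, k):
    """Standard conv: one k-tap kernel per (input, output) channel pair."""
    return c_in * c_out * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise conv (one k-tap kernel per input channel) followed by
    a 1x1 pointwise conv that mixes channels."""
    return c_in * k + c_in * c_out

# Example: 32 -> 64 channels, kernel size 7, bias terms omitted.
std = standard_conv_params(32, 64, 7)        # 14336
sep = depthwise_separable_params(32, 64, 7)  # 224 + 2048 = 2272
```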

09:30
Cervical Kinematic Recorder: a technological innovation for cephalogyric movement assessment

ABSTRACT. This paper presents the Cervical Kinematic Recorder (CKR) application, a tool designed to replicate and capture cervical movements performed in a clinical setting using a virtual reality environment. By extracting accelerometer signals, we developed a classification model capable of accurately distinguishing healthy individuals from patients with left or right cervical dysfunctions. The study also demonstrates how signal-processing features of Yaw, Pitch, and Roll can be used to characterize these patient categories, providing interpretable insights into the kinematic patterns associated with cervical impairments. This approach offers a promising avenue for objective assessment and monitoring of cervical function in clinical practice.

09:45
Deep Learning-enabled Monocular, Markerless 3D Pose Estimation of Laparoscopic Tooltips in Box-Trainer Simulators

ABSTRACT. Laparoscopic surgical training poses significant skill acquisition challenges, which can be overcome through automated skill assessments and objective feedback mechanisms. Such analysis fundamentally depends on accurate 3D pose estimation of instruments. Existing approaches rely on bulky hardware, additional markers, prior 3D models, or multi-view camera setups. In contrast, our work proposes and evaluates a deep learning-enabled modular pipeline for markerless 3D pose estimation of instruments from a single camera. In this paper we conduct a comprehensive evaluation of both the deep learning detection module and the pipeline's 3D pose estimation performance under unseen and challenging conditions representative of realistic simulator scenarios. The detection model achieves an mAP@0.5:0.95 of 99.3% on the test set, and the pipeline demonstrates mean absolute errors (MAE) of 1.53 mm, 1.44 mm, and 1.50 mm along the X, Y, and Z axes respectively. Our findings indicate that the proposed pipeline generalizes robustly to realistic simulator conditions, thereby advancing the feasibility of automated skill assessment and practical deployment in laparoscopic training environments, ultimately contributing to improved quality of surgical training and patient outcomes.

10:00
Knee Exercise Directional Control and Range of Motion Measurement Device for Physical Therapy Monitoring of Post-Total Knee Arthroplasty Patients

ABSTRACT. This paper presents the Smart Knee Rehab, a low-cost system for post-Total Knee Arthroplasty (TKA) rehabilitation that combines a fully passive mechanical guide with markerless computer vision to measure knee range of motion (ROM) during flexion and extension exercises. The passive rail constrains motion to the sagittal plane to promote safer execution at home, while the camera-based algorithm provides quantitative ROM feedback without wearable sensors. In validation against an iPhone-based inclinometer reference (Measure app on iPhone 13 Pro Max; gyroscope), the system achieved a mean absolute error of 2.38° and a root mean square error of 4.09°.
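A markerless ROM estimate of this kind typically reduces to an angle between keypoint vectors; a generic sketch with hip, knee, and ankle as 2D points (illustrative geometry, not the paper's algorithm):

```python
import math

def joint_angle(hip, knee, ankle):
    """Angle at the knee (degrees) between the thigh (knee->hip) and
    shank (knee->ankle) vectors, from 2D keypoints."""
    v1 = (hip[0] - knee[0], hip[1] - knee[1])
    v2 = (ankle[0] - knee[0], ankle[1] - knee[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

straight = joint_angle((0, 2), (0, 1), (0, 0))  # ~180 deg: full extension
bent = joint_angle((0, 2), (0, 1), (1, 1))      # ~90 deg: right-angle flexion
```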

10:15
State-of-Flight: Wearable EEG and Motion Context During Flying or Floating

ABSTRACT. We introduce State-of-Flight, a hypothesized transition state that may emerge during taxi, takeoff, cruise, and landing, where vestibular input, vibration, engine noise, and anticipatory arousal co-occur. We report an in-cabin field protocol using a Muse S/Athena-class wearable EEG headband (4 channels, 256 Hz) with synchronized inertial measurements (IMU), together with a reproducible analysis pipeline based on 0.5–30 Hz preprocessing, 5 s epoching, and spectral feature extraction. Across a small field dataset (n=5 flight participants, n=2 control participants, along with a water-based float comparison dataset), pooled epoch-level analyses show elevated theta-beta ratios (TBR) in flight phases relative to controls, with broad variability consistent with real-world wearable recordings. Frontal alpha asymmetry (FAA) also varies across conditions, but is interpreted conservatively due to motion, fit variability, and sign-convention sensitivity. These results are preliminary and primarily demonstrate feasibility, instrumentation, and a reusable analysis scaffold for higher-powered future studies in aviation, neuroergonomics, and motion-aware wearable sensing.
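The theta-beta ratio reported above is a ratio of EEG band powers. A minimal sketch of how it can be computed from a precomputed power spectral density; the band edges (theta 4–8 Hz, beta 13–30 Hz) are common conventions assumed here, not taken from the paper:

```python
def band_power(freqs, psd, lo, hi):
    # Integrate the PSD over [lo, hi) Hz (rectangle rule over frequency bins)
    df = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df

def theta_beta_ratio(freqs, psd):
    # Conventional bands (assumed): theta 4-8 Hz, beta 13-30 Hz
    theta = band_power(freqs, psd, 4.0, 8.0)
    beta = band_power(freqs, psd, 13.0, 30.0)
    return theta / beta

# Toy PSD on a 0.5 Hz grid from 0 to 30 Hz; a flat spectrum for illustration
freqs = [0.5 * i for i in range(61)]
psd = [1.0] * len(freqs)
print(theta_beta_ratio(freqs, psd))  # ~0.235 (theta band is narrower than beta)
```

In a real pipeline the PSD would come from each 5 s epoch (e.g. via Welch's method) per channel, with the TBR then pooled across epochs as described in the abstract.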

10:30-11:00 Coffee Break
11:00-12:30 Session 3A: Explainable AI for Clinical Decision-Making
Location: Panorama
11:00
Interpretable Machine Learning for Early Sepsis Prediction

ABSTRACT. Sepsis is responsible for rising morbidity and mortality in Intensive Care Units (ICUs). Despite advances in diagnostic biomarkers and scoring systems, and as recommended by the World Health Organisation, there is a strong need for early diagnosis and timely intervention. In this study, we leverage Electronic Health Records (EHRs) from the Medical Information Mart for Intensive Care (MIMIC)-IV dataset and propose a machine learning (ML) pipeline that supports explainability. Using data from 5,285 patients and 45 clinically relevant features, we develop an ensemble of a Gradient Boosting model and a Long Short-Term Memory network to predict sepsis 12 hours in advance. The resulting Area Under the Curve (AUC) is 0.91, with sensitivity 0.91 at specificity 0.74. A Decision Curve Analysis shows strong clinical utility, with a maximum net benefit of 0.45. We use the TE2Rules explainability library for interpretability, achieving an overall fidelity score of 94% with just 44 rules for a positive prediction and 51 rules for a negative prediction. By further applying argumentation theory, ensemble sensitivity reaches 92%. This is the first study to investigate sepsis prediction in ICU patients using MIMIC-IV, achieving a significantly higher AUC at a longer lead time, and the first to apply rule-based explainable AI and argumentation theory to this problem. The pipeline shows promise for early sepsis diagnosis and intervention, ultimately reducing mortality rates and healthcare costs.

11:15
Explainable Complication Status Prediction after Tracheostomy Procedure

ABSTRACT. Tracheostomy is a common procedure in intensive care units (ICUs) and is associated with substantial morbidity, mortality, and healthcare costs. Despite the high incidence of post-tracheostomy complications, current risk assessment relies largely on clinician judgment, with no standardized predictive tools. In this study, we develop and evaluate machine learning (ML) models to predict tracheostomy-related complications by hospital discharge using early post-procedural data. We used the Medical Information Mart for Intensive Care (MIMIC)-IV database and 16 unique demographic, diagnostic, and 12-hour post-tracheostomy physiological features from 581 adult ICU patients. We evaluated Random Forest, Extreme Gradient Boosting (XGB), K-Nearest Neighbours (KNN), and Multilayer Perceptron (MLP) models across 6 temporal data representations. Interpretability was supported with SHapley Additive exPlanations (SHAP). The XGB model using combined aggregated and flattened temporal features achieved the best performance (AUC 0.79, AUPRC 0.90, Brier score 0.19). Key predictors included ventilator-associated pneumonia, infection-related diagnoses, and peak airway pressure variability. Extending the temporal window to 24 hours did not improve performance. This work represents the first ML-based approach for predicting post-tracheostomy complications, supporting clinical decision-making, improved patient experience, and lower healthcare expenditures.

11:30
Explainable Multimodal Deep Learning for Improved Diabetic Retinopathy Referral Decisions

ABSTRACT. This paper presents a multimodal deep learning (DL) model for diabetic retinopathy (DR) referral that integrates retinal fundus images with clinically relevant data selected through an explainable process. Using Shapley Additive exPlanations (SHAP) across five machine learning (ML) models, we identified urinary albumin excretion, diabetes duration, insulin use, HbA1c, and systolic blood pressure as the most informative clinical features. We integrated these variables into an InceptionV3-based convolutional neural network (CNN) through late fusion and evaluated the model on two independent datasets from Hospital de Clínicas de Porto Alegre (HCPA-2019: 2,522 images; HCPA-2021: 1,555 images). Compared with an image-only baseline, the multimodal model increased specificity from 56.7% to 64.7% in HCPA-2019 and from 72.4% to 77.5% in HCPA-2021, while maintaining sensitivity above 95% and an AUC above 0.93. These findings indicate that incorporating clinically interpretable metadata can reduce false-positive referrals and improve the clinical relevance of Artificial Intelligence (AI)-based DR screening.

11:45
A Bi-modal Knowledge Distillation Framework for Explainable Neonatal Jaundice Diagnosis

ABSTRACT. Neonatal jaundice is a prevalent condition whose delayed diagnosis can lead to severe neurological complications. Conventional diagnostic approaches such as total serum bilirubin (TSB) testing are invasive and resource intensive, while the existing non-invasive approaches often suffer from limited generalization and interpretability. To address these challenges, this study proposes a lightweight bi-modal framework (BiNJD) trained via knowledge distillation for accurate, explainable, and efficient neonatal jaundice diagnosis. The approach integrates spatial visual representation with language-based clinical semantics. A Vision Mamba visual teacher and an LLM + CLIP-based textual teacher jointly transfer complementary knowledge to a lightweight ResNet-50 student through multi-level cross-modal knowledge distillation, enabling rich feature learning without increased inference complexity. Model explainability is achieved using Grad-CAM visualizations and vision-language alignment to reinforce clinically meaningful reasoning. The proposed approach achieves state-of-the-art performance on the NJN dataset (97.37% accuracy, 98.25% F1-score) and demonstrates strong cross-dataset generalization on the externally curated JaundiSet-NG dataset, which contains darker skin tones from African populations. Computational evaluation shows significant reductions in parameters, FLOPS, and inference latency, highlighting the framework’s suitability for real-world deployment in resource-constrained clinical environments.

12:00
Explainable Finger Kinematics Decoding with Temporal Graph Neural Networks

ABSTRACT. Wearable kinematic sensing enables fine-grained monitoring of hand function, which is important for rehabilitation, assistive technology, and clinical assessment of motor disorders. The ultimate goal is to use it as a digital twin in order to identify specific motor disorders and monitor their evolution. However, decoding exoskeletal data glove recordings is challenging because the signals are high-dimensional, anatomically constrained, and informative mainly through coordinated finger joint dynamics over time. Moreover, the resulting classifiers should be interpretable at the finger joint level. We introduce (i) a Physio-Digital Temporal Graph (PDTG) that models hand anatomy and supports correlation-derived functional connectivity overlays for interpretation, and (ii) an explainable Temporal Graph Neural Network (TGNN) that performs message passing using a Graph Convolutional Network (GCN) at each time step and aggregates temporal dynamics with a bidirectional Long Short-Term Memory (bi-LSTM) for window-level classification of six hand movement tasks performed by 17 younger adults and 17 older adults using an exoskeleton data glove. Raw joint-angle trials are transformed into angle, velocity, and acceleration features and segmented into sliding windows, which we evaluate using 5-fold subject-wise cross-validation to prevent subject leakage. Our TGNN model reached a 92.5% ± 2% macro-F1 score for window-level task classification; for interpretability, we used integrated gradients to produce finger-joint-level attribution maps and compared attribution patterns between younger and older adults on fine manipulation tasks, highlighting task-dependent differences in finger joint involvement. Overall, the proposed framework provides an interpretable way to decode hand movements and enables group-level comparisons among age groups based on hand topology.

12:15
Adaptive Group-Based Counterfactual Explanations for Time-Series Rehabilitation Data

ABSTRACT. Counterfactual explanations for multivariate time-series classifiers are often difficult to interpret in domains where experts reason in terms of semantic feature groups rather than individual channels. In rehabilitation movement analysis with multi-sensor inertial measurement units (IMUs), clinicians interpret motion through muscle-group and joint-segment abstractions; yet, most existing counterfactual methods operate at the channel level, producing scattered and biomechanically incoherent explanations. We propose a two-stage framework for group-based counterfactual generation in high-dimensional IMU data. We first show that Shapley-Adaptive (SA) group ranking preserves counterfactual validity but fails to enforce group-level sparsity, motivating the need for explicit group selection. We then introduce Learnable Gate (LG) methods, which incorporate trainable per-group relevance gates jointly optimized with perturbation masks. Experiments on the KneE-PAD rehabilitation dataset demonstrate that LG substantially improves modality-group sparsity compared to the channel-level M-CELS baseline while maintaining or improving validity, temporal smoothness, and generation efficiency. Exercise-specific analyses further show that group-structured counterfactuals yield concise, muscle-level corrective guidance aligned with clinical reasoning. Overall, the proposed framework enhances interpretability without sacrificing counterfactual quality, enabling more actionable explanations for rehabilitation movement analysis.

11:00-12:30 Session 3B: Breast Imaging, Mammography, and Oncological AI
Location: Atrium A
11:00
Self-Supervised Foundation Models for Mammography: A Survey of Architectures, Benchmarks, and Clinical Translation Challenges

ABSTRACT. Two large prospective trials—MASAI (n=105,934, Sweden) and PRAIM (n=461,818, Germany)—have demonstrated that AI-assisted screening significantly improves cancer detection, reporting increases of 29% and 17.6% respectively. Despite these gains, the reduction in interval cancers remains modest (1.55 vs. 1.76 per 1,000 screens in MASAI), and aggressive subtypes such as triple-negative breast cancer remain largely undetected. While factors such as cancer growth kinetics, imaging physics, and screening intervals contribute to this residual burden, this survey argues that one plausible limiting factor is architectural: supervised convolutional systems are trained on discrete radiological labels that exclude the pre-malignant tissue signals responsible for interval cancers. Self-supervised learning (SSL) avoids this by deriving training signal from unlabelled image structure. Four classes of self-supervised learning techniques applicable to mammography are compared—contrastive learning (SimCLR/MoCo), masked autoencoders (MAE), self-distillation (DINO), and vision-language pre-training (CLIP)—along three axes: training mechanism, data requirements, and resistance to scanner-induced domain shift. We review domain-specific foundation models including MammoDINO, Mammo-CLIP, MAMA, Mammo-FM, and VersaMammo, analyse how each addresses the generalisation gap, discuss interpretability requirements for clinical deployment, and outline the research directions most likely to reduce the residual interval cancer burden.

11:15
Mammography BI-RADS Reformulation: From Raw Categories to Low/High Risk and Soft-Label Modelling

ABSTRACT. Calcification findings in mammography are small, subtle, and often lack sufficient contextual cues for confident assignment of fine-grained BI-RADS categories. As a result, models trained to predict the four discrete BI-RADS classes (2, 3, 4, 5) from calcification-only evidence tend to exhibit limited stability and modest accuracy. This work shifts attention from model architecture to label formulation by reframing the task as a clinically aligned BI-RADS risk prediction problem. Instead of predicting raw BI-RADS classes, we group them into Low (2–3) versus High (4–5) risk and compare hard labels to soft labels near the 3/4 boundary. Using the same classifier and a unified cross-database calcification dataset, the raw four-class setup reaches BACC 69.5%. The Low/High reformulation with hard labels improves to BACC 81.6% / AUC 88.1%. The soft-label variant reaches BACC 80.4% / AUC 87.2%, with a small drop in sensitivity but higher specificity and more cautious probabilities near the edge. These results show that a simple, clinically aligned risk reformulation makes analysis of calcification patterns more stable and practical without modifying the model architecture.
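The label reformulation described above can be expressed as a simple mapping from BI-RADS categories to binary or soft targets. A minimal sketch; the soft-label probabilities near the 3/4 boundary (0.2/0.8) are illustrative assumptions, since the abstract does not state the exact values used:

```python
def hard_label(birads):
    # Low risk (BI-RADS 2-3) -> 0, High risk (BI-RADS 4-5) -> 1
    if birads not in (2, 3, 4, 5):
        raise ValueError("expected BI-RADS category 2-5")
    return 0 if birads <= 3 else 1

def soft_label(birads):
    # Soften targets near the 3/4 boundary; 0.2 and 0.8 are illustrative values only
    soft = {2: 0.0, 3: 0.2, 4: 0.8, 5: 1.0}
    return soft[birads]

print([hard_label(b) for b in (2, 3, 4, 5)])  # [0, 0, 1, 1]
print([soft_label(b) for b in (2, 3, 4, 5)])  # [0.0, 0.2, 0.8, 1.0]
```

Training against the soft targets (e.g. with a cross-entropy loss that accepts probabilistic labels) is what yields the "more cautious probabilities near the edge" that the abstract reports.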

11:30
Breast Cancer Subtyping using Digital Mammograms and Feature-based Machine Learning: Temporal Subtraction vs. Single-Mammogram Analysis

ABSTRACT. Breast Cancer (BC) remains a leading cause of morbidity and mortality among women worldwide. Clinical management and prognosis depend heavily on the molecular subtype of BC, which is determined by biopsy and histopathological analysis. Despite their diagnostic value, these procedures are invasive, costly, and can delay clinical decision-making. This study investigates the added value of Temporal Subtraction (TS) compared to Single-Mammogram (SM) analysis for automatic detection and subtyping of BC using digital mammograms. A newly collected dataset of 164 temporally sequential digital mammograms, annotated by two expert radiologists, was used. Two parallel machine learning pipelines were developed. The TS pipeline incorporated image pre-processing, registration, and temporal subtraction prior to lesion segmentation, feature extraction, selection, and classification. The SM pipeline followed the same workflow, excluding image registration and temporal subtraction. Classification was evaluated for luminal A vs. non-luminal A subtypes. The TS-based approach achieved an accuracy of 91.2%, outperforming the SM-based analysis, which reached 87.6%, with the improvement being statistically significant (p < 0.05). The results demonstrate that exploiting TS provides complementary diagnostic information, enhancing subtyping performance. The proposed framework highlights the potential of TS as a fast, non-invasive decision-support tool that can reduce reliance on biopsies.

11:45
Classification of Breast Cancer Patterns in Immunohistochemistry (IHC) Images based on Multi-Class Targets

ABSTRACT. Microscopic analysis of biopsy slides, particularly immunohistochemistry (IHC), plays a critical role in cancer diagnosis. This process still largely relies on visual assessment and cell counting, which are time-consuming and subject to error and variability. Recent advances in Deep Learning, especially convolutional neural networks (CNNs), have enabled the development of interesting architectures for cell detection, counting, and classification, providing robustness and standardization for histopathological analysis. In this paper we present a comparative study of CNN architectures applied to the analysis of microscopic biopsy images in breast cancer pathology. The proposed approach considers object detection (OB) and object classification (OC) paradigms. For OB, a Faster R-CNN framework with a ResNet-50 backbone and a Feature Pyramid Network (FPN) was considered. For OC, architectures such as DenseNet-121 were evaluated due to their dense connectivity, which promotes efficient feature reuse and improved representation of fine-grained textural patterns. The experiments were conducted in a prepared environment covering dataset partitioning, data augmentation, normalization, and evaluation, using the objective metrics mean Average Precision (mAP), precision, recall, F1-score, and cell counting error. The obtained experimental results indicate ResNet as the most suitable approach for multi-class classification, with 82.89% precision, suggesting it as a promising solution for diagnosis in microscopic biopsy analysis.

12:00
Segmentation-Driven Background Skin Extraction for Robust Skin Tone Estimation in Dermatological Images

ABSTRACT. Performance disparities across skin tones remain a critical challenge in dermatological artificial intelligence, largely driven by demographic imbalance in publicly available datasets. Reliable skin tone estimation is therefore essential for analyzing and mitigating potential bias. However, accurate computation of the Individual Typology Angle (ITA), a widely used metric for skin tone characterization, depends on correctly isolating healthy background skin in lesion images. This work proposes a segmentation-driven framework for robust skin tone estimation by systematically evaluating four background skin extraction strategies: Center Crop, Structured Patches, YOLO-based lesion exclusion, and SAM-based pixel-level segmentation. Experiments conducted on the HAM10000 and PAD-UFES-20 datasets analyze how these strategies influence ITA distributions, Fitzpatrick skin type categorization, and dataset skin tone composition. Results show that background extraction significantly affects tone estimation stability and subgroup representation. While automatic methods provide consistent estimates for lighter tones, darker tones remain challenging due to dataset imbalance and intrinsic overlap in ITA values under heterogeneous acquisition conditions. To support reproducible research, we also release derived skin tone annotations and curated background skin patches for the evaluated datasets, enabling further studies on fairness and bias in dermatological AI systems.
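The Individual Typology Angle discussed above is computed from the mean CIELAB coordinates of the extracted background-skin pixels. The ITA formula itself is standard; the category thresholds below follow common ITA-based skin-tone groupings and are an assumption on my part, not values taken from this paper:

```python
import math

def ita(L_star, b_star):
    # ITA = arctan((L* - 50) / b*) in degrees, from mean CIELAB values of skin pixels
    return math.degrees(math.atan2(L_star - 50.0, b_star))

def ita_category(angle):
    # Common ITA groupings (assumed thresholds, not from the paper)
    if angle > 55:
        return "very light"
    if angle > 41:
        return "light"
    if angle > 28:
        return "intermediate"
    if angle > 10:
        return "tan"
    if angle > -30:
        return "brown"
    return "dark"

print(ita(70.0, 15.0))                # ~53.13 degrees
print(ita_category(ita(70.0, 15.0)))  # light
```

Because ITA depends directly on the L* and b* statistics of whichever pixels are treated as "background skin", the four extraction strategies compared in the abstract (Center Crop, Structured Patches, YOLO-based exclusion, SAM-based segmentation) can shift the resulting angle and hence the assigned tone category.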

12:15
MammoTwin: An Open Source Digital Twin Framework for Protocol Optimization via MRI-to-Mammography Synthesis

ABSTRACT. Mammography screening inherently involves cumulative exposure to ionizing radiation. Digital twin simulation offers a promising pathway for protocol optimization without biological risk, yet it requires rigorous physical validation to ensure clinical relevance. This work presents MammoTwin, an open-source software framework designed to generate synthetic mammograms from Magnetic Resonance Imaging (MRI), validated through standard clinical Quality Assurance (QA) metrics. The proposed system integrates a comprehensive five-phase pipeline: data acquisition, hybrid AI-driven segmentation utilizing nnU-Net and BreastSegNet, biomechanical compression modeling with exact volume preservation, physics-based X-ray simulation based on NIST XCOM cross-sections at 28 keV, and automated QA validation. Quantitative evaluation on a diverse cohort of thirteen test cases demonstrated the system's robustness across varying anatomies. The generated synthetic projections exhibited a maximum dynamic range of 64.0 dB and an effective depth ranging between 5.8 and 10.6 bits, faithfully capturing biological variability. The cohort achieved an average Signal-to-Noise Ratio (SNR) of 88.6, confirming diagnostic quality across both adipose-dominant and high-density breast tissues. Ultimately, MammoTwin is a viable tool for estimating the patient-specific radiation dose, allowing optimization of the mammography protocol to provide high-quality images at low radiation levels.

11:00-12:30 Session 3C: Cognitive Signal Analysis
Location: Atrium B
11:00
Assessing the effects of xDAWN filtering, reduction of EEG channels and addition of EOG signals on the classification of movement versus rest

ABSTRACT. Movement-Related Cortical Potentials (MRCPs) appear before movement execution/intention and have been proposed for the control of brain-computer interfaces. In this work, the detection accuracy of hand movement execution from time intervals where MRCPs are expected is investigated, using a publicly available dataset, three montages, and amplitude features with or without xDAWN filtering. The montages were: 1) the 31-channel montage used by the dataset authors (HD), 2) a montage of 6 channels covering the motor area (LD), and 3) a Hybrid setup comprising the LD montage and 4 electro-oculography channels. The classifier used was Shrinkage Linear Discriminant Analysis. The accuracy scores for the 2 x 3 factor combinations were tested using rmANOVA, with sphericity corrections where necessary, and Tukey's HSD post-hoc tests. Both the montage and its interaction with xDAWN filtering had a significant effect on the achieved accuracies (p < 0.05), with the HD and Hybrid montages outperforming the 6-channel subset. We thus conclude that hand movement execution detection benefits from denser montages and electro-oculography information.

11:15
Motor–Cognitive Performance After Cold Exposure Therapy: Wearable EEG Mini-Golf Study

ABSTRACT. Cold exposure triggers acute sympathetic activation and norepinephrine release, effects that may transiently enhance attention and motor precision after rewarming. We report a multi-session field study combining cold exposure (CE) with motor and cognitive assessments. In Session 1 (February 25), nine participants completed standardized mini-golf putting before and after a cold plunge; a subset wore Muse 2 EEG headbands (4-channel, 256 Hz) during putting before cold exposure (n = 3, 101 epochs) and after (n = 4, 110 epochs). Mini-golf scores improved in five of seven scored participants (+18.4% group mean). EEG theta/beta ratio (TBR) decreased by 12.0%, and frontal alpha asymmetry (FAA) shifted from +0.18 to +2.46, consistent with heightened alertness and approach motivation. In follow-up sessions, participants completed cognitive batteries before and after cold exposure with concurrent EEG. Across all sessions, three participants with pre/post cognitive data maintained or improved PASAT accuracy; one improved from 7/10 to 10/10 and reduced serial subtraction time by 30%. Blood pressure decreased after CE in all measured participants. These findings suggest that a single cold plunge may acutely enhance focused attention and motor–cognitive performance.

11:30
Robust and Reproducible Evaluation of Narrative-Based Sleep Disorder Classification Using Hybrid Semantic and Lexical Representations

ABSTRACT. Natural language processing enables automated classification of patient-reported sleep narratives, yet many clinical studies rely on limited validation and optimistic estimates on small datasets. This study presents a stability-aware framework for narrative-based sleep disorder classification using hybrid lexical and contextual semantic representations with a regularized linear model. A corpus of 474 labeled narratives spanning five clinically motivated categories was evaluated using repeated nested stratified cross-validation and multi-seed independent holdout testing. The model achieved a mean holdout macro-averaged F1-score of 0.90 ± 0.04, balanced accuracy of 0.91 ± 0.03, and Matthews correlation coefficient of 0.87 ± 0.05. Robustness diagnostics, including anchor-term masking, stylistic baselines, and duplicate analysis, demonstrated controlled partition sensitivity and limited reliance on lexical shortcuts. The findings highlight methodological rigor, reproducibility, and explainable modeling as essential for clinically reliable digital sleep scoring systems.

11:45
Sampling Matters: The Effect of ECG Frequency on Deep Learning-Based Atrial Fibrillation Detection

ABSTRACT. Deep learning models for atrial fibrillation (AF) detection are increasingly trained on heterogeneous electrocardiogram (ECG) datasets with varying sampling frequencies, yet the specific consequences of these discrepancies on model performance, calibration, and robustness remain insufficiently characterized. To address this, we conducted a systematic benchmark using 12-lead, 10-second recordings from the PTB-XL dataset, resampled to target frequencies of 62, 100, 250, and 500 Hz, to evaluate a standard 1-D Convolutional Neural Network (CNN) and a hybrid CNN-Long Short-Term Memory (LSTM) architecture under a rigorous patient-safe cross-validation framework. Our analysis reveals that sampling frequency significantly impacts detection metrics in an architecture-dependent manner; the hybrid CNN-LSTM model demonstrated optimal performance and consistent calibration at intermediate frequencies (100–250 Hz), whereas the 1-D CNN baseline exhibited marked degradation in accuracy and sensitivity at 500 Hz, suggesting increased susceptibility to high-frequency noise. We conclude that ECG sampling frequency is a critical, underappreciated factor in arrhythmia detection, and future foundation models must explicitly control for temporal resolution to ensure clinical reliability and reproducibility.
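The frequency sweep described above requires resampling every recording onto a common time grid before training. A minimal linear-interpolation sketch in pure Python; a production pipeline would typically use a polyphase filter (e.g. scipy.signal.resample_poly) instead, since downsampling without low-pass filtering can alias exactly the high-frequency content this study examines:

```python
def resample_linear(signal, fs_in, fs_out):
    # Resample a 1-D signal from fs_in to fs_out Hz by linear interpolation.
    # Caution: downsampling without an anti-aliasing filter; sketch only.
    duration = (len(signal) - 1) / fs_in
    n_out = int(duration * fs_out) + 1
    out = []
    for k in range(n_out):
        t = k / fs_out                        # output sample time in seconds
        i = min(int(t * fs_in), len(signal) - 2)
        frac = t * fs_in - i                  # position between input samples
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out

x = [0.0, 1.0, 2.0, 3.0, 4.0]        # 5 samples, nominally at 500 Hz
print(resample_linear(x, 500, 250))  # approximately [0.0, 2.0, 4.0]
```

Keeping the resampler identical across all target frequencies, as a benchmark like this one must, ensures that observed performance differences reflect temporal resolution rather than resampling artifacts.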

12:00
Characterization of a Computerized Method for Non-invasive Measurement of Arterial Hyperelastic Properties: Potential for Decoding Vascular Ageing

ABSTRACT. Arterial stiffness is a core marker of vascular ageing, and pulse wave velocity (PWV) is widely used for its non-invasive assessment. However, conventional beat-level PWV is pressure-confounded and does not directly characterise within-beat hyperelastic behavior. In addition, no established automated ultrasound framework currently exists for direct incremental PWV estimation in routine point-of-care workflows. We develop a computerised multichannel RF-ultrasound method that estimates two systolic fiducial wave speeds and defines incremental PWV as ∆PWV = PWV2 − PWV1. We also construct a controlled in silico test bed to characterise sensitivity to frame rate, RF sampling rate, SNR, and channel count. Results show that frame-rate reduction via dropped-frame reconstruction causes substantial timing error at low frame rates, followed by a saturation region beyond which additional frame-rate increase provides marginal benefit. RF sampling sweeps show progressively increasing error as sampling is reduced, consistent with degraded fiducial morphology and timing localization. SNR analysis identifies a usable operating band and demonstrates that multichannel regression is markedly more robust than two-channel estimation, especially in low-SNR conditions; under selected operating conditions, incremental-PWV error remains below expected physiological within-beat PWV change. These findings indicate that reliable incremental PWV estimation is feasible when temporal resolution, sampling fidelity, and synchronization constraints are jointly satisfied and multichannel fitting is used. Clinically, this supports automated ultrasound-based incremental PWV as a scalable tool for early vascular-ageing assessment and longitudinal risk monitoring.
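The multichannel regression mentioned above fits fiducial arrival time against channel position along the vessel; the wave speed is the reciprocal of the fitted slope, and ∆PWV is the difference between the two fiducial speeds. A minimal least-squares sketch with hypothetical channel spacings and arrival times (not values from the paper):

```python
def pwv_from_arrivals(positions_m, times_s):
    # Least-squares slope of arrival time vs. position; PWV = 1 / slope (m/s)
    n = len(positions_m)
    mx = sum(positions_m) / n
    mt = sum(times_s) / n
    num = sum((x - mx) * (t - mt) for x, t in zip(positions_m, times_s))
    den = sum((x - mx) ** 2 for x in positions_m)
    slope = num / den  # seconds per metre
    return 1.0 / slope

# Hypothetical fiducial arrival times at 4 channels spaced 10 mm apart
positions = [0.00, 0.01, 0.02, 0.03]
times1 = [0.000, 0.002, 0.004, 0.006]  # first systolic fiducial -> ~5 m/s
times2 = [0.000, 0.001, 0.002, 0.003]  # second systolic fiducial -> ~10 m/s
pwv1 = pwv_from_arrivals(positions, times1)
pwv2 = pwv_from_arrivals(positions, times2)
print(pwv2 - pwv1)  # incremental PWV, ~5 m/s
```

Fitting across all channels rather than just a pair averages out per-channel timing noise, which is consistent with the abstract's finding that multichannel regression is markedly more robust than two-channel estimation at low SNR.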

12:15
Development of an automated pipeline for the digitisation of paper pain drawings

ABSTRACT. A pain manikin is a digital or paper-based diagram of a human body which can be marked with the locations where an individual experiences pain. Automated digitisation of paper manikins has the potential to save time over manual annotation, allow deeper analysis of the data, and facilitate data sharing. Current methods for digitising paper pain drawings rely on drawings being created with a red marker pen, so are unsuitable for existing datasets. Our objective was to develop and perform an initial validation of an automated open-source digitisation pipeline for paper pain manikin drawings, suitable for use on existing datasets not collected using a specific pen or drawing method.

We created an automated pipeline which aligned scanned images to a blank manikin template and isolated drawn marks from the manikin outline, generated pixel maps of the pain areas and identified which pre-defined pain regions were marked as painful. We also created a synthetic dataset to assist with reproducibility. We performed a descriptive analysis comparing the outputs of the pipeline with manual annotations from a human rater. Comparing the pipeline to a human rater on identification of pre-defined pain regions (n=44 regions per drawing) on a synthetic dataset (n=20 drawings) found that of ten regions where the pipeline disagreed with the human rater, the pipeline was correct six times. Manual inspection showed that the pixel maps were generally accurate but included non-painful areas when the drawn pain area had a convex shape.

We developed an automated pipeline for the digitisation of paper pain drawings. The pipeline may reduce human error when identifying marks in predefined regions. Further work is needed to improve the shape of pixel maps for certain shapes of pain area and to validate the pipeline.
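The region-identification step described above reduces to masking: after alignment, the drawn-pixel map is intersected with each pre-defined region mask. A minimal sketch where both are binary grids and a region is flagged painful when enough of its pixels are marked; the 5% threshold is an illustrative assumption, not a value from the paper:

```python
def marked_regions(pain_map, region_masks, threshold=0.05):
    # pain_map: 2-D 0/1 grid of drawn pixels after template alignment.
    # region_masks: {region_name: 2-D 0/1 grid}, one mask per pre-defined region.
    # A region counts as painful if > threshold of its pixels are marked (assumed rule).
    painful = []
    for name, mask in region_masks.items():
        region_px = sum(v for row in mask for v in row)
        marked_px = sum(p & m for prow, mrow in zip(pain_map, mask)
                        for p, m in zip(prow, mrow))
        if region_px and marked_px / region_px > threshold:
            painful.append(name)
    return painful

# Toy 3x3 example: marks overlap only the "left_knee" region
pain = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
masks = {"left_knee": [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
         "right_knee": [[1, 0, 0], [1, 0, 0], [1, 0, 0]]}
print(marked_regions(pain, masks))  # ['left_knee']
```

A per-region fraction like this also makes disagreements with a human rater auditable, since each flagged region can be traced back to a concrete overlap count.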

11:00-12:30 Session 3D: Smart Medical Device Platforms
Location: Atrium C
11:00
Fatigue and Burnout Management in Nursing Staff Using Wearable Data

ABSTRACT. Healthcare professionals, particularly nurses, are exposed to demanding shift-based working conditions that contribute to cumulative fatigue, burnout, and increased risk of clinical errors. This paper presents a data-driven platform for fatigue-informed shift scheduling that integrates continuous high-resolution wearable-derived physiological signals, self-reported assessments, and contextual workload information. A hybrid modelling framework combines rule-based occupational health logic with machine learning–based physiological stress estimation to compute a unified Stress Index reflecting both acute strain and cumulative fatigue. The resulting index is incorporated into a scheduling engine to support workload allocation that respects predefined stress thresholds. Early-stage deployment and testing in a clinical environment indicate stable system operation, reliable data acquisition, and promising initial results regarding the feasibility of continuous stress monitoring within routine scheduling workflows.

11:15
A Modular, SOLID based Hybrid Software Architecture for Medical Devices on Heterogeneous Edge Platforms

ABSTRACT. Medical device software development faces persistent challenges in portability, maintainability, and scalability, particularly in vision-based systems where tightly coupled, hardware-specific implementations dominate. Existing architectures bind imaging pipelines directly to underlying hardware, resulting in high porting costs and resistance to modular testing. This results in significant rework during platform transitions, conflicting with the modular, independently verifiable software design advocated by standards such as IEC 62304. This work presents a modular, hybrid software architecture based on SOLID principles for medical devices, combining a layered structural decomposition with a messaging-layer-agnostic, event-driven inter-layer communication model. Five independently operating, process-isolated layers (Hardware Subsystems, Image Signal Processing, Database, GUI, and Business Logic) are initialised through a Configuration Layer and communicate exclusively through a lightweight asynchronous message bus, enforcing low coupling and enabling runtime reconfigurability without service interruption. The architecture serves as a general architectural framework applicable across medical device software systems, validated through a vision-based imaging pipeline on heterogeneous edge platforms. Runtime reconfiguration of the ISP pipeline topology is demonstrated without disruption to adjacent layers, and the GUI layer is deployed on an architecturally distinct platform with zero source code modification, validating process-level isolation and portability. The pipeline delivers 60 FPS with zero frame loss in standard operating mode across resource-differentiated hardware configurations. Cross-system reuse was validated with four of five layers requiring zero modification for a clinically distinct second system.
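The asynchronous message bus at the heart of the architecture above can be illustrated with a minimal publish/subscribe sketch. This single-process toy is only an illustration of the coupling model: the actual system is process-isolated and messaging-layer agnostic, and the topic names below are hypothetical:

```python
from collections import defaultdict

class MessageBus:
    """Minimal topic-based pub/sub bus: layers depend only on topics, never on each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # A layer registers interest in a topic without knowing who publishes it
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # A layer emits an event; every current subscriber receives it
        for handler in self._subscribers[topic]:
            handler(payload)

bus = MessageBus()
received = []
# Hypothetical layers: the ISP layer publishes frames, the GUI layer consumes them
bus.subscribe("isp/frame_ready", lambda frame: received.append(frame))
bus.publish("isp/frame_ready", {"frame_id": 1})
print(received)  # [{'frame_id': 1}]
```

Because subscription happens at runtime, handlers can be added or swapped without touching the publisher, which is the property that enables runtime reconfiguration without service interruption.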

11:30
Mixed Reality in Electronic Health Records: User Requirements and Evaluation

ABSTRACT. Electronic Health Records (EHRs) are essential to contemporary healthcare but remain limited by fragmented interfaces, poor usability, and inadequate support for complex multimodal data. Mixed Reality (MR) offers new opportunities to address these challenges, yet its integration into clinical information systems is largely unexplored. This paper presents the design and evaluation of a prototype MR-enhanced EHR that enables physicians to visualize and interact with patient data through holographic interfaces. Using MR glasses, clinicians access a three-dimensional representation of the patient’s body augmented with demographics, medical history, and diagnostic data, including 3D-rendered CT and MRI scans. Clinical information is spatially organized and accessed via gaze and gesture, supporting intuitive exploration and collaboration. A two-phase user study assessed requirements, usability, and clinical relevance. Results show strong acceptance among medical students and selective but promising interest among physicians, particularly for imaging, surgical planning, and patient communication. Overall, the findings indicate that MR-based EHRs can reduce cognitive load, improve anatomical understanding, and enhance collaborative and patient-centered care.

11:45
A Consumer EEG–Driven Word Keyboard for Assistive Communication

ABSTRACT. Individuals with motor impairments often rely on augmentative and alternative communication (AAC) devices to communicate. These systems typically depend on eye-gaze tracking or touch input, which can become unreliable as a user's condition progresses or environmental conditions vary. We present a low-cost, brain-controlled word selection keyboard, developed entirely with consumer-grade components: a Muse 2 electroencephalography (EEG) headband, a smartphone running the Mind Monitor application, and a laptop. The system streams raw four-channel EEG data over Open Sound Control (OSC) and extracts 65 spectral features for every two-second window using an FFT-based signal processing pipeline. The input is classified into one of four classes (blink, look-left, look-right, and background) using a compact neural network. A Pygame-based visual keyboard inspired by TD-Snap, a commercial AAC interface, allows users to navigate a word grid through lateral eye movements. Words are selected with deliberate blinks, which trigger both text-to-speech output and HID keyboard emulation. We describe the end-to-end system architecture and methods developed for live deployment. We also discuss the practical challenges encountered during real-time operation, mainly the sampling-rate discrepancies between offline and online data, the synchronization of user actions with the two-second windowing periods, and the inherent signal limitations of four-channel frontal/temporal EEG. The complete system is open-source and demonstrates the feasibility of using consumer-grade BCI hardware as a method of communication for users with motor impairments.

11:55
Emotion-Aware Multimodal Virtual Rehabilitation: Integrating EEG and Motor Performance for Adaptive Therapy Regulation

ABSTRACT. Rehabilitation plays a fundamental role in promoting functional independence for individuals with motor impairments. Virtual rehabilitation systems have demonstrated benefits for engagement and motor performance; however, most existing systems adjust task difficulty solely on the basis of motor performance metrics, neglecting emotional states that influence motivation and adherence. This study proposes an affective computing-based virtual rehabilitation approach that integrates electroencephalography-derived emotional metrics into real-time difficulty adjustment. The system combines hand tracking with a Leap Motion Controller and affective monitoring via the EMOTIV Insight brain-computer interface. Stress and interest indicators are continuously analyzed to dynamically adjust task difficulty by modulating an in-game agent's speed. A preliminary user experience evaluation using the User Experience Questionnaire was conducted with 8 participants, divided into control and experimental groups. The experimental group exhibited higher scores in perspicuity (1.75 vs 0.56) and stimulation (1.37 vs 0.87), suggesting that the emotion-aware version improved perceived clarity and motivational engagement compared to the task-oriented version, while maintaining stable efficiency and dependability. These findings suggest that incorporating affective metrics as primary adaptation variables enhances engagement and supports a better challenge-skill balance in virtual rehabilitation environments.

12:30-14:00 Lunch Break
15:30-16:00 Coffee Break
16:00-17:30 Session 6A: Explainable and Generative AI for Clinical Decision Support
Location: Panorama
16:00
Explainable AI in Medicine: Trends and Educational Implications

ABSTRACT. The healthcare sector is undergoing rapid transformation through the extensive use of digital technologies, and specifically Artificial Intelligence (AI). AI has the potential to drastically affect and improve the way medicine is delivered to patients. Currently, several medical AI-based applications are under experimentation, and some of them are already being used in clinical practice. To maximize the advancements that AI is bringing, AI-based tools and applications need to be trusted by physicians and patients. In this context, techniques aiming to explain decisions made by AI tools have been developed, leading to so-called Explainable Artificial Intelligence (XAI). In this paper, we conduct a bibliometric analysis of papers related to AI, XAI, and medicine in order to describe the research landscape. Based on the bibliometric analysis results, we outline the need to educate physicians and healthcare professionals more broadly about these emerging technologies and propose insights into how this can be achieved.

16:10
Faithfulness and Uncertainty Calibration of Large Language Models in Portuguese Medical Question Answering

ABSTRACT. The deployment of Large Language Models (LLMs) in healthcare is constrained by limited transparency and susceptibility to factual errors. In clinical settings, predictive accuracy must be complemented by reliable uncertainty estimates and explanations that reflect the model’s internal decision process. This paper presents an evaluation framework for Portuguese medical LLMs using the DrBodeBench dataset. We evaluate multiple scales of the Qwen model family to examine uncertainty discrimination and attributional behavior. We compare Naive Entropy, Semantic Entropy, and self-verification via $P(\text{True})$ for hallucination detection. We further employ SHAP within a perturbation-based protocol to assess explanation faithfulness. Results indicate that instruction fine-tuning improves uncertainty discrimination as measured by ROC-AUC. Under our implementation, Semantic Entropy achieves the most consistent trade-off between discrimination and calibration across model scales. Perturbation analysis reveals systematic performance degradation under the removal of highly ranked tokens, suggesting partial alignment between attribution scores and decision-relevant features in Portuguese medical question answering.

16:20
XAIqi and XAIci: Quantifying Explainability Quality and Task Complexity Across Predictive Models in Stroke Outcome Prediction

ABSTRACT. Machine learning models are increasingly used in healthcare, yet similar predictive performance across models may conceal divergent explanations, introducing explanatory uncertainty in clinical decision-making. To address this challenge, we propose two novel metrics in the context of stroke care for predicting the National Institutes of Health Stroke Scale (NIHSS) at hospital discharge. The XAI Quality Index (XAIqi) quantifies the consistency and robustness of feature importance across heterogeneous models, identifying variables that remain relevant regardless of model architecture. The XAI Complexity Index (XAIci) characterizes task complexity based on the variability of explanatory patterns between models, reflecting how consistently a prediction task can be interpreted across algorithms. Using different machine learning algorithms, we demonstrate how integrating explainability across models reduces model-specific artifacts and strengthens confidence in clinically meaningful predictors. Together, XAIqi and XAIci provide a unified framework for assessing explainability quality and task complexity in AI-driven stroke outcome prediction.

16:30
Counterfactual Reasoning to Executable Clinical Guidelines: A DMN-Based Framework for Diabetes Risk Assessment

ABSTRACT. This paper presents a hybrid decision-support framework that strengthens clinical guideline formalization by combining Decision Model and Notation (DMN) with machine-learning–based evidence and counterfactual reasoning, with the goal of improving transparency and uncertainty-aware adoption in healthcare decisions. The approach is demonstrated in the context of diabetes risk assessment. The guideline logic is formalized as DMN decision tables and operationalized as executable, auditable rule conditions. These rule conditions are then linked to data-driven evidence derived from predictive modeling to quantify outcome risk and support actionable “what-if” assessments.

To provide actionable recourse aligned with guideline semantics, counterfactual sensitivity analyses are performed under feasible interventions on modifiable patient factors. Experiments use an NHANES-derived cohort restricted to the fasting subsample, and counterfactual scenarios are generated by decreasing body-mass index (BMI) and re-evaluating DMN rule outcomes to estimate corresponding changes in diabetes risk. Predicted risk decreases consistently as BMI is reduced, with the largest improvements concentrated among individuals near DMN decision thresholds, where small changes can alter rule firing and downstream risk. Overall, the framework complements DMN-based guideline formalization with empirically grounded evidence and counterfactual insights that remain interpretable and clinically actionable.

16:40
Prompt-Based Adaptation of Vision Language Models for Clinical Pain Note Generation from Neonate Cry Sound

ABSTRACT. Accurate neonatal pain assessment remains challenging in clinical care, where documentation must be both timely and interpretable. We present a prompt-based method that adapts BLIP-2 to generate clinical pain notes from neonatal cry sounds, guided by expert-defined pain features. Cry recordings are converted to log-mel spectrograms, providing a visual representation of pain-related acoustic structure. BLIP-2 processes these spectrograms using a pretrained visual encoder and query-based cross-modal fusion, enabling image-conditioned language generation without task-specific retraining. With few-shot prompting, exemplar spectrograms paired with clinically meaningful 'pain' and 'no pain' descriptions guide the model to produce structured, human-readable notes that include an assessment outcome and salient cues such as high-frequency emphasis, intensity concentration, and temporal irregularity. Experiments show the framework produces consistent pain assessments under limited supervision, supporting AI-assisted neonatal pain documentation and decision support. The novelty of this paper is that it extends vision-language prompting into a clinical documentation setting by using neonate-cry-derived spectrograms not only for pain classification, but also for generating a structured clinical pain note.

16:50
Collaborative Intelligence in Mental Health: A Multi-Agent Framework for Personalized Treatment and Health Promotion using Next-Gen LLMs

ABSTRACT. The progressive increase in diagnoses of mental health disorders demands proactive and holistic health promotion, as well as personalized symptom treatment. Personalized and holistic health care plans must be appropriate for the individual and integrate the biopsychosocial model. Previous work demonstrates promising capabilities with large language models. However, single-agent architectures generally lack the depth of reasoning required to generate comprehensive plans that respect ethics, privacy, and safety in healthcare. This paper proposes a large language model-based multi-agent system designed to generate personalized, holistic, and evidence-based health and care plans that encompass the mental health domain. An agent-based workflow was developed using the AutoGen framework; the architecture consists of four specialized agents, and a dataset of 40 simulated clinical cases was used for evaluation. The results demonstrate the proposed system's ability to generate comprehensive, holistic clinical and lifestyle plans arising from interactions among multidisciplinary agents, showing that this type of multi-agent architecture could become a useful tool to support healthcare professionals.

17:00
RHEUMA-PRIOR-SCRIBE: Multi-Agent RAG for Rheumatology Consultation Prioritization

ABSTRACT. Rheumatology anamnesis often yields long, redundant narratives that increase clinicians’ cognitive load and hinder timely prioritization of care. To address this issue, we present a guideline-grounded decision-support prototype that combines Retrieval-Augmented Generation (RAG) with an explicitly orchestrated Multi-Agent System (MAS) to transform Spanish free-text patient narratives into structured, clinician-ready outputs. The system indexes EULAR/ACR guideline content in a vector database and retrieves case-relevant passages at runtime to ground downstream reasoning. Specialized agents then extract core anamnesis parameters, assess completeness, identify alarm features, generate a concise clinical summary and an EHR-ready note, and assign a care-priority score (0–10) with justification, without producing diagnoses or treatment recommendations. We conducted a preliminary evaluation on 21 rheumatology consultation scenarios designed by a specialist, each accompanied by the expert reference priority value and the expected outputs. The proposed pipeline achieved an MAE of 0.62 and an RMSE of 1.1 for priority assignment relative to expert scores, while retrieval relevance averaged 3.0/5. These results support the feasibility of combining guideline-based retrieval with controlled multi-agent reasoning to produce auditable, structured anamnesis outputs and assist in consultation prioritization in rheumatology.

17:10
Linguistic Speech Disfluencies: A Gender-Neutral Biomarker for Speech-Based Anxiety Detection

ABSTRACT. This study investigates whether linguistic speech disfluencies, as cognitive markers of anxiety, serve as gender-neutral biomarkers that enable equitable anxiety screening while transcending the sexual dimorphism of acoustic features. We analyzed the DAIC-WOZ clinical interview corpus to train a Random Forest classifier for detecting high-anxiety states based on Patient Health Questionnaire-8 scores (binary threshold: PHQ-8 ≥ 10). We conducted a systematic gender fairness audit for anxiety detection that jointly examines gender disparities, linguistic-feature robustness, and their interaction under controlled bias evaluation protocols. Our findings demonstrate that thoughtful feature engineering, grounded in domain knowledge about which features should theoretically be demographic-invariant, can be as effective as complex algorithmic interventions while providing greater transparency and interpretability. We urge authors of high-accuracy models to retroactively audit fairness on published results and recommend that journals require fairness evaluation sections in all submissions. We provide empirical validation of this hypothesis and demonstrate a practical pathway toward deploying fair AI systems in clinical mental healthcare.

17:20
From Behavioral Tracking to Aging Biomarkers: An Explainable Machine Learning Framework on the African Turquoise Killifish

ABSTRACT. Identifying reliable predictive biomarkers of aging is critical for understanding functional decline across biological systems. The African Turquoise Killifish (ATK), a naturally short-lived vertebrate, offers a unique opportunity to study aging dynamics over a compressed lifespan. We present a spatio-temporal machine learning (ML) framework to extract and analyze behavioral signatures of aging from longitudinal locomotor recordings of the ATK. We analyze swimming trajectories from a cohort of 92 fish aged 5–32 weeks, integrating both lateral and dorsal recording perspectives. By segmenting time series at multiple temporal scales, we quantify how short- and long-term behavioral patterns relate to age and sex. Predictive models reveal that morphological stability appears to structure the global age separation, while locomotor dynamics capture progressive transitions across aging stages. Crucially, Shapley additive explanations highlight a set of behavioral patterns that are robust across temporal resolutions, providing candidate biomarkers for aging in short-lived vertebrates. Our analysis demonstrates that spatio-temporal behavioral dynamics capture meaningful biological variation, offering insight into the progression of functional decline.

16:00-17:30 Session 6B: Medical Image Segmentation and Reconstruction
Location: Atrium A
16:00
Attention U-Net with Algebraic Refinement for Sparse-View CT Reconstruction

ABSTRACT. Sparse-view Computed Tomography is a well-known dose reduction strategy. Algebraic methods can achieve reconstructions with highly undersampled data where analytical methods such as FBP often fail, though at a higher computational cost. However, a major challenge in this approach is the generation of streak artifacts that significantly worsen the quality of reconstructed images when a very low number of projections is used. In addition, the ill-conditioned nature of the problem causes slow convergence and requires thousands of iterations, which is why complementary strategies such as regularization or filtering are needed to improve the stability of the problem. This paper proposes a hybrid reconstruction framework that uses an Attention U-Net to obtain an initial solution for an iterative algebraic reconstruction process. Specifically, the network is fed with low-quality reconstructions obtained from a few iterations of an algebraic method and generates improved images that are used as the initial solutions of the iterative method, improving both quality and numerical convergence. The data used for this study were selected from the chest studies of the DICOM-CT-PD dataset. Results show a substantial improvement in reconstruction quality with extreme undersampling (33 projections at 256x256-pixel resolution, where the minimum required by the Nyquist theorem is 400). Specifically, with the hybrid framework, SSIM increases from ≈ 0.85 (obtained with 200 iterations of the iterative method alone) to ≈ 0.96. Visually, the network suppresses the streak artifacts that previously dominated the image while preserving structural detail, and the algebraic refinement improves numerical consistency. Although soft tissue areas with lower contrast still show structural inaccuracies, this framework provides a solid basis for future advances.

16:15
Level-set-guided CNN-based Segmentation of CTPA Scans for Pulmonary Emboli Extraction

ABSTRACT. Pulmonary embolism (PE) is a severe cardiovascular condition that requires prompt and accurate diagnosis to prevent life-threatening complications. Computed tomography pulmonary angiography (CTPA) is the primary imaging modality for PE diagnosis. However, the manual analysis of CTPA scans remains challenging due to high anatomical variability, low contrast, and limited annotated data. Recent advances in deep learning (DL) have shown promise in automating CTPA image segmentation for PE boundary extraction. Challenges remain, however, associated with the small size of PE and the resulting class imbalance within CTPA scans, as well as with generalization capability, which is affected by the limited available annotations. In this work, we introduce a DL-based segmentation method that integrates a novel loss function variant, the adaptive level-set (ALS) loss, which encompasses spatial and boundary information. The ALS loss helps the model cope with the small size of PE and the associated class imbalance within CTPA images, while also acting as an inherent noise-filtering mechanism that strengthens generalization capability. The experimental evaluation, conducted on public PE datasets, demonstrates that the proposed method achieves enhanced segmentation performance and generalization capability compared to state-of-the-art methods.

16:30
Weakly supervised pleural plaque segmentation using global patient-level diagnostic cues

ABSTRACT. AI-driven approaches have been proposed for pleural plaque (PP) segmentation from computed tomography (CT) scans, aiming to produce voxel-wise binary masks. While these models show strong potential for reproducible PP segmentation, they often struggle to capture small, thin, or morphologically variable plaques. Furthermore, models trained predominantly on PP-positive cohorts, without inclusion of healthy controls, tend to exhibit local bias and limited generalization capability. This study investigates whether incorporating simple, globally accessible patient-level information (specifically, the radiologist-assessed presence or absence of PP) can enhance segmentation performance. We propose a framework that augments a pre-trained segmentation model with a lightweight deep correction module that leverages global diagnostic information to refine local PP segmentation outputs. The results demonstrated the framework’s ability to improve the reliability of traditional segmentation tools for the automated assessment of PP disease. This was achieved by leveraging globally accessible patient-level information, rather than relying on labor-intensive local delineations at the individual plaque level.

16:45
Automated Trachea Segmentation from CT Imaging Using AI Models

ABSTRACT. Accurate trachea segmentation from computed tomography (CT) is a prerequisite for image-guided airway assessment, precision tracheostomy planning, and safe endotracheal tube placement. The trachea presents distinct segmentation challenges due to its elongated morphology, small cross-sectional area, sensitivity to partial-volume effects, motion artifacts, and heterogeneous surrounding mediastinal structures. This study systematically compares two complementary paradigms: a fully automatic self-configuring framework (nnU-Net) and a prompt-conditioned foundation model (MedSAM) derived from the Segment Anything Model (SAM). Evaluation is performed under heterogeneous dataset regimes, including volumetric CT data with consistent inter-slice continuity and slice-based CT data lacking reliable volumetric structure. A hybrid inference strategy enabling automatic prompt generation is introduced. Quantitative and qualitative analyses demonstrate that dataset structure critically influences segmentation reliability, boundary stability, and deployment feasibility in precision airway workflows.

17:00
SpineContextResUNet: A Computationally Efficient Residual UNet for Spine CT Segmentation

ABSTRACT. Automated segmentation of the vertebral column in Computed Tomography (CT) scans is a prerequisite for pathological assessment and surgical planning. However, state-of-the-art methods, particularly those based on Transformers or large-scale ensembles, demand substantial GPU resources, creating a barrier for clinical adoption in resource-constrained environments or on edge devices. To address this, we introduce SpineContextResUNet, a computationally efficient 3D Residual U-Net designed for rapid spinal localization. Our architecture integrates a lightweight Context Block that employs parallel multi-dilated convolutions to capture long-range anatomical dependencies without the high latency of Recurrent Neural Networks (RNNs) or the memory overhead of Self-Attention mechanisms. Extensive validation on two public benchmarks, VerSe2020 and CTSpine1K, demonstrates that our model achieves Dice scores of 88.17% and 88.13%, respectively. To evaluate performance under strict hardware constraints, we compared our model against a bottlenecked SwinUNETR scaled to match our 1.7M-parameter footprint. While the constrained Transformer suffers severe performance degradation due to a lack of spatial inductive biases in a limited-data regime, our CNN-based approach maintains high accuracy. Crucially, while heavy baselines like TotalSegmentator fail due to memory exhaustion on commodity hardware (Intel Core i5, 8GB RAM), our model performs robust inference, making it a viable solution for point-of-care diagnostics and deployment on edge platforms like the Nvidia Jetson Orin Nano.

17:15
GAN-based Adaptive Radial Subsampling and Reconstruction for Brain MRI

ABSTRACT. Magnetic Resonance Imaging (MRI) is one of the main medical imaging modalities. Radial k-space sampling, due to its inherently dense coverage of the central k-space region, is particularly effective in mitigating motion-related artifacts through averaging effects and is consequently frequently used in MRI. However, fully sampled acquisitions can take a long time, thus necessitating undersampling. Undersampling is commonly implemented randomly and blindly, in accordance with the requirements of Compressed Sensing (CS), often resulting in blurred images and a low compression ratio. The amount of data required by CS can be reduced, and the quality of image reconstruction improved, if a-priori information about the underlying image is collected during the sequential acquisition process. In this work, we introduce Generative Adversarial Network (GAN)-based Adaptive Radial Subsampling and Reconstruction (GASSR) for brain MRI, an iterative adaptive acquisition/reconstruction technique for radial sparse sampling, capable of collecting the minimal and most informative set of radial trajectories by alternating iterative sampling, reconstruction, information evaluation, and the acquisition of new radial directions, based on the information content of the reconstructed image. Preliminary results indicate that GASSR effectively reduces data redundancy and achieves rapid convergence to high-quality images, outperforming similar SOTA models.

16:00-17:30 Session 6C: Emerging Computational Intelligence Methods in Healthcare
Location: Atrium B
16:00
Comparative Performance Evaluation of Contemporary Video Coding Standards, including DCVC-RT and ECM, on 2D and 360° Medical Video Datasets

ABSTRACT. Medical video streaming is increasingly important to digital healthcare systems and services, including remote diagnosis, immersive training, and virtual reality (VR) simulations. In these applications, both compression efficiency and delivery latency must be carefully evaluated, as they may affect diagnostic confidence and the fidelity of medical education. This paper presents a comparative performance evaluation, based on objective video quality assessment metrics of encoding efficiency and speed, of representative conventional, enhancement-based, and neural video codecs across two heterogeneous medical video datasets: a low-resolution 2D ultrasound dataset (560x448, 40 fps) and a high-resolution 360° emergency simulation dataset (8K-class, 7860x3840, 30 fps). The study includes seven video encoders, namely DCVC-RT, ECM, VVC, SVT-AV1, x265, hevc_nvenc, and lcevc_hevc. Objective video quality is assessed using PSNR, SSIM, and VMAF, while encoding efficiency is assessed using BD-Rate analysis. The findings show that ECM, tested on the low-resolution ultrasound videos, outperforms all other encoders, supporting its potential as a foundation for next-generation H.267 standardization. In addition, the neural video codec DCVC-RT demonstrates particularly strong encoding performance on the high-resolution 360° dataset, where ECM and VVC were not evaluated. Leveraging GPU acceleration, DCVC-RT achieves both high compression efficiency and fast encoding, making it a suitable solution for real-time encoding. Furthermore, the GPU-accelerated hevc_nvenc implementation significantly improves time efficiency compared with conventional encoders, highlighting practical trade-offs between compression gains and encoding time. Overall, the results confirm the practical relevance and suitability of neural video compression and next-generation coding technologies for real-time medical video streaming applications.

16:15
Explainable temporal analysis for high-risk carotid plaques using ArgEML – a pilot study

ABSTRACT. Advancing recent research in Explainable AI for carotid plaque risk assessment, this pilot study applies the Argumentation-based Explainable Machine Learning (ArgEML) framework to investigate the temporal prediction of ischemic events in symptomatic patients. While prior research identified high-risk asymptomatic plaques, this work shifts focus to the clinical necessity of distinguishing between ‘early stroke’ and ‘late stroke’ outcomes to optimize surgical intervention windows. By integrating sub-symbolic machine learning with symbolic logical argumentation, the study extracts decision rules from ultrasonic plaque features and generates a transparent argumentation theory. By utilizing the Explanation Space, the model identifies clinical dilemmas where evidence for stroke timing is ambiguous, providing transparent, human-like justifications. This approach aims to help clinicians identify patient profiles requiring further diagnostic analysis, ultimately fostering the trust and accountability necessary for AI adoption in high-stakes healthcare. The ArgEML learned theory demonstrated predictive reliability comparable with a statistical machine learning model. Future work will focus on validating this approach with clinicians using larger cohorts.

16:30
Controlled Factorial Analysis of Architecture–Loss Interactions in Cardiac MRI Segmentation

ABSTRACT. Cardiac MRI segmentation is essential for quantifying cardiac structure and function. The ACDC dataset provides a standardized benchmark for this task. While CNNs like U-Net are widely used, transformer-based models such as TransUNet have emerged as alternatives. However, controlled comparisons under identical conditions remain lacking, making it unclear whether performance gains arise from architectural complexity or loss function design. This study conducts a controlled 2×6 factorial experiment comparing U-Net and TransUNet across six loss configurations on the ACDC benchmark, with five random seeds per condition and identical preprocessing, augmentation, and optimization protocols. Region-aware losses performed significantly better than CE in both architectures, with mean Dice improvements of 0.5-1.5 percentage points. Under CE loss, U-Net achieves slightly higher Dice scores than TransUNet; however, this gap narrows with advanced losses and becomes non-significant after correction (e.g., Region loss raw p = 0.071, adjusted p = 0.428). Two-way ANOVA reveals that loss function explains a larger proportion of variance than architecture on both validation (η2 = 0.908 vs. 0.512) and test (η2 = 0.427 vs. 0.277) sets, with a significant interaction (p = 0.019) indicating that the optimal loss depends on architecture choice. Notably, a three-fold increase in model parameters (U-Net: 32.5M vs. TransUNet: 105.3M) yields gains comparable to those achievable through loss function design, suggesting that architectural complexity alone does not guarantee improved performance on small-to-moderate datasets. However, Hausdorff-based losses increase training time by approximately 3-10 times without proportional performance gains, highlighting an efficiency-accuracy trade-off. These findings suggest that loss function design warrants attention comparable to architectural innovation for cardiac MRI segmentation on small-to-moderate datasets.

16:45
C3MG: Clinically-Controlled Cardiac Mesh Generator based on Rectified Flow Matching

ABSTRACT. One of the biggest challenges in training robust deep learning models in medical imaging is the acquisition of high-quality 3D cardiac segmentation labels. Full volumetric annotations are costly, time-consuming, and often limited across cardiac pathologies, hindering the development of generalizable solutions. To address this issue, we propose a novel text-conditioned framework for generating realistic 3D multi-label cardiac segmentation masks directly from clinical descriptions. Our approach adapts CogVideoX, a state-of-the-art text-to-video diffusion model with an expert transformer, to operate in a volumetric segmentation domain, enabling the synthesis of anatomically coherent 3D masks from natural language prompts such as “dilated left ventricle” or “right ventricular hypertrophy.” By treating 3D segmentation volumes as pseudo video sequences, the model learns to translate semantic indicators of pathology into plausible geometric representations. This work opens new perspectives in the generation of text-based, anatomy-aware cardiac data and establishes the feasibility of bridging clinical language and 3D morphology.
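The "pseudo video" framing can be illustrated as a simple tensor transformation: each slice of a labeled volume becomes a frame, and each anatomical label becomes a channel. The slicing axis and shapes below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mask_to_pseudo_video(volume, num_labels):
    """Convert an integer-labeled 3D mask (D, H, W) into a one-hot
    frame sequence (D, num_labels, H, W), so a video model can consume
    the depth axis as if it were time.
    """
    d, h, w = volume.shape
    frames = np.zeros((d, num_labels, h, w), dtype=np.float32)
    for label in range(num_labels):
        frames[:, label] = (volume == label)  # one channel per structure
    return frames
```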

17:00
A biologically-inspired computer vision pipeline for neonatal oral motor assessment

ABSTRACT. The automatic recognition of neonatal oral motor patterns may provide useful information about newborns' health and neurodevelopmental status. However, the distinctive morphology of newborn faces, together with their high movement variability, poses significant challenges for computer vision analysis. In this study, we propose a biologically inspired computer vision pipeline for the automatic classification of neonatal mouth poses from video recordings. The proposed methodology includes structured preprocessing, representation learning, and supervised classification. First, the video frames were processed to localise and normalise the mouth region by employing (i) face and landmark detection, (ii) rotation alignment, and (iii) photometric standardisation. Then, a Gabor+Sobel filter-based convolutional autoencoder was trained to extract compact latent representations. Each frame was encoded into a 512-dimensional feature vector, which was then used to train a multilayer perceptron classifier to discriminate between mouth closed, mouth open, and tongue protrusion. The autoencoder achieved a mean Structural Similarity Index Measure (SSIM) of 0.97 on unseen data, indicating effective preservation of key structural facial information. The performance of the multilayer perceptron model was evaluated using a leave-one-patient-out cross-validation approach on a dataset comprising 9912 frames from 8 newborns, including 6 preterm and 2 full-term infants. The global balanced accuracy was 0.83, with a comparable macro F1 score. These findings suggest that biologically inspired feature-extraction methods can capture pertinent morphological characteristics of neonatal faces and thereby provide informative representations for automated behavioural analysis.
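Leave-one-patient-out cross-validation can be sketched without any ML library: frames are grouped by patient ID and each patient in turn forms the held-out set. Patient IDs and frame counts below are illustrative:

```python
import numpy as np

def leave_one_patient_out(patient_ids):
    """Yield (train_idx, test_idx) pairs, holding out one patient per fold.

    patient_ids: array-like of length n_frames giving the patient each
    frame belongs to.
    """
    patient_ids = np.asarray(patient_ids)
    for patient in np.unique(patient_ids):
        test_mask = patient_ids == patient
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Every frame of a held-out newborn is thus unseen during training, which avoids identity leakage between splits, the key reason this scheme is preferred over random frame-level splits.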

16:00-17:30 Session 6D: Intelligent Sensing
Location: Atrium C
16:00
XRaudinary: Spatially Anchored Live Captions in XR via Low-Cost Microphone Arrays and Vision Fusion

ABSTRACT. People with hearing loss often face high cognitive load and reduced participation in group conversations, especially in noisy or reverberant environments. We present XRaudinary, an in-progress XR system that spatially anchors live captions to the direction of sound within the user's field of view by combining a wearable, low-cost microphone array with the user's real-time vision. An ESP32 microcontroller unit (MCU) ingests synchronized I2S MEMS microphone streams and forwards them to a server that estimates time differences of arrival (TDOA) for direction-of-arrival (DOA) inference, then forwards post-processed directionality and captions to a VR/AR headset. Constraining the sound source to the planar axis of the user's vision resolves geometric ambiguities inherent to small arrays, enabling real-time captions that appear at the correct location in the user's field of view. We describe the system architecture, the sound source localization approach, and its appropriateness for real-time conversational contexts.
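The TDOA-to-DOA step for one microphone pair can be sketched with a plain cross-correlation; a production system would likely use GCC-PHAT and a calibrated array geometry, so the microphone spacing and sampling rate below are placeholders:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def estimate_tdoa(sig_a, sig_b, fs):
    """Time delay of sig_a relative to sig_b via the cross-correlation
    peak (positive result means sig_a lags sig_b)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)
    return lag / fs

def doa_from_tdoa(tdoa, mic_spacing):
    """Far-field direction of arrival (degrees) for a two-mic pair;
    clipping guards against delays longer than the array allows."""
    s = np.clip(SPEED_OF_SOUND * tdoa / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(s))
```

The arcsin model is ambiguous front-to-back for a single pair, which is exactly the kind of geometric ambiguity the abstract's vision-plane constraint is meant to resolve.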

16:10
Geometry-Aware Event-Based TinyML for Interpretable On-Device Rehabilitation Metrics

ABSTRACT. Wearable rehabilitation systems increasingly rely on inertial sensors and machine learning (ML) to monitor posture and movement quality. However, most existing solutions execute inference continuously or apply independent scalar thresholds, limiting battery life and failing to capture the coupled dynamics of clinically valid motion patterns. We propose a geometry-aware, event-driven TinyML framework in which biomechanical deviations are modeled as compact regions in joint-angle and angular-velocity space. Instead of continuous inference, computation is triggered only when the kinematic state exits this admissible region. A prototype implementation on an STM32L4 microcontroller using TensorFlow Lite Micro demonstrates a 72.7% reduction in inference calls and a 37.3% reduction in average current consumption. The system preserves interpretable rehabilitation metrics while operating fully on-device with minimal computational overhead, supporting long-term autonomous wearable operation.
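The event-driven gating idea can be sketched as an admissible-region check in (joint-angle, angular-velocity) space; the box bounds below are illustrative placeholders, not clinically derived values, and a real deployment could use any compact region:

```python
def should_infer(angle_deg, velocity_dps,
                 angle_bounds=(-10.0, 45.0), velocity_bounds=(-60.0, 60.0)):
    """Trigger the TinyML model only when the kinematic state leaves
    the admissible region; otherwise the MCU stays in a low-power path."""
    lo_a, hi_a = angle_bounds
    lo_v, hi_v = velocity_bounds
    return not (lo_a <= angle_deg <= hi_a and lo_v <= velocity_dps <= hi_v)

def count_inference_calls(samples):
    """samples: iterable of (angle_deg, angular_velocity_dps) pairs."""
    return sum(should_infer(a, v) for a, v in samples)
```

Because the trigger couples angle and velocity in one region test, a fast but small motion and a slow but large excursion are both caught, unlike independent scalar thresholds.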

16:20
Motion Compensation for Ultrasound Scanning in Robotically Assisted Prostate Biopsy Procedures

ABSTRACT. Prostate cancer is one of the most common types of cancer in men. Diagnosis via biopsy requires a high level of surgical expertise and precision, making the results highly operator-dependent. The aim of this work is to develop a robotic system for assisted ultrasound (US) examination of the prostate, a pre-biopsy step that reduces dexterity requirements and enables faster, more accurate, and more accessible prostate biopsy. We developed and validated a laboratory setup with two robots: one that autonomously scans a prostate phantom and another that carries the phantom to simulate patient movement. The scanning robot maintains the relative position of the US probe and the prostate phantom, ensuring a consistent and robust approach to reconstructing the scanned prostate. To reconstruct the prostate, each slice is segmented to generate a series of prostate contours, which are converted into a 3D point cloud used for biopsy planning. The average scan time of the prostate was 30 s, and the average 3D reconstruction time was 3 s. We evaluated three motion scenarios and registered the resulting reconstructions against the stationary case. ICP registration with a threshold of 1.2 mm yielded a mean fitness of over 90% for each motion type. Due to the elastic and soft material properties of the prostate phantom, the maximum robot tracking error was 3 mm, which is considered sufficient for prostate biopsy according to medical literature.
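The reported fitness metric can be read as the inlier fraction used by standard ICP implementations (e.g. Open3D's): the share of source points whose nearest target point lies within the correspondence threshold. A brute-force NumPy sketch, illustrative only:

```python
import numpy as np

def icp_fitness(source, target, threshold=1.2):
    """Fraction of source points (N, 3) whose nearest neighbour in
    target (M, 3) is closer than `threshold` (same units as the
    clouds, mm here)."""
    # Full pairwise distance matrix; fine for small point clouds.
    diffs = source[:, None, :] - target[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    return float((nearest < threshold).mean())
```

A fitness above 90% at a 1.2 mm threshold therefore means over 90% of reconstructed points land within 1.2 mm of the stationary reference after alignment.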

16:30
Contactless Blood Pressure Estimation via Smart Mirror using Pulse Transit Time

ABSTRACT. As population aging accelerates, modern technologies for health monitoring and ambient assisted living (AAL) are becoming increasingly integrated into everyday life. However, many individuals, particularly older adults, experience difficulties and discomfort in using such technologies. For this reason, contactless techniques for measuring and monitoring vital signs that prioritize user-friendliness and unobtrusiveness are being actively developed. In this context, a smart biomedical mirror for the remote monitoring of vital signs was developed as a potential solution. Building on this work, we implemented a remote blood pressure estimation algorithm using the system’s existing remote photoplethysmography (rPPG) pipeline. The preliminary evaluation demonstrated the feasibility of integrating remote blood pressure estimation within the smart mirror platform. The obtained results provide an initial proof-of-concept and highlight important considerations for improving camera-based blood pressure estimation in future work.
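A common entry point for pulse-transit-time (PTT) blood pressure estimation is a per-subject calibration mapping PTT to systolic pressure. The inverse-linear model below is one simple choice from the literature, not necessarily the model used in this work, and all numbers are illustrative:

```python
import numpy as np

def fit_ptt_to_sbp(ptt_s, sbp_mmhg):
    """Fit SBP ~ a / PTT + b by least squares on calibration pairs;
    returns the coefficients (a, b)."""
    a, b = np.polyfit(1.0 / np.asarray(ptt_s), np.asarray(sbp_mmhg), deg=1)
    return a, b

def predict_sbp(ptt_s, a, b):
    """Estimate systolic pressure (mmHg) from a new PTT sample (s)."""
    return a / ptt_s + b
```

Such models drift with posture and vascular tone, which is one of the "important considerations" a camera-based pipeline must handle through periodic recalibration.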

16:40
BlinkIQ: A Real-Time Landmark-Based Framework for Robust Eye Blink Analytics and Digital Biomarker Extraction

ABSTRACT. Eye blink rate is a promising non-invasive biomarker linked to attention, fatigue, and neurocognitive functioning. This paper presents BlinkIQ, a real-time framework for eye blink detection and digital biomarker extraction from RGB video using MediaPipe facial landmarks. The system estimates eye openness through a normalized geometric ratio and applies temporal smoothing and rule-based validation to detect blink events robustly while reducing false positives. In addition to blink detection, BlinkIQ extracts clinically relevant features such as blink count, blink rate, blink duration, and inter-blink interval variability. Its modular and low-cost design enables deployment with standard cameras in both real-time and offline settings. BlinkIQ offers an efficient and interpretable approach for continuous blink monitoring, with potential applications in fatigue assessment, cognitive monitoring, and digital health.
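The "normalized geometric ratio" is in the spirit of the classic eye aspect ratio (EAR) of Soukupová and Čech; the landmark ordering, threshold, and minimum-duration rule below are illustrative placeholders, not BlinkIQ's actual configuration:

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks ordered corner, two upper, corner, two lower.
    EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|); ~0.3 when open, near 0 closed."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    return vertical / (2.0 * np.linalg.norm(p1 - p4))

def count_blinks(ear_series, threshold=0.21, min_frames=2):
    """A blink = EAR below threshold for at least min_frames consecutive
    frames; the duration rule rejects single-frame landmark flicker."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```

Because EAR is a ratio of landmark distances, it is invariant to face scale, which is what makes the measure usable across camera distances without recalibration.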

16:50
A Highly Customizable Platform Enabling Sophisticated Medical Eye-Tracking Studies

ABSTRACT. Eye-tracking has demonstrated significant potential for analyzing visual attention in medical research. Nevertheless, technical challenges persist in establishing robust and customizable study designs ensuring reproducible data collection in complex scenarios. In prior work, we proposed a vendor-agnostic eye-tracking platform for conducting studies in digital (neuro)pathology. However, the initial platform was validated in an expert-supervised setting and lacked systematic evaluation of interaction design, user onboarding, and data quality mechanisms. Additionally, it exhibited a high degree of specificity, being closely associated with a particular medical domain. To address these limitations, a comprehensive set of cross-domain requirements was elicited and incorporated into a requirement-driven redesign, resulting in a refined, modular eye-tracking platform that was subsequently implemented and empirically validated. The redesigned platform contributes configurable multi-class task handling, flexible study flow components, and integrated ground truth–based feedback mechanisms. The platform was evaluated with 14 dermatology physicians performing multi-class wound classification under eye-tracking conditions. Quantitative fixation analyses with respect to labeled regions of interest were combined with standardized usability assessment using the System Usability Scale (SUS). Findings indicate the system's dependable performance in a clinical setting and its exceptional usability, as evidenced by an average SUS score of 83.75. Overall, the refined platform enables standardized, reproducible, and domain-agnostic eye-tracking studies while significantly lowering technical barriers for study execution in clinical research environments.
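For reference, the SUS score is computed from ten 1-5 Likert responses with the standard scheme: odd (positively worded) items contribute response − 1, even items contribute 5 − response, and the sum is scaled by 2.5 onto a 0-100 range:

```python
def sus_score(responses):
    """responses: list of ten 1-5 ratings in questionnaire order."""
    assert len(responses) == 10, "SUS has exactly ten items"
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... sit at index 0,2,4,...
        for i, r in enumerate(responses)
    )
    return total * 2.5
```

A score of 83.75 sits well above the commonly cited average of 68, consistent with the abstract's usability claim.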