IVCNZ 2025: IMAGE AND VISION COMPUTING NEW ZEALAND
PROGRAM FOR FRIDAY, NOVEMBER 21ST

09:00-10:00 Session 7: Keynote 2
Location: RHLT 1
09:00
Solving occlusion for complete plant models with accurate 3D metrics

ABSTRACT. We cannot automate what we cannot see – and COVID saw fruit and vegetables rotting on the ground because we were unable to automate harvesting in such challenging, complex outdoor environments.

Our recent breakthroughs with NeRF and Gaussian splatting focus on solving such leaf occlusion to enable sub-mm 3D metrics of all fruit (and other plant organs) and complete, correct branching structure – all from images of windblown plants. This also enables rapid reduction of the petabytes of data produced by scanning a farm (such as an orchard or vineyard) from hundreds of images per plant. We will also discuss the challenges that remain and propose potential directions for future work.

10:00-10:30 Coffee Break
10:30-12:30 Session 8: Paper Session: Marine & Environmental AI
Location: RHLT 1
10:30
Deep Learning Based Crab Classification For Marine Pest Monitoring

ABSTRACT. New Zealand’s coastal ecosystems are increasingly threatened by invasive marine species, which endanger biodiversity and disrupt local fisheries. The Asian Paddle Crab (Charybdis japonica), in particular, is an aggressive invader requiring accurate and early identification to support biosecurity. However, such species classification remains challenging due to high visual similarity and the lack of dedicated datasets. To address this gap, we present the New Zealand Pest Crab Classification Dataset (NZPCCD), a fine-grained benchmark designed for ecological crab pest monitoring. The dataset comprises high-quality, annotated images of visually similar crab species, enabling robust training and evaluation of automated classification models. We further developed a classification-adapted YOLO model based on our dataset for crab pest classification and achieved superior classification performance compared with other deep learning models, offering a compelling balance of accuracy and efficiency.

10:45
Semi-Supervised Seafloor Habitat Classification: A Pseudo-Labeling Framework

ABSTRACT. Seafloor habitat classification plays an important role in marine ecology research, but manual labeling of underwater images is costly and time-consuming. This study explores the feasibility of applying semi-supervised learning using the pseudo-labeling strategy to improve classification performance under limited labeled seafloor data. The supervised stage begins with training on a small labeled dataset. The pseudo-labeling method is then applied using class-specific confidence thresholds. A class-balanced top-k selection strategy is employed to iteratively expand the training set with high-confidence unlabeled samples. Our findings show that the proposed semi-supervised framework effectively leverages unlabeled data to improve classification performance, even when applied to a combined dataset of imagery from two geographically distinct regions.
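
The class-balanced top-k selection step described above can be sketched as follows. The function name, data layout, and threshold values are illustrative assumptions, not the authors' implementation:

```python
from collections import defaultdict

def select_pseudo_labels(predictions, thresholds, k):
    """Class-balanced top-k pseudo-label selection.

    predictions: list of (sample_id, predicted_class, confidence)
    thresholds:  dict mapping class -> class-specific minimum confidence
    k:           maximum samples admitted per class per round
    """
    by_class = defaultdict(list)
    for sample_id, cls, conf in predictions:
        if conf >= thresholds[cls]:          # class-specific confidence gate
            by_class[cls].append((conf, sample_id))
    selected = []
    for cls, items in by_class.items():
        items.sort(reverse=True)             # highest confidence first
        selected.extend((sid, cls) for conf, sid in items[:k])
    return selected
```

Selected samples would then be merged into the labeled pool before the next training iteration.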

11:00
MFU-Net: A Novel Deep Learning Framework for Unmixing Method for Sentinel-2 Imagery of Invasive Serrated Tussock (Nassella trichotoma)

ABSTRACT. Grassland reserves are vital for biodiversity and climate mitigation. Remote sensing imagery is increasingly being used to monitor vegetation health and invasive species in these vast areas. However, the spectral sparsity and mixed pixels of Sentinel-2 limit its capacity for accurate unmixing. Existing linear and deep learning methods, designed for hyperspectral imagery, cannot fully utilise the limited spectral bands and spatial information of Sentinel-2, leading to poor unmixing results. In this paper, we propose the Multi-Feature Unmixing Network (MFU-Net) - a deep learning architecture designed for Sentinel-2 unmixing. MFU-Net employs a dual-branch network that fuses spatial and spectral features through a CBAM attention mechanism and an iterative Attentional Feature Fusion (iAFF) block for adaptive integration. Our results show that MFU-Net outperforms the linear model, reducing RMSE by about 30.3% while improving R² from 0.0046 to 0.5234. MFU-Net also outperforms a state-of-the-art deep learning method, the Multi-Feature Extraction, reducing RMSE by 2.56% and improving R² by 5.16%. An ablation study further confirms that the spectral CBAM and the iAFF module are key contributors to the improved performance.

11:15
Tool-Assisted Annotation of Seafloor Sediment-linked Features Using Weakly Supervised Semantic Segmentation

ABSTRACT. Pixel-wise labeling of seafloor imagery is highly time-consuming, limiting the scalability of benthic habitat monitoring. While existing underwater computer vision research has largely focused on visually prominent habitats, sediment-linked benthic features remain underexplored and lack annotated datasets. To address this gap, we propose a tool-assisted annotation framework based on weakly supervised semantic segmentation. The framework follows a three-phase pipeline: feature-based pseudo-mask generation, binary class-specific segmentation, and iterative multiclass segmentation with pseudo-mask expansion and affinity-field regularization. Applied to six ecologically important sediment features, the approach progressively improves pseudo-label quality while reducing reliance on dense expert annotations, providing a scalable solution that can accelerate the annotation of seafloor sediment features.

11:30
Deforestation Monitoring for Mongolia’s Forest-Steppe Ecoregion Through Satellite Images

ABSTRACT. Satellite imagery has emerged as a powerful tool for large-scale land monitoring, but platform incompatibilities, computational limitations, and massive data volumes hinder its full potential. This study presents a novel cloud-based machine learning framework for forest cover monitoring in Mongolia’s ecologically unique forest-steppe region. Unlike traditional approaches, our method integrates a hybrid unsupervised-supervised learning pipeline within the Copernicus Data Space Ecosystem, combining K-means clustering with optimized Random Forest classification for enhanced feature representation. We process Sentinel-2 imagery through an advanced preprocessing workflow that employs pixel-level filtering and quartile-based noise reduction, specifically designed to address Mongolia’s challenging monitoring conditions characterized by snow cover and seasonal variability. Our framework achieves 99% classification accuracy and 92% validation against Hansen Global Forest Change data, surpassing the Copernicus baseline by 15%. The approach demonstrates significant improvements in computational efficiency, processing times, and adaptability to fragmented forest landscapes compared to existing methods.

11:45
End-to-End Automated Screening of Lordosis-Kyphosis-Scoliosis and Vertebral Compression in Salmon X-Ray Images

ABSTRACT. Automated health assessment is important for efficiently managing spinal deformities, such as lordosis-kyphosis-scoliosis (LKS) and vertebral compression, in farmed salmon. In this study, we present a multi-stage deep learning pipeline for vertebral detection and severity classification in salmon X-ray images. Following initial image preprocessing and segmentation via a U-Net model, vertebrae are detected and localized using an enhanced YOLOv12 object detection framework. YOLOv12 offers higher accuracy, more stable convergence, and faster inference than the prior YOLOv10 version. LKS severity is then assessed through a rule-based strategy using Cobb angle measurements derived from the detected vertebrae, enabling objective anatomical characterization across different regions of the vertebral column. For compressions, statistical and morphological features are extracted and classified using a suite of machine learning models, with performance optimized by feature selection and class balancing techniques. The proposed system achieves robust detection and disease classification across both normal and pathological regions, showing reliable performance even in anatomically challenging cases. Designed for integration into industrial aquaculture workflows, the pipeline enables high-throughput, automated screening of large fish populations to support timely, data-driven health management at scale. This integrated approach enhances the scalability and accuracy of disease monitoring in aquaculture, and thus fish welfare and farm productivity.
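
A minimal sketch of a rule-based Cobb-angle check of the kind described, assuming vertebral centroids are already available from detection. The severity thresholds below are hypothetical placeholders for illustration, not the paper's calibrated bins:

```python
import math

def cobb_angle(centroids):
    """Approximate Cobb angle (degrees) from vertebral centroids ordered
    along the spine: the maximum difference in inclination between any
    two inter-vertebral segments."""
    angles = []
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        angles.append(math.degrees(math.atan2(y1 - y0, x1 - x0)))
    return max(angles) - min(angles)

def lks_severity(angle_deg):
    # Hypothetical severity bins, for illustration only.
    if angle_deg < 10:
        return "normal"
    if angle_deg < 25:
        return "mild"
    if angle_deg < 40:
        return "moderate"
    return "severe"
```

A straight run of centroids yields a zero angle ("normal"), while increasing curvature moves the fish through the severity bins.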

12:00
Estimation of Object Volume in Aqueous Food Media using Surface Electric Potential and Neural Network Regression

ABSTRACT. Filter traps are essential for removing unwanted foreign objects from food media during production, ensuring food safety for consumers. In dairy processing plants, these filters are cleared periodically to ensure normal production operations, without a feedback mechanism that informs the condition of the filters. This paper introduces a new approach to estimate the volume of foreign objects in aqueous food media using non-invasive surface electric potential. The real-time volume estimation allows the approximation of the accumulated object volume passing through a section of a milk processing pipe and informs the need to initiate the filter clearing process. The sensing principle is based on the conductivity difference between foreign objects and food media, which exhibits distinctive signatures measurable by the sensing system. Experimental results demonstrate that a lightweight neural network-based regression model can learn the complex mapping between low-resolution surface potential data and target object volume, despite the presence of signal ambiguity and noise interference. Model evaluation on simulated unseen sphere objects shows a 4% mean absolute percentage error, and less than 6% error when tested on real stainless-steel spheres in a laboratory setting.

12:15
On the impact of natural guide-star asterism geometry on atmospheric tomography

ABSTRACT. When imaging a target object in space, atmospheric distortion degrades the angular resolution of an imaging system. Adaptive optics is a method of reducing this degradation by estimating the aberration from the light of a nearby bright guide-star and correcting for the wavefront distortion. Atmospheric tomography can improve the images of astronomical objects that are outside the effective angular distance to a sufficiently bright guide-star. It reconstructs the target wavefront distortion using the wavefront estimates of several guide-stars. The spatial distribution of the guide-stars has a significant impact on the reconstruction performance. Currently, when studying tomography methods, a circular asterism is normally assumed. This holds for laser guide-stars but has limitations when applying atmospheric tomography to natural guide-stars, which are very rarely arranged as a perfect circular asterism with the astronomy target at the centre. This paper uses an empirical approach to investigate the effect of asterism geometry on atmospheric tomographic reconstruction performance. Specifically, several models are developed and compared against one another in terms of predicting the tomographic performance from real-world guide-star coordinates. We find that using support vector regression or a small neural network is sufficient to accurately predict asterism quality using only features calculated from the guide-star geometry (R² = 0.951), eliminating the need to compute the projection matrices in real-time for asterism quality evaluation.
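
Geometry-only features of the kind such a regression model could consume might look like the sketch below; the specific feature choices (mean radius, radial spread, largest angular gap) are assumptions for illustration, not the paper's exact feature set:

```python
import math

def asterism_features(stars):
    """Simple geometric features of a guide-star asterism, with the
    science target assumed at the origin (an illustrative convention).

    stars: list of (x, y) guide-star coordinates.
    """
    radii = [math.hypot(x, y) for x, y in stars]
    angles = sorted(math.atan2(y, x) % (2 * math.pi) for x, y in stars)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - angles[-1] + angles[0])  # wrap-around gap
    return {
        "mean_radius": sum(radii) / len(radii),
        "radius_spread": max(radii) - min(radii),
        "max_angular_gap": max(gaps),  # a large gap means poor sky coverage
    }
```

A perfectly circular asterism has zero radius spread and equal angular gaps; real natural guide-star configurations deviate on both counts, which is what the learned predictor exploits.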

12:30-13:30 Lunch & Steering Committee Meeting
13:30-15:30 Session 9: Paper Session: Medical & Biomedical Imaging & Multimodal AI
Location: RHLT 1
13:30
Echocardiogram to CMR Image Synthesis using Generative Models

ABSTRACT. Echocardiograms provide noninvasive real-time data for assessing the structure and function of the heart and can assist in diagnosing several conditions. However, they are highly operator-dependent, often yield incomplete or suboptimal views, and can be challenging to interpret. In contrast, Cardiac Magnetic Resonance (CMR) delivers comprehensive and detailed evaluations but remains time-consuming and costly. To address these limitations, this study investigates cross-modal generative modeling for synthesizing CMR sequences directly from 2D Transthoracic Echocardiography (TTE) with temporal information. We propose a novel model that combines an autoencoder (AE) backbone for feature extraction with a vision transformer (ViT) to capture global temporal and spatial dependencies, thereby enabling the prediction of CMR sequences with preserved dynamics. The performance of this architecture is compared with alternative generative models to assess quantitative accuracy. Experimental results show that the proposed ViT-AE model with 12 layers achieved the best performance, with an MAE of 0.08, an SSIM of 0.67, and a PSNR of 18.45.

13:45
(Online talk) Progressive Distillation Attention for Robust Left Ventricular Ejection Fraction Estimation

ABSTRACT. Accurate estimation of the left ventricular ejection fraction (LVEF) from echocardiography is essential for the assessment of cardiovascular risks, but remains highly dependent on expert interpretation. We propose a lightweight encoder–decoder architecture with attention-guided skip refinements for robust left ventricle segmentation and automated LVEF computation from echocardiogram video and image samples. Unlike conventional U-Net models, our approach integrates dense connections, feedback loops, and Progressive Distillation Attention Blocks (PDABs). The PDAB modules selectively refine shallow encoder features before fusion with decoder representations, enabling progressive distillation of fine spatial details and deeper semantic cues. This refinement strategy ensures the precise delineation of ventricular boundaries even in low-quality echocardiographic frames, directly improving the reliability of the estimation of ejection fraction. With only 2.4M parameters, our model outperforms prior state-of-the-art methods. On the EchoNet-Dynamic dataset, it achieves a 15.85% improvement in R² with 69.4% and 4.6% reductions in MAE and RMSE, respectively. On the CAMUS dataset, it reduces MAE by 79.06% while improving Pearson’s correlation coefficient and R² by 19.51% and 1.02%, respectively. These results demonstrate that the PDAB-enhanced method provides a compact and accurate solution for automated LVEF estimation from echocardiography.

14:00
Nissl-stained Histological Slice Image Completion Based on Generated Masks

ABSTRACT. In neuroanatomy, studying histological brain sections cut from non-human primate brains provides valuable insights into how the human brain operates, as they share structural and functional similarities. However, acquiring histological brain sections of a primate animal can be challenging; inevitable human errors may lead to damage to the histological brain sections, making the samples harder to interpret. Histological image completion therefore plays a critical role, paving the way to repairing damaged slices in digitized form. In this study, we used the common marmoset monkey (Callithrix jacchus), a New World monkey sharing similar brain functionality with humans. Two state-of-the-art models were tested and compared, namely Deep Fusion Network (DFNet) and Pluralistic Image Completion with Reduced Information Loss (PUT). These models were first introduced for scenery and face image completion, and have not been applied to histological images of the brain. DFNet achieved a mean squared error of 0.0061±0.0025 with the square mask, and 0.0361±0.0095 with the irregularly shaped mask. For the PUT model, the mean squared error reached 0.0105±0.0035. These results demonstrate that both models can successfully reconstruct missing regions in Nissl-stained brain slice images, contributing to the development of future digital histological repair pipelines.

14:15
Seeing Beyond the Airways: Asthma Prediction via Cross-Attention on Dual Retinal Modalities

ABSTRACT. Asthma’s systemic effects and chronic underdiagnosis trigger avoidable exacerbations and a heavy healthcare burden. Effort-dependent spirometry and specialist-only tests block population-scale screening. We propose a cross-modal attention framework for non-invasive asthma classification from dual retinal modalities, colour fundus photographs (CFP; Type 1: posterior pole view; Type 2: optic nerve head view) and optical coherence tomography (OCT) measurements, where systemic changes manifest in retinal structure and vasculature. Fundus images are encoded by a CNN backbone followed by multi-head self-attention (MHSA); OCT metrics are embedded by a lightweight feed-forward encoder; cross-modal attention (CMA) then fuses the two streams to capture intermodal dependencies. Trained and evaluated on a novel dual-modality dataset curated at UNSW Centre for Eye Health (CFEH), the model achieves an AUC of 0.97 and offers improved interpretability via attention-weight visualisations. These results support retinal biomarkers as a scalable pathway for early asthma detection and open a window to population-level oculomic screening for other systemic diseases (e.g., neurodegenerative and cardiovascular diseases), highlighting the promise of CMA for ocular imaging.
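
The cross-modal attention fusion step can be illustrated with a single-head scaled dot-product sketch in pure Python (no learned projections; an illustration of the mechanism, not the authors' architecture):

```python
import math

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention: one modality's
    feature vectors (queries, e.g. fundus tokens) attend over the other
    modality's feature vectors (keys/values, e.g. OCT embeddings)."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                       # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query strongly aligned with one key retrieves (approximately) that key's value vector, which is how intermodal dependencies are captured before classification.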

14:30
(Online talk) Enhanced Graph Convolutional Network with Chebyshev Spectral Graph and Graph Attention for Autism Spectrum Disorder Classification

ABSTRACT. Autism Spectrum Disorder (ASD) is a complicated neurodevelopmental disorder marked by variation in symptom presentation and neurological underpinnings, making early and objective diagnosis extremely problematic. This paper presents a Graph Convolutional Network (GCN) model, incorporating Chebyshev Spectral Graph Convolution and Graph Attention Networks (GAT), to increase the classification accuracy of ASD utilizing multimodal neuroimaging and phenotypic data. Leveraging the ABIDE I dataset, which contains resting-state functional MRI (rs-fMRI), structural MRI (sMRI), and phenotypic variables from 870 patients, the model employs a multi-branch architecture that processes each modality individually before merging them via concatenation. Graph structure is encoded using site-based similarity to generate a population graph, which helps capture relationships across individuals. Chebyshev polynomial filters provide localized spectral learning with lower computational complexity, whereas GAT layers enhance node representations by attention-weighted aggregation of neighbouring information. The proposed model is trained using stratified five-fold cross-validation with a total input dimension of 5,206 features per individual. Extensive trials demonstrate the enhanced model’s superiority, achieving a test accuracy of 74.82% and an AUC of 0.82 on the entire dataset, surpassing multiple state-of-the-art baselines, including conventional GCNs, autoencoder-based deep neural networks, and multimodal CNNs.

14:45
3D Gaussian Splatting Reconstruction from Simulated CT Projections with Geometric Initialization

ABSTRACT. We develop a customized initialization method for 3D Gaussian Splatting methods, aimed at extending its application to Computed Tomography (CT) reconstruction. Initialization in 3D Gaussian Splatting is a crucial step and can be accomplished using several techniques. The official pipeline of 3D Gaussian Splatting uses the Structure-from-Motion (SfM) technique in its initialization step. While SfM works well for natural scene photographs, it is not directly applicable in the medical domain, more specifically in the CT environment. To address this limitation, we propose a customized, geometry-aware initialization method that is compatible with parallel-beam CT geometry. We investigated 16 simulated CT datasets along with the Shepp-Logan phantom. These simulated models were generated with the TomoPhantom toolbox, which provided 2D projection images as ground truth. These ground truth images and the 3D models were used in our customized 3D Gaussian placement strategy, ensuring accurate camera orientation and 3D point sampling for parallel-beam CT reconstruction. We obtained rendered images corresponding to their ground truth projections that largely preserved the true geometric structures. For the Shepp-Logan phantom, we achieved a test PSNR of 29.987 and an L1 loss of 0.015 after 30,000 iterations. Further work may extend this approach to real CT data with different scanner acquisition geometries, such as cone-beam or helical.

15:00
Multimodal Cross-Attention for Range of Motion Assessment

ABSTRACT. Accurate and automated Range of Motion (ROM) assessment is essential for rehabilitation, physical therapy, and post-surgical recovery. Traditional manual goniometer-based evaluations suffer from subjectivity, inter-rater variability, and reliance on trained professionals. Although RGB-based computer vision enables markerless ROM estimation, it remains susceptible to occlusions, pose estimation errors, and difficulties detecting subtle joint movements. Furthermore, vision alone cannot capture neuromuscular activation, which is crucial for understanding joint dynamics. To address these challenges, we propose a multi-modal deep learning framework that integrates RGB-based motion tracking with electromyography (EMG) signals. EMG provides neuromuscular activation data, enhancing the robustness against visual occlusions and improving sensitivity to subtle joint displacements. Our method employs an Hourglass-based convolutional neural network (CNN) for spatial feature extraction and a gated recurrent unit (GRU)-based model for temporal EMG processing. To further enhance performance, we introduce feature-level and modality-level attention modules, dynamically emphasizing the most informative features and modality contributions. Experimental results demonstrate that our proposed model achieves an overall RMSE of 2.55, with additional gains contributed by the feature-level and modality-level attention mechanisms. Moreover, the fully fused RGB-EMG model outperforms RGB-only approaches, particularly in accurately predicting subtle ROM movements.

15:15
Risk-Controlled Multimodal Emotion Coaching for Autism Support Using Self-Supervised Vision and Speech Encoders

ABSTRACT. Challenges in social communication and emotion recognition characterize autism spectrum disorder (ASD). However, many existing digital interventions fail because they lack the multimodal, adaptive, and safety-conscious frameworks necessary for adequate real-world support. This paper introduces a risk-controlled multimodal emotion coaching (RC-MEC) system designed to provide personalized and safe affective learning for individuals with autism. Our framework combines vision and speech using self-supervised learning (SSL) to capture rich, contextualized representations. We validated RC-MEC on the FER-2013 facial emotion and RAVDESS speech datasets. At the core of RC-MEC is a risk-control module, driven by conformal prediction, which dynamically regulates the agent’s actions. Interventions are delivered only within a predefined low-risk confidence limit, achieving 90.1% coverage at the target error tolerance α = 0.1 and 92.2% singleton accuracy when confident. The results confirm that the proposed multimodal and risk-aware system offers a safer, more effective, and reliable tool for emotion coaching in autism, thereby paving the way for more responsible and user-centered assistive technologies.
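
The risk-control idea (intervene only when a calibrated confidence bound is met) can be sketched as split conformal prediction; the function names and the abstention rule are illustrative assumptions, not the RC-MEC implementation:

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of held-out nonconformity scores.
    Choosing the ceil((n+1)(1-alpha))-th smallest score guarantees at
    least 1 - alpha coverage on exchangeable future samples."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))   # conservative rank
    return sorted(cal_scores)[min(k, n) - 1]

def act_or_abstain(score, threshold):
    """Deliver a coaching intervention only when the current prediction's
    nonconformity score falls inside the calibrated low-risk set."""
    return "intervene" if score <= threshold else "abstain"
```

In this sketch, lower nonconformity means higher model confidence, so the agent abstains precisely on the uncertain cases that would otherwise carry risk.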

15:30-16:00 Coffee Break
16:00-18:00 Session 10: Paper Session: Robotics, Agriculture and Other Applications
Location: RHLT 1
16:00
Mobile Robot Navigation Method based on Multiple External Cameras in Crowded Environment

ABSTRACT. Existing navigation approaches for mobile robots in crowded environments predominantly rely on on-board sensors like LiDAR and monocular cameras, suffering from limited sensing coverage and occlusion issues that hinder comprehensive perception of dynamic surroundings. This paper presents a novel navigation framework leveraging a multi-camera system deployed in the environment to enable holistic environmental perception and robust robot navigation. The framework introduces a Generalized Multi-View Detection (GMVD) algorithm with learnable adaptive projection and dynamic view fusion, which uses markers to assist in robot localization. The navigation layer integrates an improved A* algorithm with a hierarchical strategy combining speed barriers and dynamic window approaches to achieve collision-free path planning. Real-world experiments comparing the proposed method with previous crowd navigation algorithms demonstrate that it significantly enhances the robot’s navigation performance, generating obstacle-free paths for safe and efficient navigation in crowded scenarios.

16:15
SAM-Based Leaf Segmentation with Morphological Quality Assessment for Enhanced Plant Disease Detection

ABSTRACT. Plant diseases threaten global food security through substantial agricultural losses, yet traditional visual inspection often fails to detect early-stage symptoms critical for timely intervention. While deep learning models have shown promise for automated disease detection, their performance often degrades in realistic field conditions. This study investigates whether data-centric preprocessing can improve computer vision performance for apple leaf disease detection. We present the first systematic evaluation of the Segment Anything Model (SAM) combined with morphological quality assessment for leaf segmentation, compared against whole-image classification using the PlantPathology FGVC7 dataset (3,642 apple orchard images). To ensure segmentation reliability, we introduce a five-metric morphological framework (area ratio, aspect ratio, spatial coverage, centroid proximity, border penalty). Experiments with ResNet-18 under 3-fold cross-validation reveal class-specific effects: SAM improves F1 by 3.0% for the minority multiple-diseases class, but decreases it by 1.4% for healthy leaves, where contextual cues aid detection. Rust and scab remain stable above 95% F1, reflecting their distinctive visual signatures. GradCAM++ confirms that preprocessing redirects attention toward disease-relevant regions, particularly in complex multiple-disease cases. Overall, these findings show that adaptive preprocessing, rather than universal background removal, offers practical benefits for precision agriculture.
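
Four of the five metric ideas above (area ratio, aspect ratio, centroid proximity, border penalty) can be sketched on a binary mask given as nested 0/1 lists; the exact metric definitions in the paper may differ:

```python
def mask_quality(mask):
    """Morphological quality metrics for a binary leaf mask.
    mask: list of rows of 0/1 values; assumes at least one foreground pixel."""
    h, w = len(mask), len(mask[0])
    pts = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    area = len(pts)
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    bbox_h = max(rows) - min(rows) + 1
    bbox_w = max(cols) - min(cols) + 1
    cy, cx = sum(rows) / area, sum(cols) / area
    # Fraction of mask pixels touching the image border.
    border = sum(1 for r, c in pts if r in (0, h - 1) or c in (0, w - 1))
    return {
        "area_ratio": area / (h * w),
        "aspect_ratio": bbox_w / bbox_h,
        "centroid_offset": ((cy - (h - 1) / 2) ** 2
                            + (cx - (w - 1) / 2) ** 2) ** 0.5,
        "border_penalty": border / area,
    }
```

A segmentation would then be accepted only if all metrics fall inside plausible ranges for a single centred leaf; otherwise the pipeline could fall back to the whole image.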

16:30
Attention‑Guided Band Pruning for Efficient Hyperspectral Early Grape Leaf Disease Detection

ABSTRACT. Early detection of grapevine diseases, particularly grapevine leafroll-associated virus (GLRaV) and grapevine red blotch virus (GRBV), is impeded by presymptomatic presentation and on-device computing limitations. This work introduces attention-guided band pruning (AGBP), an embedded selector that learns per-band importance within a 3D ResNet-18 hyperspectral pipeline under an entropy penalty, then retrains compact k-band models whose cost scales with k. Two aggregation rules, EMA-stream and trimmed mean, convert per-sample weights into stable global rankings. Under a unified protocol with fixed plant-level splits and standardized preprocessing on a 40-band grapevine dataset, AGBP improves baseline accuracy and yields compact models that retain about 99% of baseline AUROC at k=4–6 while cutting FLOPs and latency by up to an order of magnitude. Compared with Pearson, ReliefF, and CARS, AGBP performs best at very small spectral budgets and provides predictable accuracy–efficiency curves suitable for edge deployment.
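
The selection step of attention-guided band pruning (softmax band weights under an entropy penalty, then keep the top-k bands) can be sketched as follows; the penalty weight and the use of raw logits are simplified assumptions, not the AGBP training loop:

```python
import math

def softmax(scores):
    m = max(scores)                       # numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def prune_bands(band_logits, k, beta=0.1):
    """Turn learned per-band logits into importance weights, report the
    entropy penalty (which during training pushes the distribution to
    concentrate on few bands), and keep the top-k band indices."""
    w = softmax(band_logits)
    penalty = beta * entropy(w)
    top_k = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
    return sorted(top_k), penalty
```

After pruning, a compact k-band model would be retrained on only the retained bands, which is where the FLOPs and latency savings come from.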

16:45
Automated Activity Monitoring of Cryptic Species in a Zoo Environment

ABSTRACT. Automated monitoring of cryptic species in a zoo environment provides a valuable opportunity to study behaviours and ecological patterns to inform conservation efforts that would otherwise remain poorly documented. In this study, we explore the problem of monitoring a population of endangered Whitaker's Skink (Oligosoma whitakeri), a cryptic species that hides under leaf litter for most of the day, held in captivity at Te Nukuao Wellington Zoo, New Zealand. We propose a lightweight pipeline based on the Structural Similarity Index (SSIM) between frames sampled from videos and identify time points where the skinks are active in their habitats. We benchmark our pipeline against a deep-learning approach based on the MegaDetector model, explore the influence of both the sampling rate and choice of difference metric on detection performance, and discuss the challenges of implementing our pipeline. The proposed method provides scalable, interpretable, and low-cost monitoring of skink activity and can be integrated with existing zoo camera systems to reduce the daily workload of keepers without requiring intensive computing resources.
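
A global (single-window) SSIM comparison between sampled frames, of the kind such a pipeline relies on, can be sketched in a few lines. The activity threshold of 0.95 is an illustrative assumption, and production use would favour the sliding-window SSIM from an image-processing library:

```python
def global_ssim(a, b, data_range=255.0):
    """SSIM computed from whole-frame statistics of two equal-length
    flat grayscale pixel lists (global statistics only, not the
    standard sliding-window SSIM)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((x - mu_b) ** 2 for x in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * data_range) ** 2          # standard stability constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def is_active(frame_t, frame_t1, threshold=0.95):
    """Flag a time point as activity when consecutive sampled frames
    differ enough that SSIM drops below the (assumed) threshold."""
    return global_ssim(frame_t, frame_t1) < threshold
```

Identical frames score 1.0 and are ignored, so keepers would only review the flagged segments.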

17:00
YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications

ABSTRACT. Manual pruning of radiata pine trees poses significant safety risks due to extreme working heights and challenging terrain. This paper presents a computer vision framework that integrates YOLO object detection with Semi-Global Block Matching (SGBM) stereo vision for autonomous drone-based pruning operations. Our system achieves precise branch detection and depth estimation using only stereo camera input, eliminating the need for expensive LiDAR sensors. Experimental evaluation demonstrates YOLO’s superior performance over Mask R-CNN, achieving 82.0% mask mAP50–95 for branch segmentation. The integrated system accurately localizes branches within a 2-meter operational range with processing times under one second per frame. These results establish the feasibility of cost-effective autonomous pruning systems that enhance worker safety and operational efficiency in commercial forestry.

17:15
(Online talk) Y-LIChess: Live and Interactive Over-the-Board Chess Recognition and Play with YOLO

ABSTRACT. Chess is widely played on computers, yet over-the-board (OTB) chess remains the official and preferred format for many players due to its tactile and immersive nature. Bridging digital and physical play requires accurate recognition of OTB positions. Prior research has explored modular pipelines for board localization, square occupancy, and piece classification, as well as one-shot detectors. While these approaches demonstrate strong accuracy in controlled conditions, they often accumulate errors across stages, face latency and robustness issues, and rarely support interactive play. Thus, in this work, we present Y-LIChess, a YOLO-based system for live, interactive OTB play with engines and online platforms. Y-LIChess employs semi-automatic calibration, event-triggered recognition, and legality-aware validation to ensure seamless, low-latency interaction. On our wood180 dataset, built with an active learning process to reduce manual annotation, Y-LIChess achieves 99.36 AP50 with only 0.21% per-square error, reconstructs 100% of boards within one mistake, and performs FEN reconstruction in ~7 ms on a GPU, more than an order of magnitude faster than prior pipelines.
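
The piece-placement field of FEN reconstruction from a recognized 8x8 board is straightforward to sketch; the board encoding used here (rank 8 first, empty string for empty squares, standard piece letters with uppercase for white) is an assumption of this example, not necessarily Y-LIChess's internal representation:

```python
def board_to_fen(board):
    """Build the piece-placement field of a FEN string from an 8x8 board
    given rank 8 first. Runs of empty squares are collapsed into digits,
    and ranks are joined with '/'."""
    ranks = []
    for row in board:
        fen_row, empty = "", 0
        for square in row:
            if square == "":
                empty += 1
            else:
                if empty:                  # flush a run of empty squares
                    fen_row += str(empty)
                    empty = 0
                fen_row += square
        if empty:
            fen_row += str(empty)
        ranks.append(fen_row)
    return "/".join(ranks)
```

A full FEN would additionally carry side to move, castling rights, en-passant square, and move counters, which a legality-aware tracker can maintain incrementally from the move history.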

17:30
Evaluating Human Perception of Automatically Created Synthetic Road Networks that Integrate Real-world Cost Factors and Terrain Features

ABSTRACT. Creating synthetic road networks that are both realistic and convincing within virtual environments can be time-consuming and difficult. Several researchers have investigated the use of procedural algorithms for automatically creating road networks. While the results are often visually pleasing, they may not be plausible since they do not consider real-world factors such as the construction costs of different designs. We designed, implemented and evaluated an automated system that can generate virtual road networks based upon real-world information, including terrain features, land usage details, and economic cost estimates, to procedurally create realistic road layouts. A user study (n=32) showed that our system produced more realistic roads than previous work, and the resulting road network was perceived to be as realistic and cost-efficient as the equivalent real road network of the given land area. Changing parameters significantly altered outcomes, and choosing a small population size produced the most plausible results.

17:45
Fake Money, Real Threat: Fooling Wavelet-Based Banknote Authentication with AdvGAN
PRESENTER: Julian Knaup

ABSTRACT. As machine learning models are increasingly deployed, their vulnerability to adversarial examples poses a significant threat to security-relevant applications. Financial transactions, in particular, rely on the assumption that payments are legitimate, which makes banknote authentication an essential use case. Banknotes incorporate several security features, and the applied printing technique itself can be leveraged for authentication. Specifically, Intaglio printing results in fine line work and microstructures that can be analyzed and distinguished by spatial frequency analysis, e.g. the wavelet packet transform. By evaluating statistical moments of wavelet coefficient histograms, fast and reliable authentication is achieved.

This paper adapts the AdvGAN framework for the context of wavelet-based banknote authentication. By proposing a customized loss function that constrains the feature space, highly effective yet subtle adversarial examples are generated. These perturbations deceive the authentication system, causing it to classify forgeries as genuine banknotes and genuine ones as forgeries. Under attack, classification accuracy drops from 100% to 0% in the worst case. Furthermore, limitations and countermeasures are outlined, highlighting potential challenges of deploying such attacks in practical scenarios.