Days: Monday, June 2nd; Tuesday, June 3rd; Wednesday, June 4th
How to train large scale 3D human and object foundation models by Prof. Gerard Pons-Moll (University of Tübingen, Tübingen AI Center, MPII)
Understanding 3D humans interacting with the world has been a long-standing goal in AI and computer vision. Lack of 3D data has been the major barrier to progress. This is changing with the increasing number of 3D datasets featuring images, videos, and multi-view data with 3D annotations, as well as large-scale image foundation models. However, learning models from such sources is non-trivial. Some of the challenges are: 1) datasets are annotated with different 3D skeleton formats and outputs, and 2) image foundation models are 2D, and extracting 3D information from them is hard. I will present solutions to each of these two challenges. I will introduce a universal training procedure to consume any skeleton format, a diffusion-based method tailored to lift foundation models to 3D (for humans and also general objects), and a mechanism to probe the geometry and texture awareness of 3D foundation model features based on 3D Gaussian splatting reconstruction. I will also show a method to systematically create 3D human benchmarks on demand for evaluation (STAGE).
CAVW
15:05 | An Adaptive Boundary Material Point Method with Surface Particle Reconstruction | PRESENTER: Haokai Zeng
15:30 | Going further with Vertex Block Descent | PRESENTER: Bastien Saillant
15:55 | Simulation of Ball Levitation with SPH | PRESENTER: Sun-Lay Gagneux
CAVW
15:05 | A Retrieval-Augmented Generation System for Accurate and Contextual Historical Analysis: AI-Agent for the Annals of the Joseon Dynasty | PRESENTER: Jeongha Lee
15:30 | AIKII: An AI-enhanced Knowledge Interactive Interface for Knowledge Representation in Educational Games | PRESENTER: Dake Liu
15:55 | Toward Fluoroscopy Guided Robotic Needle Insertion for Radio Frequency Ablation | PRESENTER: Thuc Long Ha
CAVW
16:35 | Fuzzy Sampling with Qualified Uniformity Properties for Implicitly Defined Curves and Surfaces | PRESENTER: Mingxiao Hu
17:00 | A robust 3D mesh segmentation algorithm with anisotropic sparse embedding | PRESENTER: Mengyao Zhang
17:25 | ReDACT: Reconstructing Detailed Avatar with Controllable Texture | PRESENTER: Zezheng Chen
17:50 | A Real-time Virtual-Real Fusion Rendering Framework in Cloud-Edge Environments | PRESENTER: Yuxi Zhou
CAVW
16:35 | Talk with Socrates: Relation Between Perceived Agent Personality and User Personality in LLM-based Natural Language Dialogue Using Virtual Reality | PRESENTER: Mehmet Efe Sak
16:58 | Path Modeling of Visual Attention, User Perceptions, and Behavior Change Intentions in Conversations with Embodied Agents in VR | PRESENTER: Sagar Ashok Vankit
17:21 | MemorIA, an Architecture for Creating Interactive AI Historical Agents in Educational Contexts | PRESENTER: Antoine Oger
17:44 | Exploring the Impact of Multimodal Long Conversations in VR on Attitudes Towards Behavior Change, Memory Retention, and Cognitive Load | PRESENTER: Sagar Vankit
18:07 | User Interface for Controlling Crowd in Metaverse Using Spatial Controller | PRESENTER: Masaki Oshita
Harmonized XR: Seamlessly Bridging Physical and Perceptual Realism by Prof. Xubo Yang (Shanghai Jiao Tong University)
Extended Reality (XR) represents a spectrum of immersive technologies that seamlessly blend the digital and physical worlds, creating environments where users can interact with virtual content as if it were part of their reality. This keynote synthesizes cutting-edge research across visual perception, physical simulation, and interactive rendering to explore how XR can achieve both physical realism (accurate representation of physical phenomena) and perceptual realism (alignment with human visual and sensory perception).
We begin by addressing the challenges of visual fidelity in XR through innovative techniques that enhance occlusion, color accuracy, and rendering efficiency, ensuring that virtual content aligns seamlessly with human perception. Next, we delve into advancements in simulation methodologies that bring unprecedented physical accuracy to virtual environments, enabling the realistic representation of complex phenomena such as fluids, bubbles, and surface tension effects. Finally, we explore interactive experiences that bridge the gap between physical and perceptual realism by optimizing virtual interactions to align with natural human behavior and visual focus.
By integrating these advancements, XR can achieve a harmonious balance between physical and perceptual realism, creating immersive environments that are not only computationally efficient but also deeply engaging and believable. This keynote will highlight the interplay between these dimensions, offering a comprehensive roadmap for the future of XR technologies.
3 CAVW 1 LNCS
09:35 | Talking Face Generation with Lip and Identity Priors | PRESENTER: Jiajie Wu
09:53 | Speech-Driven 3D Facial Animation with Regional Attention for Style Capture | PRESENTER: Jiahao Pan
10:11 | Coarse-to-Fine 3D Craniofacial Landmark Detection via Heat Kernel Optimization | PRESENTER: Xingfei Xue
10:29 | GSFaceMorpher: High-Fidelity 3D Face Morphing via Gaussian Splatting | PRESENTER: Xiwen Shi
CAVW
09:35 | Chinese Painting Generation with A Stroke-by-Stroke Renderer and a Semantic Loss | PRESENTER: Yuan Ma
10:00 | Research on Multi-Feature Fusion Shadow Puppet Motifs Generation Based on CSPMotifsGAN and Cultural Heritage Preservation | PRESENTER: Rui Wang
10:25 | CLPFusion: A Latent Diffusion Model Framework for Realistic Chinese Landscape Painting Style Transfer | PRESENTER: Jiahui Pan
CAVW
11:05 | Decoupling Density Dynamics: A Neural Operator Framework for Adaptive Multi-Fluid Interactions | PRESENTER: Yuhang Xu
11:30 | A Control Simulation of Multiple Bubbles for Representing Desired Shapes | PRESENTER: Syuhei Sato
11:55 | A versatile energy-based SPH surface tension with spatial gradients | PRESENTER: Qianwei Wang
11:05 | Virtual Guides and Crowd Behaviors: Understanding Evacuation Decision-Making in Virtual Reality | PRESENTER: Ziyuan Feng
11:23 | BACH: Bi-stage Data-driven Piano Performance Animation for Controllable Hand motion | PRESENTER: Jihui Jiao
11:41 | Risk-Aware Pedestrian Behavior Using Reinforcement Learning in Mixed Traffic | PRESENTER: Tzu-Yu Chen
11:59 | Improving Fidelity of Close Social Interaction Animations in Social VR with a Machine Learning-based Refinement Framework | PRESENTER: Roberta Macaluso
4 CAVW 1 LNCS
14:00 | Scene-EEGCNN: Visualization of Zen Meditation Experience Based on EEG-Cultural Heritage Integration | PRESENTER: Longfei Yang
14:20 | Exploring the Therapeutic Potential of VR-Based ASMR Animation: A Comparative Study on Relaxation and Sleep Aid | PRESENTER: Jiahao Du
14:40 | Immersion Discrepancies in Educational Serious Games Among Children's Age Groups | PRESENTER: Yukun Li
15:00 | Immersive Video Game Experience through Naturalistic and Emotive Dialogue Agent | PRESENTER: Michael Adjeisah
15:15 | Photorealistic 3D Head Reconstruction via 2D Gaussians | PRESENTER: Anil Bas
16:00 | Peridynamics-Based Simulation of Viscoelastic Solids and Granular Materials | PRESENTER: Jiamin Wang
16:20 | Automating Visual Narratives: Learning Cinematic Camera Perspectives from 3D Human Interaction | PRESENTER: Boyuan Cheng
16:40 | Intelligent Compilation System for Chinese Character Animation Based on Dynamic Data Sets | PRESENTER: Xin Luo
17:00 | Unsupervised Salient Object Detection with Pseudo-Labels Refinement | PRESENTER: Hao Liu
17:20 | Using Large Language Models for Evaluation of Radiological Textual Reports | PRESENTER: Nicolay Rusnachenko
17:32 | AssetMask: Mask R-CNN-based approach for Asset detection in railroad track health monitoring | PRESENTER: Aradhya Saini
17:44 | LLM-Powered VR Nursing Training for Dynamic Risk Assessment | PRESENTER: Ehtzaz Chaudhry
Generative GaitNet and Beyond: Foundational Models for Human Motion Analysis and Simulation by Prof. Jehee Lee (Seoul National University)
Understanding the relationship between human anatomy and motion is fundamental to effective gait analysis, realistic motion simulation, and the creation of human body digital twins. We will begin with Generative GaitNet (SIGGRAPH 2022), a foundational model for human gait that drives a comprehensive full-body musculoskeletal system comprising 304 Hill-type musculotendons. Generative GaitNet is a pre-trained, integrated system of artificial neural networks that operates in a 618-dimensional continuous space defined by anatomical factors (e.g., mass distribution, body proportions, bone deformities, and muscle deficits) and gait parameters (e.g., stride and cadence). Given specific anatomy and gait conditions, the model generates corresponding gait cycles via real-time physics-based simulation. Next, we will discuss Bidirectional GaitNet (SIGGRAPH 2023), which consists of forward and backward models. The forward model predicts the gait pattern of an individual based on their physical characteristics, while the backward model infers physical conditions from observed gait patterns. Finally, we will present MAGNET (Muscle Activation Generation Networks)—another foundational model (SIGGRAPH 2025)—designed to reconstruct full-body muscle activations across a wide range of human motions. We will demonstrate its ability to accurately predict muscle activations from motions captured in video footage. We will conclude by discussing how these foundational models collectively contribute to the development of human body digital twins, and explore their future potential in personalized rehabilitation, surgery planning, and human-centered simulation.
09:35 | Perspective Matters: Investigating the Effects of Vibrotactile Mode Design on User Experience in Action-Role Playing Game and Media | PRESENTER: Hongyu Liu
09:53 | Exploring Cultural Heritage with AR: The TAM Case Study of Nvshu | PRESENTER: Yejuan Xie
10:11 | A Design Study on Contextual and Interactive Serious Games for Children’s Learning of Chinese Character Culture | PRESENTER: Xu Lang
10:29 | Summon Arcane: An AI-Driven Pixel Art Game with Interactive Narrative and Immersive Summoning Experience | PRESENTER: Haoxiang Yang
09:35 | YOLOv8-HAC: Safety helmet detection model for complex underground coal mine scene | PRESENTER: Rui Liu
10:00 | STA-TAD: Spatial-Temporal Adapter on ViT for Temporal Action Detection | PRESENTER: Tingwei Wu
10:25 | AU-guided Feature Aggregation for Micro-Expression Recognition | PRESENTER: Weiqi Xu
4 LNCS
11:05 | Potential Representation Learning for Visible-Infrared Person Re-Identification in Virtual Surveillance Systems | PRESENTER: Haoyuan Du
11:30 | Hybrid-Granularity Image-Music Retrieval Using Contrastive Learning between Images and Music | PRESENTER: Xudong He
11:55 | Text-driven Tree Modeling via CLIP-based Optimization | PRESENTER: Yudai Ichimura
2 CAVW 2 LNCS
11:05 | UTMCR: 3U-Net Transformer with Multi-Contrastive Regularization for Single Image Dehazing | PRESENTER: Hangbin Xu
11:23 | SCNet: A Dual-Branch Network for Strong Noisy Image Denoising Based on Swin Transformer and ConvNeXt | PRESENTER: Chuchao Lin
11:41 | ShadowCraft-NeRF: Occlusion and Shadow Mitigation via SAM-Guided NeRF | PRESENTER: Xun Chen
11:59 | Visualizing the Invisible: An Efficient Framework for Microscopic Visualization | PRESENTER: Haoran Jia
CAVW
14:00 | RIDGE: Rule-Infused Deep Learning for Realistic Co-Speech Gesture Generation | PRESENTER: Ghazanfar Ali
14:20 | Precise Motion Inbetweening via Bidirectional Autoregressive Diffusion Models | PRESENTER: Jiawen Peng
14:40 | Motion In-betweening via Recursive Keyframe Prediction | PRESENTER: Rui Zeng
15:00 | Interaction with Virtual Objects using Human Pose and Shape Estimation | PRESENTER: Hong Son Nguyen
15:20 | Motion Style Transfer: Methods, Challenges, and Future Directions | PRESENTER: Siyao Du
CAVW
14:00 | LGNet: Local-and-Global Feature Adaptive Network for Single Image Two-Hand Reconstruction | PRESENTER: Haowei Xue
14:25 | Joint-learning: A Robust Segmentation Method for 3D Point Clouds under Label Noise | PRESENTER: Tingyun Miao
14:50 | Weisfeiler-Lehman kernel augmented product representation for queries on large-scale BIM scenes | PRESENTER: Xiaojun Liu
15:15 | DTGS: Defocus-Tolerant View Synthesis using Gaussian Splatting | PRESENTER: Xinying Dai