CASA 2025: COMPUTER ANIMATION AND SOCIAL AGENTS 2025
PROGRAM FOR MONDAY, JUNE 2ND

14:00-15:00 Session 2: How to train large scale 3D human and object foundation models

How to train large scale 3D human and object foundation models by Prof. Gerard Pons-Moll (University of Tübingen, Tübingen AI Center, MPII)

Understanding 3D humans interacting with the world has been a long-standing goal in AI and computer vision. The lack of 3D data has been the major barrier to progress. This is changing with the growing number of 3D datasets featuring images, videos, and multi-view captures with 3D annotations, as well as large-scale image foundation models. However, learning models from such sources is non-trivial. Two of the challenges are: 1) datasets are annotated with different 3D skeleton formats and outputs, and 2) image foundation models are 2D, and extracting 3D information from them is hard. I will present solutions to each of these two challenges. I will introduce a universal training procedure that can consume any skeleton format, a diffusion-based method tailored to lift foundation models to 3D (humans and also general objects), and a mechanism to probe how geometry- and texture-aware 3D foundation model features are, based on 3D Gaussian splatting reconstruction. I will also show a method to systematically create 3D human benchmarks on demand for evaluation (STAGE).

Location: Amphie 23
15:05-16:20 Session 3A: Fluid & Physical Simulation I

CAVW

Location: Amphie 23
15:05
An Adaptive Boundary Material Point Method with Surface Particle Reconstruction
PRESENTER: Haokai Zeng

ABSTRACT. Capturing fine details, such as fluid flowing through narrow pipes or being split by thin plates, poses a significant challenge in simulations involving complex boundary conditions. As a hybrid method, the Material Point Method (MPM), widely used for simulating various materials, combines the advantages of Lagrangian particles and Eulerian grids. Accurately simulating fluid flow through narrow pipes requires high-resolution uniform grid cells, which often leads to inefficient simulation performance. In this paper, we present an adaptive boundary material point method that supports adaptive subdivision within regions of interest and performs collision detection across grids of varying sizes. Within this framework, particles interact through grids of differing resolutions. To tackle the challenge of unevenly distributed subdivided particles, we propose a surface reconstruction approach grounded in a color distance field, which accurately defines the relationship between the particles and the reconstructed surface. Furthermore, we incorporate a mesh refinement technique to enrich the detail of the mesh used to mark the grids during subdivision. We demonstrate the effectiveness of our approach in simulating various materials and boundary conditions, and contrast it with existing methods, underscoring its distinctive advantages.

15:30
Going further with Vertex Block Descent
PRESENTER: Bastien Saillant

ABSTRACT. Vertex Block Descent (VBD) is a fast and robust method for real-time simulation of deformable objects using the finite element method. The method is designed to handle only the popular linear tetrahedra, but the use of these elements makes the simulation less accurate and prone to locking artifacts. In this context, we propose an extension of the method in order to: (i) support any type of element; (ii) preserve its performance even for high-order elements. In addition, we explain how to improve VBD convergence by using sub-steps at the expense of iterations. Overall, by using other types of elements, we obtain a more accurate result at a similar cost compared to linear tetrahedra.
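
Not part of the paper: as a rough illustration of the vertex-block-descent idea the abstract builds on, the Python sketch below performs an implicit Euler step on a 2D mass-spring chain by minimizing, one vertex block at a time, the local incremental potential. The chain, spring energy, and damped local Newton step are all illustrative assumptions.

# Minimal 2D sketch of a vertex-block-descent-style solve on a mass-spring chain.
# Hypothetical setup (not from the paper): the implicit Euler step is found by
# minimizing, per vertex, the local incremental potential with all other vertices fixed.
import numpy as np

n, h, mass, k, rest = 5, 1.0 / 60.0, 1.0, 1e4, 0.1
x = np.stack([np.arange(n) * rest, np.zeros(n)], axis=1)   # positions
v = np.zeros_like(x)                                        # velocities
gravity = np.array([0.0, -9.81])
springs = [(i, i + 1) for i in range(n - 1)]

def spring_grad(xi, xj):
    d = xi - xj
    L = np.linalg.norm(d) + 1e-12
    return k * (L - rest) * d / L            # gradient of 0.5*k*(L-rest)^2 w.r.t. xi

x_pred = x + h * v + h * h * gravity          # inertial prediction (includes gravity)
x_new = x_pred.copy()
x_new[0] = x[0]                               # vertex 0 is pinned

for _ in range(20):                           # block-descent sweeps
    for i in range(1, n):                     # update one vertex block at a time
        g = mass / h**2 * (x_new[i] - x_pred[i])      # inertia gradient
        H = mass / h**2                                # diagonal Hessian estimate
        for (a, b) in springs:
            if i in (a, b):
                j = b if i == a else a
                g += spring_grad(x_new[i], x_new[j])
                H += k                                 # crude per-spring stiffness bound
        x_new[i] -= g / H                              # damped local Newton step

v = (x_new - x) / h
x = x_new
print(x)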

15:55
Simulation of Ball Levitation with SPH
PRESENTER: Sun-Lay Gagneux
15:05-16:20 Session 3B: AI in Education & Interfaces

CAVW

Location: Amphie 24
15:05
A Retrieval-Augmented Generation System for Accurate and Contextual Historical Analysis: AI-Agent for the Annals of the Joseon Dynasty
PRESENTER: Jeongha Lee

ABSTRACT. In this paper, we propose an AI-agent that integrates a large language model (LLM) with a Retrieval-Augmented Generation (RAG) system to deliver reliable historical information from the Annals of the Joseon Dynasty through both objective facts and contextual analysis, achieving significant performance improvements over existing models. For an AI-agent that draws on the Annals of the Joseon Dynasty to deliver reliable historical information, clear source citations and systematic analysis are essential. The Annals, an official record spanning 472 years (1392–1863), offer a dense, chronological account of daily events and state administration that shaped Korea's cultural, political, and social foundations. We propose integrating an LLM with a RAG system to generate highly accurate responses based on this extensive dataset. This approach provides both objective information about historical figures and events from specific periods and subjective contextual analysis of the era, helping users gain a broader understanding. Our experiments demonstrate improvements of approximately 23 to 50 points on a 100-point scale compared to the GPT-4o and OpenAI AI-Assistant v2 models.
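
For readers unfamiliar with the RAG pattern the abstract relies on, the following minimal Python sketch (not the authors' system) retrieves the most relevant annal entries for a query with a toy term-frequency similarity and assembles a source-cited prompt for an LLM; the corpus entries and scoring are purely illustrative.

# Minimal retrieval-augmented generation sketch (not the authors' system): retrieve
# the most relevant annal entries for a query, then ground the LLM prompt in them
# so answers can cite their sources. Corpus entries and scoring are illustrative.
from collections import Counter
import math

corpus = {
    "Sejong 1443-12": "The king promulgated the new script Hunminjeongeum ...",
    "Sejong 1420-03": "The Hall of Worthies (Jiphyeonjeon) was expanded ...",
}

def tf_vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    return num / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())) + 1e-12)

def retrieve(query, k=2):
    q = tf_vector(query)
    ranked = sorted(corpus.items(), key=lambda kv: cosine(q, tf_vector(kv[1])), reverse=True)
    return ranked[:k]

def build_prompt(query):
    passages = retrieve(query)
    context = "\n".join(f"[{sid}] {text}" for sid, text in passages)
    return ("Answer using only the passages below and cite entry IDs.\n"
            f"{context}\n\nQuestion: {query}")

print(build_prompt("When was Hunminjeongeum created?"))
# The assembled prompt would then be sent to an LLM (e.g., a chat-completion API).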

15:30
AIKII: An AI-enhanced Knowledge Interactive Interface for Knowledge Representation in Educational Games
PRESENTER: Dake Liu

ABSTRACT. The use of generative AI to create responsive and adaptive game content has attracted considerable interest within the educational game design community, highlighting its potential as a tool for enhancing players' understanding of in-game knowledge. However, the effective integration of player-AI dialogues with structured representations of in-game knowledge remains unexplored. This paper presents AIKII, an AI-enhanced Knowledge Interaction Interface designed to facilitate knowledge representation in educational games. AIKII employs multimodal interaction channels to represent in-game knowledge and support player engagement. To investigate its effectiveness and user experience, we integrated AIKII into Poemaster, an educational game centered on learning classic Chinese poetry, and conducted interviews with university students. The results demonstrate that our method builds contextual and reflective connections between players and in-game knowledge representations while improving player autonomy and immersion within the game.

15:55
Toward Fluoroscopy Guided Robotic Needle Insertion for Radio Frequency Ablation
PRESENTER: Thuc Long Ha

ABSTRACT. This paper introduces an innovative fluoroscopy-image-based registration method and a realistic protocol for robotic needle insertion for radiofrequency ablation (RFA) to treat liver cancer. The primary goal is to provide a futuristic model of an operating room equipped with advanced medical devices for needle insertion for RFA, such as a 6-DOF robot, a C-arm, and a teleoperation haptic device. The insertion process uses inverse Finite Element (FE) simulation to steer the needle automatically, while organ registration is achieved through simulated fluoroscopy images. In conventional RFA, by contrast, a CT image is used; due to obstacles in human anatomy, such as the rib cage, the radiologist has to choose an insertion point where the needle's visibility in CT is limited. To address this, we propose a registration method for automatic needle insertion in which several markers are injected, enabling liver registration and precise tumor targeting. Our method provides precise needle insertion to target tumors in the "no-go zones" and does not depend on the insertion point or the orientation of the needle. The adaptability of our method ensures its applicability across varying levels of radiologist proficiency and tumor locations, making it a versatile tool in the field of liver cancer treatment.

16:35-18:20 Session 4A: Geometry, Rendering & Mesh Processing

CAVW

Location: Amphie 23
16:35
Fuzzy Sampling with Qualified Uniformity Properties for Implicitly Defined Curves and Surfaces
PRESENTER: Mingxiao Hu

ABSTRACT. Sampled point clouds, particularly with pre-labeled annotations and ground-truth metrics, are frequently used in computer graphics and deep learning. In this work, we focus on a fuzzy sampling approach for such point clouds with qualified uniformity properties. After abstracting the uniformity requirements, we propose a novel approach to sampling point clouds from implicitly defined curves/surfaces. The approach deliberately combines techniques including space-isotropic sampling, curvature compensation, and normalized-distance blue noise. Experimental results show many kinds of sampled point clouds with uniform visual appearance and statistical metrics. Moreover, comparisons with state-of-the-art methods in terms of distance, density, and thickness uniformity demonstrate the approach's advantages. Owing to its low cost and the ease of obtaining ground truth and annotations, the method can be applied in many fields.
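
As a point of reference only (not the authors' method), the sketch below shows a generic baseline for sampling an implicitly defined curve: random seeds are projected onto the zero set of f(x, y) = x^2 + y^2 - 1 with Newton-style steps and accepted only if they keep a minimum spacing, a crude stand-in for the uniformity properties the abstract targets.

# Generic baseline (not the authors' method): dart-throwing samples on an implicitly
# defined circle f(x, y) = x^2 + y^2 - 1, projecting random seeds onto the zero set
# and rejecting points closer than r_min to enforce a crude uniformity constraint.
import numpy as np

rng = np.random.default_rng(0)

def f(p):      return p[0] ** 2 + p[1] ** 2 - 1.0
def grad_f(p): return np.array([2.0 * p[0], 2.0 * p[1]])

def project(p, iters=20):
    # Newton-style projection along the gradient onto the zero level set f = 0.
    for _ in range(iters):
        g = grad_f(p)
        p = p - f(p) * g / (g @ g + 1e-12)
    return p

samples, r_min = [], 0.15
for _ in range(5000):
    q = project(rng.uniform(-1.5, 1.5, size=2))
    if all(np.linalg.norm(q - s) >= r_min for s in samples):
        samples.append(q)

print(len(samples), "points accepted on the unit circle")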

17:00
A Robust 3D Mesh Segmentation Algorithm with Anisotropic Sparse Embedding
PRESENTER: Mengyao Zhang

ABSTRACT. 3D mesh segmentation, a very challenging problem in computer graphics, has attracted considerable interest. The most popular methods in recent years are data-driven, but such methods require a large amount of accurately labeled data, which is difficult to obtain. In this paper, we propose a novel mesh segmentation algorithm based on anisotropic sparse embedding. We first over-segment the input mesh into a collection of patches. These patches are then embedded into a latent space via an anisotropic L1-regularized optimization problem. In the new space, patches that belong to the same part of the mesh lie closer together, while those belonging to different parts lie farther apart. Finally, we can easily generate the segmentation result by clustering. Various experimental results on the PSB and COSEG datasets show that our algorithm produces perception-aware results and outperforms state-of-the-art algorithms. In addition, the proposed algorithm robustly handles meshes with different poses, different triangulations, noise, missing regions, or missing parts.
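
The sketch below is a schematic, isotropic stand-in (not the paper's anisotropic formulation): per-patch features are given sparse codes via ISTA, a standard solver for L1-regularized least squares, and the codes are then clustered to produce part labels; the features, dictionary, and penalty are illustrative.

# Schematic sketch (not the paper's formulation): patch features are sparse-coded
# against a dictionary via ISTA (iterative soft-thresholding), and the codes are
# clustered to label mesh parts. Features, dictionary, and penalty are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_patches, n_feat, n_atoms, n_parts = 60, 16, 8, 3

F = rng.normal(size=(n_patches, n_feat))            # per-patch feature vectors
D = rng.normal(size=(n_feat, n_atoms))              # dictionary (could be learned)

def ista(f, D, lam=0.1, iters=200):
    """Solve min_z 0.5*||f - D z||^2 + lam*||z||_1 by soft-thresholded gradient steps."""
    L = np.linalg.norm(D, 2) ** 2                    # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(iters):
        z = z - (D.T @ (D @ z - f)) / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return z

Z = np.array([ista(f, D) for f in F])                # sparse embedding of all patches

def kmeans(Z, k, iters=50):
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])
    return labels

print(kmeans(Z, n_parts)[:10])                       # part label per patch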

17:25
ReDACT: Reconstructing Detailed Avatar with Controllable Texture
PRESENTER: Zezheng Chen
17:50
A Real-time Virtual-Real Fusion Rendering Framework in Cloud-Edge Environments
PRESENTER: Yuxi Zhou

ABSTRACT. This paper introduces a cloud-edge collaborative framework for real-time virtual-real fusion rendering in augmented reality (AR). By integrating Visual Simultaneous Localization and Mapping (VSLAM) with Neural Radiance Fields (NeRF), the proposed method achieves high-fidelity virtual object placement and shadow synthesis in real-world scenes. The cloud server handles computationally intensive tasks, including offline NeRF-based 3D reconstruction and online illumination estimation, while edge devices perform real-time data acquisition, SLAM-based plane detection, and rendering. To enhance realism, the system employs an improved soft shadow generation technique that dynamically adjusts shadow parameters based on light source information. Experimental results across diverse indoor environments demonstrate the system's effectiveness, with consistent real-time performance, accurate illumination estimation, and high-quality shadow rendering. The proposed method reduces the computational burden on edge devices, enabling immersive AR experiences on resource-constrained hardware, such as mobile and wearable devices.

16:35-18:20 Session 4B: Conversational Agents & Virtual Reality

CAVW

Location: Amphie 24
16:35
Talk with Socrates: Relation Between Perceived Agent Personality and User Personality in LLM-based Natural Language Dialogue Using Virtual Reality
PRESENTER: Mehmet Efe Sak

ABSTRACT. Large Language Models (LLMs) offer almost immediate, human-like responses to user queries. Conversational agent systems support natural language dialogues using LLM backends in combination with Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) technologies, enabling life-like characters in virtual environments. This study investigates the relationship between user personality and perceived agent personality in LLM-based natural language dialogue. We adopt a Virtual Reality (VR) setting in which the user can talk with an agent that assumes the role of the philosopher Socrates. To this end, we use a three-dimensional (3D) avatar model resembling Socrates and specific LLM prompts to obtain stylistic answers from OpenAI's Chat Completions Application Programming Interface (API). Our user study measures the agent's perceived personality and the system's ease of use, quality, realism, and immersion in relation to the user's self-reported personality. The results suggest that the user's conscientiousness, extraversion, and emotional stability have a moderate effect on certain personality factors and system qualities. User conscientiousness affects perceived ease of use, quality, and realism, while user extraversion affects perceived agent conscientiousness, system realism, and immersion. Additionally, the user's emotional stability correlates with perceived agent extraversion and agreeableness.
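
The abstract mentions prompting OpenAI's Chat Completions API for stylistic answers; the sketch below shows one plausible way to do this with a persona system prompt. It is not the study's actual prompt or configuration: the model name and wording are placeholders, and the openai package plus an OPENAI_API_KEY in the environment are assumed.

# Illustrative sketch (not the study's actual prompt): a persona system prompt sent to
# OpenAI's Chat Completions API so the agent answers in the voice of Socrates.
# Requires the `openai` package and an OPENAI_API_KEY; model name is a placeholder.
from openai import OpenAI

client = OpenAI()

SOCRATES_PROMPT = (
    "You are Socrates of Athens. Answer briefly, in the first person, and guide the "
    "user with probing questions in the style of the Socratic method."
)

def ask_socrates(user_utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                       # placeholder model choice
        messages=[
            {"role": "system", "content": SOCRATES_PROMPT},
            {"role": "user", "content": user_utterance},
        ],
    )
    return response.choices[0].message.content     # text handed to the TTS stage

print(ask_socrates("What is virtue?"))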

16:58
Path Modeling of Visual Attention, User Perceptions, and Behavior Change Intentions in Conversations with Embodied Agents in VR

ABSTRACT. This study examines the impact of subtitles and image visualizations on gaze behavior, working alliance, and behavior change intentions during conversations with embodied conversational agents (ECAs) in virtual reality (VR). Visualizations are defined as graphical elements, such as images and diagrams, displayed on a virtual screen; together with the audio, they create a multimodal presentation. Participants were randomly assigned to one of four conditions: no subtitles or visualizations (Control), subtitles only (SUB), image visualizations only (VIS), or both subtitles and image visualizations (VISSUB).

A structural equation path analysis revealed that both subtitles and image visualizations individually reduced gaze toward the ECA. However, when combined (VISSUB), this effect was mitigated, leading to a relative increase in gaze. Gaze behavior was positively associated with working alliance, and user perceptions of enjoyment and appropriateness were linked to engagement, which, in turn, influenced behavior change intentions. In line with cognitive load theory, the VIS condition was associated with lower behavior change intentions, suggesting that excessive visual imagery may introduce cognitive trade-offs that impact user engagement.

17:21
MemorIA, an Architecture for Creating Interactive AI Historical Agents in Educational Contexts
PRESENTER: Antoine Oger

ABSTRACT. This paper presents the architecture of MemorIA, a system for creating AI-based interactive historical agents with the aim of fostering students' interest in learning. MemorIA generates animated digital portraits of historical figures, synchronizing facial expressions with synthesized speech to enable natural conversations with students. The system leverages NVIDIA Audio2Face for real-time facial animation together with the First Order Motion Model for portrait manipulation, achieving fluid interaction through low-latency audio-visual streaming. To assess our architecture in a field setting, we conducted a pilot study in middle school history classes, where students and teachers engaged in direct conversation with a virtual Julius Caesar during Roman history lessons. Students asked questions about ancient Rome and received contextually appropriate responses. While qualitative feedback suggests a positive trend toward increased student participation, some weaknesses and ethical considerations emerged. Based on this assessment, we discuss implementation challenges, suggest architectural improvements, and explore potential applications across various disciplines.

17:44
Exploring the Impact of Multimodal Long Conversations in VR on Attitudes Towards Behavior Change, Memory Retention, and Cognitive Load
PRESENTER: Sagar Vankit

ABSTRACT. This study examines how multimodal communication strategies (subtitles, visualizations, and their combination) affect memory retention, attitudes towards behavior change, and cognitive load during long conversations (20+ minutes) in immersive virtual reality (VR). Using embodied conversational agents (ECAs) to educate participants on diabetes and healthy eating, we found that all conditions effectively improved memory retention and behavior change attitudes. However, combining multimodal strategies increased cognitive load, suggesting a trade-off between engagement and cognitive demands. These findings highlight the potential of long VR conversations for healthcare education, while emphasizing the importance of balancing cognitive demands and exploring personalization for diverse users.

18:07
User Interface for Controlling Crowd in Metaverse Using Spatial Controller
PRESENTER: Masaki Oshita

ABSTRACT. The Metaverse is a virtual space on the internet where users can gather, communicate, and enjoy events such as live performances using their avatars. Although the Metaverse has become more common, events often attract only a small number of participants, creating a deserted impression. To address this issue, we propose populating the Metaverse with crowds of virtual characters and allowing event organizers to control their movements. However, controlling a large number of characters in real-time is challenging, as it requires manipulating the parameters of individual characters. To overcome this, we have developed a novel user interface that uses a spatial controller. In our system, the user wears a head-mounted display to view the scene from above and uses a six degrees-of-freedom spatial controller to manage the movements of the crowd. The key idea behind our user interface is that the spatial controller allows users to simultaneously select target characters and manipulate their parameters. Characters are selected by pointing at them with the controller, while parameters such as the motion magnitude are adjusted based on the controller's height. Experimental results show that our interface is both efficient and intuitive for users.
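
As a hypothetical illustration of the two mappings the abstract describes (ray-based selection and height-based parameter control), the Python sketch below selects characters inside a cone around the controller's pointing ray and maps controller height to a motion-magnitude value; all thresholds and coordinates are invented for the example.

# Hypothetical sketch (not the authors' implementation) of the two mappings described
# in the abstract: a controller ray selects target characters, and the controller's
# height sets a motion-magnitude parameter for the selected crowd members.
import numpy as np

def select_by_ray(origin, direction, positions, max_angle_deg=10.0):
    """Return indices of characters whose direction from the controller lies
    within a cone around the pointing ray."""
    d = direction / np.linalg.norm(direction)
    to_chars = positions - origin
    to_chars /= np.linalg.norm(to_chars, axis=1, keepdims=True)
    cos_angle = to_chars @ d
    return np.nonzero(cos_angle >= np.cos(np.radians(max_angle_deg)))[0]

def magnitude_from_height(height_m, lo=0.8, hi=1.8):
    """Map controller height (metres) linearly to a 0..1 motion magnitude."""
    return float(np.clip((height_m - lo) / (hi - lo), 0.0, 1.0))

positions = np.array([[2.0, 0.0, 5.0], [0.2, 0.0, 4.0], [-3.0, 0.0, 6.0]])
selected = select_by_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), positions)
print(selected, magnitude_from_height(1.3))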