IEEE-AIVR 2019: IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY 2019
PROGRAM FOR MONDAY, DECEMBER 9TH

07:30-08:30 Breakfast (included in registration)
08:30-09:40 Session 1: Opening & keynote
08:30
IEEE ISM & AIVR Opening (by general co-chairs)
08:40
The Future of Mixed Reality Interactions

ABSTRACT. The vision of always-on Mixed Reality interfaces that can be used in a continuous fashion for an entire day depends on solving many difficult problems, including display technology, comfort, computing power, batteries, localization, tracking, and spatial understanding. However, solving all of those will not bring us to a truly useful experience unless we also solve the fundamental problem of how to effectively interact in Mixed Reality. I believe that the solution to the MR interaction problem requires that we combine approaches from interaction design, perceptual science, and machine learning to yield truly novel and effective MR input and interactions. Such interactions will need to be adaptive to the user context, believable, and computational in nature. We are at the exciting point in the technology development curve where there are still few universally accepted standards for MR input, which leaves a ton of opportunities for both researchers and practitioners.

09:40-10:00 Coffee break
10:00-11:30 Session 2: Paper presentations (full papers)
10:00
Message by IEEE AIVR co-chairs & program co-chairs
10:15
Exploring Perspective Dependency in a Shared Body with Virtual Supernumerary Robotic Arms

ABSTRACT. With advancements in robotics, systems featuring wearable robotic arms teleoperated by a third party are appearing. An important aspect of these systems is the visual feedback provided to the third-party operator. This can be achieved by placing a wearable camera on the robotic arm’s “host,” or Main Body Operator (MBO), but such a setup makes the visual feedback dependent on the movements of the main body. Here we reproduced this view dependency in VR using a shared body. The “host” (the MBO) shares their virtual body with the virtual Supernumerary Robotic Arms (SRAs) of the teleoperator, or Parasite Body Operator (PBO). The two users perform two tasks: (i) a “synchronization task” to improve their joint action performance and (ii) a “building task” where they work together to build a tower. In a user study, we evaluated the embodiment, workload, and performance of the PBO in the “building task” with three different view dependency modes: (C1) using the coordinate system of the PBO; (C2) using the coordinate system of the MBO; and (C3) using both the coordinate system of the MBO and the coordinate system of the PBO.

10:40
Using Eye Tracked Virtual Reality to Classify Understanding of Vocabulary in Recall Tasks

ABSTRACT. In recent years, augmented and virtual reality (AR/VR) have started to gain a foothold in markets such as training and education. Although AR and VR have tremendous potential, current interfaces and applications are still limited in their ability to recognize context, user understanding, and intention, which can limit the options for customized individual user support and the ease of automation.

This paper addresses the problem of automatically recognizing whether or not a user understands a certain term, which is directly applicable to AR/VR interfaces for language and concept learning. To do so, we first designed an interactive word recall task in VR that required non-native English speakers to assess their knowledge of English words, many of which were difficult or uncommon. Using an eye tracker integrated into the VR display, we collected a variety of eye movement metrics that might correspond to the user's knowledge or memory of a particular word. Through experimentation, we show that both eye movement and pupil radius have a high correlation to user memory, and that several other metrics can also be used to help classify the state of word understanding. This allowed us to build a support vector machine (SVM) that can predict a user's knowledge with an accuracy of 62% in the general case and 75% for easy versus medium words, tested using cross-fold validation. We discuss these results in the context of in-situ learning applications.
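
A minimal sketch of this kind of classification step, assuming hypothetical per-word eye-movement features (mean pupil radius, fixation count, dwell time) and binary known/unknown labels; it is not the authors' implementation, and the paper's actual features, data, and tuning are not reproduced here.

# Hypothetical sketch: predict word knowledge from eye-tracking features
# with an SVM evaluated by k-fold cross-validation (features are placeholders).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))              # per-word metrics: pupil radius, fixation count, dwell time
y = rng.integers(0, 2, size=200)           # 1 = word known, 0 = unknown

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(f"Mean accuracy: {scores.mean():.2f}")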

11:05
The Role of Virtual Reality in Autonomous Vehicles Safety

ABSTRACT. Virtual Reality (VR) has been playing an important role in the development of autonomous robots. From Computer Aided Design (CAD) to simulators for testing automation algorithms without risking expensive equipment, VR has been used in a wide range of applications. More recently, Autonomous Vehicles (AVs), a special application of autonomous robots, have become an important bet of the scientific and practitioner communities for road and vehicle safety improvements. However, recent AV accidents have shed light on the new safety challenges that need to be addressed to fulfill those safety expectations. This paper presents a systematic literature mapping on the use of VR for AV safety and discusses the vision that VR will play an important role in the development of AV safety.

11:30-12:30 Session 3: Keynote
11:30
Enhancing Immersive Experiences with (contextual) user data

ABSTRACT. It took about 20 years for video over the Internet to become delightful for end users. This was built on a large body of research from both academia and industry. In the meantime, over the last few years, AI has provided generational transformations in video (content) understanding algorithms and in our ability to learn and adapt from big user (behavioral) data. How do we stand on the shoulders of these giants (transformations) to make next-gen immersive experiences compelling? I will take the audience through a journey on how to leverage these technological transformations to enhance end-user immersive experiences, from simple 360-degree VR videos or AR scenes to more complex volumetric videos or geo-located large-scale AR applications. Some demos of past and current projects will be shown, with a call to leverage contextual user data to improve and personalize end-user immersive experiences.

12:30-14:00 Lunch (on your own)
14:00-15:45 Session 4A: Paper presentations (short & industry papers)
14:00
Room Style Estimation for Style-Aware Recommendation

ABSTRACT. Interior design is a complex task, as evidenced by the multitude of professionals, websites, and books offering design advice. Additionally, such advice is highly subjective in nature, since different experts might have different interior design opinions. Our goal is to offer data-driven recommendations for an interior design task that reflect an individual's room style preferences. We present a style-based image suggestion framework to search for room ideas and relevant products for a given query image. We train a deep neural network classifier by focusing on high-volume classes with high-agreement samples, using a VGG architecture. The resulting model shows promising results and paves the way to style-aware product recommendation with a holistic understanding of the room style.
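
A minimal sketch of fine-tuning a VGG backbone for style classification, assuming a hypothetical number of style classes; the paper's classes, data, and training procedure are not reproduced here. Only the new classification head is trained in this sketch, a common choice when labeled style data are limited.

# Hypothetical sketch: adapt a pretrained VGG-16 to room style classification
# (NUM_STYLES and the training data are illustrative assumptions).
import torch
import torch.nn as nn
from torchvision import models

NUM_STYLES = 6                      # e.g., modern, rustic, ... (assumed)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False         # freeze the convolutional backbone
model.classifier[6] = nn.Linear(4096, NUM_STYLES)   # replace the final layer

optimizer = torch.optim.Adam(model.classifier[6].parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# A training loop over a DataLoader of (image, style_label) batches would go here.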

14:15
Augmented Reality for Human-Robot Cooperation in Aircraft Assembly

ABSTRACT. Augmented Reality (AR) is often discussed as one of the enabling technologies in Industrie 4.0. In this paper, we describe a practical application in which Augmented Reality glasses are used not only for assembly assistance, but also as a means of communication to enable the orchestration of a hybrid team consisting of a human worker and two mobile robotic systems. The task of the hybrid team is to rivet so-called stringers onto an aircraft hull. While the two robots do the physically demanding, unergonomic, and possibly hazardous tasks (squeezing and sealing rivets), the human takes over those responsibilities that require experience, multi-sensory sensitivity, and specialist knowledge. We describe the working scenario and the overall architecture, and give design and implementation details on the AR application.

14:30
Structuring and inspecting 3D anchors for seismic volume into Hyperknowledge Base in virtual reality

ABSTRACT. Seismic data is a source of information which geoscientists use to investigate subsurface regions and look for possible resources to explore. Such data are volumetric and noisy, and thus a challenge to visualize. Over the years, these data have motivated research into new computational systems to assist the expert in that endeavor, such as visualization methods, signal processing, and machine learning models, to name a few. We propose a system that aids geologists, geophysicists, and related domain experts in interpreting seismic data in virtual reality (VR). The system uses a hyperknowledge base (HKBase), which structures ROIs as anchors carrying semantics from the user to the system and vice-versa. For instance, through the HKBase, the user can load and inspect the output from AI systems or give new inputs and feedback in the same way. We ran tests with experts to evaluate the system in their tasks and to collect feedback and new insights on how the software could contribute to their processes. Based on our results, we claim that we took one step forward in VR for the oil & gas industry by creating a valuable experience for the task of seismic interpretation.

14:45
Deep Learning on VR-Induced Attention
PRESENTER: Gang Li

ABSTRACT. Some evidence suggests that virtual reality (VR) approaches may lead to a greater attentional focus than experiencing the same scenarios presented on computer monitors. The aim of this study is to differentiate attention levels captured during a perceptual discrimination task presented on two different viewing platforms, a standard personal computer (PC) monitor and head-mounted-display (HMD) VR, using a well-described electroencephalography (EEG)-based measure (parietal P3b latency) and a deep learning-based measure (EEG features extracted by a compact convolutional neural network, EEGNet, and visualized by a gradient-based relevance attribution method, DeepLIFT). Twenty healthy young adults participated in this perceptual discrimination task, in which, according to a spatial cue, they were required to discriminate either a “Target” or a “Distractor” stimulus on the screen of each viewing platform. Experimental results show that the EEGNet-based classification accuracies are highly correlated with the p-values of the statistical analysis of P3b. Also, the visualized EEG features are neurophysiologically interpretable. This study provides the first visualized deep learning-based EEG features captured during an HMD-VR-based attentional task.

15:00
Situation-adaptive object grasping recognition in VR environment

ABSTRACT. In this paper, we propose a method for recognizing the grasping of virtual objects in a VR environment. The proposed method utilizes the fact that the position and shape of the virtual object to be grasped are known. A camera acquires an image of the user grasping a virtual object, and the posture of the hand is extracted from that image. The obtained hand posture is used to classify whether or not it is a grasping action. To evaluate the proposed method, we created a new dataset specialized for grasping virtual objects with a bare hand. There were three shapes and three positions of virtual objects in the dataset. The recognition rate of the classifier trained using the dataset with specific shapes of virtual objects was 93.18%, and that trained with all shapes of virtual objects was 87.71%. This result shows that the recognition rate was improved by training the classifier using the shape-dependent dataset.
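
A minimal sketch of the shape-dependent classification idea, assuming hypothetical hand-posture features (joint angles) and grasp/no-grasp labels; the camera-based hand pose extraction and the authors' dataset are not part of this sketch.

# Hypothetical sketch: train one grasp classifier per known object shape,
# following the idea that shape-dependent training can improve recognition.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
shapes = ["sphere", "cube", "cylinder"]          # assumed shape labels
classifiers = {}
for shape in shapes:
    X = rng.uniform(0.0, 90.0, size=(300, 15))   # placeholder joint angles per sample
    y = rng.integers(0, 2, size=300)             # 1 = grasping, 0 = not grasping
    classifiers[shape] = SVC(kernel="rbf").fit(X, y)

# At runtime, pick the classifier matching the known shape of the virtual object.
sample = rng.uniform(0.0, 90.0, size=(1, 15))
print(classifiers["cube"].predict(sample))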

15:15
ATVR: An Attention Training System using Multitasking and Neurofeedback on Virtual Reality Platform
PRESENTER: Menghe Zhang

ABSTRACT. We present an attention training system based on the principles of multitasking training and neurofeedback, which can be targeted at both PC and VR platforms. Our training system is a video game following the principle of multitasking training, designed for all ages. It adopts a non-invasive electroencephalography (EEG) device, the Emotiv EPOC+, to collect EEG signals. Wavelet packet transform (WPT) is then applied to extract specific components of the EEG signals, and a multi-class support vector machine (SVM) is built to classify different attention levels. The training system is built with the Unity game engine and can be targeted at both desktops and Oculus VR headsets. We also conducted an experiment to preliminarily evaluate the effectiveness of our system. The results show that our system can generally improve users' multitasking ability and attention level.
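
A minimal sketch of the WPT-plus-SVM stage, assuming hypothetical single-channel EEG epochs and three attention-level labels, with PyWavelets standing in for the exact wavelet packet configuration used in the paper; the energy per terminal node serves as a coarse band-power descriptor.

# Hypothetical sketch: wavelet packet energy features per EEG epoch,
# fed to a multi-class SVM for attention-level classification (data are placeholders).
import numpy as np
import pywt
from sklearn.svm import SVC

def wpt_energy_features(epoch, wavelet="db4", level=4):
    # One energy value per terminal wavelet packet node (frequency-ordered).
    wp = pywt.WaveletPacket(data=epoch, wavelet=wavelet, maxlevel=level)
    return np.array([np.sum(node.data ** 2) for node in wp.get_level(level, order="freq")])

rng = np.random.default_rng(2)
epochs = rng.normal(size=(120, 256))      # 120 single-channel epochs, 256 samples each
labels = rng.integers(0, 3, size=120)     # three attention levels (assumed)

X = np.vstack([wpt_energy_features(e) for e in epochs])
clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, labels)
print(clf.predict(X[:5]))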

15:30
Remote Environment Exploration with Drone Agent and Haptic Force Feedback

ABSTRACT. Camera drones allow exploration of remote scenes that are inaccessible or inappropriate to visit in person. However, these exploration experiences are often limited due to insufficient scene information provided by front-facing cameras, which supply only 2D images or videos. Combining camera drone vision with haptic feedback would augment users’ spatial understanding of the remote environment, but such combinations are usually difficult for users to learn and apply, due to the complexity of the system and awkward UAV control. Here, we present a new telepresence system for remote environment exploration, with a drone agent controlled by a VR mid-air panel. The drone is capable of generating real-time location and landmark details using integrated Simultaneous Localization and Mapping (SLAM). The SLAM point clouds are generated from RGB input, and the results are passed to a Generative Adversarial Network (GAN) to reconstruct the remote scene in real time. The reconstructed objects are then used by haptic devices to provide sophisticated haptic rendering to users. By providing both visual and haptic feedback, our system allows users to examine remote areas without having to be physically present. We have conducted an experiment that confirms the usability of the 3D reconstruction results for haptic feedback rendering.

14:00-15:45 Session 4B: Workshop CRHD (part 1)
14:00
Introduction
PRESENTER: Fabien Danieau
14:10
Digital humans: models of behavior and interactivity (keynote)

ABSTRACT. As techniques for capturing and generating realistic digital humans become more widely available, the need for realistic movement and behavior becomes more important. The Uncanny Valley effect is more pronounced for moving, as opposed to still, imagery, necessitating higher fidelity motion replication, such as from motion capture, as well as higher fidelity behavior models for synthetic movement. This talk explores my work in modeling both appearance and behavior of digital humans, including capture, rigging, and interactivity.

14:55
Temporal Interpolation of Dynamic Digital Humans using Convolutional Neural Networks
PRESENTER: Irene Viola

ABSTRACT. In recent years, there has been an increased interest in point cloud representation for visualizing digital humans in cross reality. However, due to their voluminous size, point clouds require high bandwidth to be transmitted. In this paper, we propose a temporal interpolation architecture capable of increasing the temporal resolution of dynamic digital humans, represented using point clouds. With this technique, bandwidth savings can be achieved by transmitting dynamic point clouds in a lower temporal resolution, and recreating a higher temporal resolution on the receiving side. Our interpolation architecture works by first downsampling the point clouds to a lower spatial resolution, then estimating scene flow using a newly designed neural network architecture, and finally upsampling the result back to the original spatial resolution. To improve the smoothness of the results, we additionally apply a novel technique called neighbour snapping. To be able to train and test our newly designed network, we created a synthetic point cloud data set of animated human bodies.

Results from the evaluation of our architecture through a small-scale user study show the benefits of our method with respect to the state of the art in scene flow estimation for point clouds. Moreover, the correlation between our user study and existing objective quality metrics confirms the need for new metrics to accurately predict the visual quality of point cloud contents.
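
A minimal sketch of the interpolation idea, assuming per-point scene flow between two frames has already been estimated; an in-between frame is synthesized by advecting points partway along the flow. The downsampling, the scene flow network, and the neighbour snapping step are not reproduced here.

# Hypothetical sketch: synthesize an intermediate point cloud frame from an
# estimated per-point scene flow (the flow estimation itself is assumed given).
import numpy as np

def interpolate_frame(points_t, scene_flow, alpha=0.5):
    """Advect frame-t points a fraction `alpha` of the way along their estimated flow."""
    return points_t + alpha * scene_flow

rng = np.random.default_rng(3)
points_t = rng.uniform(-1.0, 1.0, size=(1024, 3))     # frame t (x, y, z)
scene_flow = rng.normal(scale=0.01, size=(1024, 3))   # estimated flow towards frame t+1

mid_frame = interpolate_frame(points_t, scene_flow, alpha=0.5)
print(mid_frame.shape)   # (1024, 3): the recreated in-between frame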

15:20
Automatic Generation of 3D Facial Rigs

ABSTRACT. Digital humans are key aspects of the rapidly evolving areas of virtual reality, augmented reality, virtual production, and gaming. Even outside of the entertainment world, they are becoming more and more commonplace in retail, sports, social media, education, health, and many other fields. This talk presents a fully automatic pipeline for generating facial rigs of high geometric and textural quality, automatically rigged with facial blendshapes for animation. The steps of this pipeline, such as photogrammetry, landmarking, retopology, and blendshape transfer, are detailed. Then two applications are showcased: creating fast VR avatars and generating high-quality digital doubles.

15:45-16:15 Coffee break
16:15-18:00 Session 5A: Workshop AIXR
16:15
XR for Augmented Utilitarianism

ABSTRACT. Steady progress in the AI field creates enriching possibilities for society while simultaneously posing new, complex challenges of an ethical, legal, and safety-relevant nature. In order to achieve an efficient human-centered governance of artificial intelligent systems, it has been proposed to harness augmented utilitarianism (AU), a novel non-normative ethical framework grounded in science, which can be assisted, e.g., by Extended Reality (XR) technologies. While AU provides a scaffold to encode human ethical and legal conceptions in a machine-readable form, filling in these conceptions requires a transdisciplinary amalgamation of scientific insights and preconditions from manifold research areas. In this short paper, we present a compact review of how XR technologies could leverage the underlying transdisciplinary AI governance approach utilizing the AU framework. Towards that end, we outline pertinent needs for XR in two related contexts: as an experiential testbed for AU-relevant moral psychology studies, and as a proactive AI Safety measure and enhancing policy-by-simulation method preceding the deployment of AU-based ethical goal functions.

16:35
Extending socio-technological reality for ethics in artificial intelligent systems

ABSTRACT. Due to significant technological advances leading to an expansion of the possible solution space of more or less autonomously operating artificial intelligent systems in real-world environments, society faces the challenge of specifying the goals of these systems while jointly covering ethical conceptions and legal frameworks. In this paper, we postulate that for this complex task of societal relevance, pertaining to both AI Ethics and AI Safety, Virtual Reality (VR) and Augmented Reality (AR) represent valuable tools whose utilization facilitates the extension of socio-technological reality by offering a rich counterfactual experiential testbed for enhanced ethical decision-making. For this purpose, we use the example of autonomous vehicles (AVs) to elaborate on how VR and AR could provide a twofold structured augmentation for the governance of artificial intelligent systems by enhancing society with regard to ethical self-assessment and ethical debiasing. Thereby, we extend the existing literature with tailored recommendations, based on insights from cognitive neuroscience and psychology, to resolve inconclusive open issues related to past VR experiments involving ethically relevant dilemmas in AV contexts. Finally, we comment on possible VR/AR-based cognitive-affective augmentation measures for a transformative impact on future AI Ethics and AI Safety endeavors.

16:55
Q&A with presenters
17:05
Interactive session
16:15-18:00 Session 5B: Workshop CRHD (part 2)
16:15
The Design Process for Enhancing Visual Expressive Qualities of Characters from Performance Capture into Virtual Reality

ABSTRACT. In designing performances for virtual reality, one must consider the unique qualities of the VR medium in order to deliver expressive character performance. This means that the design requirements for participant engagement and immersion must evolve to address these new possibilities. Embedding emotion and expression into the process of creating character movement, specifically through strong acting and directing, showcases the need for more attention to expressive human movement to enhance immersive experiences.

16:30
Influence of Motion Speed on the Perception of Latency in Avatar Control

ABSTRACT. With the dissemination of Head Mounted Display devices in which users cannot see their body, simulating plausible avatars has become a key challenge. For fullbody interaction, avatar simulation and control involves several steps, such as capturing and processing the motion (or intentions) of the user using input interfaces, providing the resulting user state information to the simulation platform, computing a plausible adaptation of the virtual world, rendering the scene, and displaying the multisensory feedback to the user through output interfaces. All these steps imply that the displayed avatar motion appears to users with a delay (or latency) compared to their actual performance. Previous works have shown an impact of this delay on the perception-action loop, with possible impact on Presence and embodiment. In this paper we explore how the speed of the motion performed when controlling a fullbody avatar can impact the way people perceive and react to such a delay. We conducted an experiment where users were asked to follow a moving object with their finger, while embodied in a realistic avatar. We artificially increased the latency by introducing different levels of delays (up to 300ms) and measured their performance in the mentioned task, as well as their feeling about the perceived latency. Our results show that motion speed influenced the perception of latency: we found critical latencies of 80ms for medium and fast motion speeds, while the critical latency reached 120ms for a slow motion speed. We also noticed that performance is affected by both latency and motion speed, with higher speeds leading to decreased performance. Interestingly, we also found that performance was affected by latency before the critical latency for medium and fast speeds, but not for a slower speed. These findings could help to design immersive environments to minimize the effect of latency on the performance of the user, with potential impacts on Presence and embodiment.

16:45
Multispectral Illumination in USC ICT's Light Stage X

ABSTRACT. USC ICT's computational illumination system Light Stage X has been used for a variety of different techniques: from studio lighting reproduction to high resolution facial scanning. In this talk, I'll describe how adding multispectral LEDs to the system has improved color rendition for a variety of such Light Stage techniques, while also enabling higher resolution facial capture. I will conclude with opportunities for future work on human digitization leveraging multispectral illumination sources.

17:15
Automating mass production of digital avatars for VR

ABSTRACT. This talk covers how the Vision and Graphics Lab at USC’s ICT is leveraging the latest Light Stage technology to build a database of facial scans. Recent movements toward a convergence of visual quality in real-time and offline rendering, in conjunction with the massive rise of deep learning approaches for processing and recreating human data, have drastically simplified the generation of realistic avatars for VR, something that was previously reserved for high-end visual effects studios with a multitude of highly specialized artists and engineers. We have developed a pipeline for scanning, preprocessing, and registration of expressive facial scans to automate the building of a database that enables training of machine learning algorithms to generate highly detailed and visually realistic avatars. This presentation will focus on the main obstacles confronted when building such a database and pipeline, aimed specifically at facial scan data but stretching further by combining multiple data sources and providing automatic rigging, animation, and rendering of a massive number of digital avatars.

17:45
Wrap up
PRESENTER: Fabien Danieau