IEEE-AIVR 2019: IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY 2019
PROGRAM FOR TUESDAY, DECEMBER 10TH

07:30-08:30 Breakfast (included in registration)
08:30-09:30 Session 6: Keynote
08:30
Experience on Demand

ABSTRACT. Virtual reality is able to effectively blur the line between reality and illusion, granting us access to any experience imaginable. But how does this new medium affect its users, and does it have a future beyond fantasy and escapism? There are dangers and many unknowns in using VR, but it also can help us hone our performance, recover from trauma, improve our learning and communication abilities, and enhance our empathic and imaginative capacities.

09:30-09:50 Coffee break
09:50-11:30 Session 7: Paper presentations (best paper candidates)
09:50
Viewport Forecasting in 360-degree Virtual Reality Videos with Machine Learning

ABSTRACT. Objective. Virtual reality (VR) cloud gaming and 360-degree video streaming are on the rise. With a VR headset, the viewer can individually choose the perspective she sees on the head-mounted display by turning her head, which creates the illusion of being in a virtual room. In this experimental study, we applied machine learning methods to anticipate future head rotations (a) from preceding head and eye motions, and (b) from the statistics of other spherical video viewers. Approach. Ten study participants each watched 3 1⁄3 hours of spherical video clips while head and eye gaze motions were tracked, using a VR headset with a built-in eye tracker. Machine learning models were trained on the recorded head and gaze trajectories to predict (a) changes of head orientation and (b) the field of view from population statistics. Results. We assembled a dataset of head and gaze trajectories of spherical video viewers with high stimulus variability. We extracted statistical features from these time series and showed that a Support Vector Machine can classify the range of future head movements over a time horizon of up to one second with good accuracy. Even population statistics among only ten subjects show prediction success above chance level. Significance. Field-of-view forecasting opens up various avenues to optimize VR rendering and transmission. While the viewer can see only a section of the surrounding 360-degree sphere, the entire panorama typically has to be rendered and/or broadcast. The reason is rooted in the transmission delay, which has to be taken into account in order to avoid simulator sickness due to motion-to-photon latencies. Knowing in advance where the viewer is going to look may help make cloud rendering and video streaming of VR content more efficient and, ultimately, the VR experience more appealing.
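
For illustration, here is a minimal sketch of the classification step: statistical features are extracted from a window of head-motion samples, and a Support Vector Machine predicts whether the upcoming head movement will be large. The feature set, the assumed 90 Hz sample rate, the class boundary, and the synthetic stand-in data are all our assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def features(yaw_window):
    """Statistical features of a 1 s window of head-yaw samples (degrees)."""
    vel = np.diff(yaw_window)
    return [vel.mean(), vel.std(), np.abs(vel).max(), yaw_window.std()]

rng = np.random.default_rng(0)
# Synthetic stand-in for the recorded trajectories: one random-walk yaw
# trace per window, sampled at an assumed 90 Hz.
past = rng.normal(0, 1, (1000, 90)).cumsum(axis=1)
# Toy "future movement range": loosely tied to recent motion plus noise.
future_range = np.abs(past[:, -1] - past[:, -30]) + rng.uniform(0, 5, 1000)

X = np.array([features(w) for w in past])
y = (future_range > np.median(future_range)).astype(int)  # large vs. small move

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```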

10:15
Non-verbal Behavior Generation for Virtual Characters in Group Conversations

ABSTRACT. We present an approach to synthesize non-verbal behaviors for virtual characters during group conversations. We employ a probabilistic model and use Dynamic Bayesian Networks to find the correlations between the conversational state and non-verbal behaviors. The parameters of the network are learned by annotating and analyzing the CMU Panoptic dataset. The results are evaluated in comparison to the ground-truth data and with user experiments. The behaviors can be generated online, and the system has been integrated with the animation engine of a game company specializing in Virtual Reality applications for Cognitive Behavioral Therapy. To our knowledge, this is the first study to use a data-driven approach to automatically generate non-verbal behaviors during group interactions.
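
As a hedged sketch of the generative step, the snippet below samples a behavior conditioned on the conversational state and the previous behavior, the kind of temporal dependency a Dynamic Bayesian Network encodes. The states, behavior labels, and probabilities are invented for illustration; the paper learns them from the annotated CMU Panoptic dataset.

```python
import random

# P(behavior_t | state_t, behavior_{t-1}) as a conditional probability table
CPT = {
    ("speaking", "idle"):     {"gesture": 0.5, "head_nod": 0.1, "idle": 0.4},
    ("speaking", "gesture"):  {"gesture": 0.6, "head_nod": 0.1, "idle": 0.3},
    ("speaking", "head_nod"): {"gesture": 0.4, "head_nod": 0.2, "idle": 0.4},
    ("listening", "idle"):    {"gesture": 0.1, "head_nod": 0.4, "idle": 0.5},
    ("listening", "gesture"): {"gesture": 0.1, "head_nod": 0.3, "idle": 0.6},
    ("listening", "head_nod"):{"gesture": 0.1, "head_nod": 0.5, "idle": 0.4},
}

def sample_behavior(state, prev):
    """Draw the next non-verbal behavior from the learned-looking CPT."""
    dist = CPT[(state, prev)]
    return random.choices(list(dist), weights=list(dist.values()))[0]

behavior = "idle"
for state in ["listening", "listening", "speaking", "speaking", "listening"]:
    behavior = sample_behavior(state, behavior)
    print(state, "->", behavior)
```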

10:40
Combining Pairwise Feature Matches from Device Trajectories for Biometric Authentication in Virtual Reality Environments

ABSTRACT. In this paper, we provide an approach to perform seamless continual biometric authentication of users in virtual reality (VR) environments by combining position and orientation features from the headset, right hand controller, and left hand controller of a VR system. The rapid growth of VR in mission-critical applications in military training, flight simulation, therapy, manufacturing, and education necessitates authentication of users based on their actions within the VR space, as opposed to traditional PIN- and password-based approaches. To mimic goal-oriented interactions as they may occur in VR environments, we capture a VR dataset of trajectories from 33 users throwing a ball at a virtual target, with 10 samples per user captured on a training day and 10 samples on a test day. Due to the sparseness in the number of training samples per user, typical of realistic interactions, we perform authentication using pairwise relationships between trajectories. Our approach uses a perceptron classifier to learn weights on the matches between position and orientation features of two trajectories from the headset and the hand controllers, such that a low classifier score is obtained for trajectories belonging to the same user and a high score is obtained otherwise. We also extensively evaluate the effect on accuracy of the choice of position and orientation features, the combination of devices, and the choice of match metrics and trajectory alignment method, and demonstrate a maximum accuracy of 93.03% for matching 10 test actions per user by using orientation from the right hand controller and headset.
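
A minimal sketch of the pairwise idea, under our own assumptions (a simple mean-absolute-distance match vector and toy data in place of the captured trajectories; the paper also evaluates alignment methods and match metrics that we omit here):

```python
import numpy as np
from sklearn.linear_model import Perceptron

def match_vector(traj_a, traj_b):
    """Per-feature mean absolute distance between two aligned trajectories of
    shape (timesteps, features), e.g., headset + controller orientations."""
    return np.abs(traj_a - traj_b).mean(axis=0)

rng = np.random.default_rng(1)
user_templates = rng.normal(0, 1, (33, 100, 6))       # 33 users, toy data
pairs, labels = [], []
for _ in range(500):
    u, v = rng.integers(0, 33, 2)
    a = user_templates[u] + rng.normal(0, 0.1, (100, 6))
    b = user_templates[v] + rng.normal(0, 0.1, (100, 6))
    pairs.append(match_vector(a, b))
    labels.append(int(u != v))                        # 1 = different users

# The perceptron learns weights so same-user pairs score low (class 0).
clf = Perceptron().fit(pairs, labels)
print("training accuracy:", clf.score(pairs, labels))
```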

11:05
Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video

ABSTRACT. We introduce a convolutional neural network model for unsupervised learning of depth and ego-motion from cylindrical panoramic video. Panoramic depth estimation is an important technology for applications such as virtual reality, 3D modeling, and autonomous robotic navigation. In contrast to previous approaches for applying convolutional neural networks to panoramic imagery, we use the cylindrical panoramic projection, which allows the use of traditional CNN layers such as convolutional filters and max pooling without modification. Our evaluation on synthetic and real data shows that unsupervised learning of depth and ego-motion on cylindrical panoramic images can produce high-quality depth maps and that the increased field of view improves ego-motion estimation accuracy. We also introduce Headcam, a novel dataset of panoramic video collected from a helmet-mounted camera while bike-riding in an urban setting.
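
To illustrate why the cylindrical projection is convenient, the sketch below applies a standard 3x3 convolution to a panorama, adding wrap-around padding only in azimuth, where the cylinder is continuous. The wrap-padding detail is our assumption for illustration, not necessarily the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cylindrical_conv(x, weight, bias=None):
    """3x3 convolution on a cylindrical panorama (N, C, H, W):
    circular padding left/right, zero padding top/bottom."""
    x = F.pad(x, (1, 1, 0, 0), mode="circular")  # wrap in azimuth
    x = F.pad(x, (0, 0, 1, 1))                   # zeros at top/bottom
    return F.conv2d(x, weight, bias)

pano = torch.randn(1, 3, 64, 256)                # toy cylindrical panorama
w = torch.randn(16, 3, 3, 3)
out = cylindrical_conv(pano, w)
print(out.shape)                                 # torch.Size([1, 16, 64, 256])
```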

11:30-12:30 Session 8: Keynote
11:30
Integrating Virtual Reality Capabilities into Mission Critical Systems

ABSTRACT. Large-scale tactical and strategic mission systems are composed of a complex range of components and capabilities that are integrated across a variety of platforms and frameworks. Systems of the near and distant future will increasingly depend on autonomous operations, artificial intelligence, machine learning, and other disruptive technologies that will impact the timing dynamics and decision-making responses required of critical real-time systems. Artificial intelligence and virtual/augmented reality have the potential for serving as force multipliers and providing new capabilities in system modeling, operations planning, immersive training environments, and real-time human-machine teaming for meeting these challenges. Considering the multi-faceted dimensions of Northrop Grumman's mission objectives, an end-to-end systems-level design approach is required to leverage and integrate capabilities across the enterprise and achieve an integrated system-of-systems, at scale, in tactical and strategic environments. The presentation will demonstrate various examples in which artificial intelligence and virtual reality apply to Northrop Grumman systems, products, and services. The discussion will also provide insights into practical considerations for utilizing virtual reality within mission-critical applications.

12:30-14:00 Lunch (on your own)
14:00-15:00 Session 9: Keynote
14:00
Using Volumetric Video for Remote Communication and Collaboration: development and evaluation of a social VR system based on point clouds

ABSTRACT. With Social Virtual Reality emerging as a new medium where users can remotely experience immersive content with others, the vision of a true feeling of 'being there together' has become a realistic goal. This keynote will provide an overview of the challenges in achieving such a goal, based on results from practical case studies like the VR-Together project. We will discuss different technologies, like point clouds, that can be used as the format for representing highly realistic digital humans, as well as metrics and protocols for quantifying the quality of experience. The final intention of the talk is to shed some light on social VR as a new class of virtual reality experiences based on social, photorealistic, immersive content, and to discuss the challenges regarding production, technology, and user-centric processes.

15:00-18:00 Session 10: Posters & demos (incl. coffee break)
15:00
HoloLucination: A Framework for Live Augmented Reality Presentations across Mobile Devices

ABSTRACT. We envision that in the future, presentations for business, education, and scientific dissemination can invoke 3D spatial content to immersively display and discuss animated 3D models and spatial data visualizations with large audiences. Current frameworks have targeted a highly technical user base, prohibiting the widespread curation of immersive presentations. Furthermore, solutions for real-time multi-user interactions have focused on multiplayer gaming rather than large-format immersive presentation. However, modern mobile devices (smartphones, tablets, headsets) are capable of rendering virtual models over the physical environment through visual anchors for Augmented Reality (AR). Our ongoing research thrust is to leverage contemporary AR infrastructure to develop an easy-to-use tool for users to curate and spatially present augmented presentations to large audiences. In this demo, we have built an Augmented Reality framework that allows users to curate mixed reality presentations. Our framework allows users to prepare a sequence of animation states. At presentation time, presenters can invoke the animations to occur simultaneously on HMDs and mobile devices.
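
A toy sketch of that synchronization model: the presenter steps through prepared animation states, and each step is pushed to every registered device so HMDs and phones trigger the same animation together. The class and callback names are hypothetical, not the framework's API.

```python
class AugmentedPresentation:
    """Prepared sequence of animation states; advancing notifies all devices."""
    def __init__(self, steps):
        self.steps, self.index, self.devices = steps, -1, []

    def register(self, device_callback):
        self.devices.append(device_callback)

    def next(self):
        if self.index + 1 < len(self.steps):
            self.index += 1
            for notify in self.devices:   # simultaneous trigger on HMDs/phones
                notify(self.steps[self.index])

deck = AugmentedPresentation(["explode_model", "rotate_molecule", "highlight"])
deck.register(lambda anim: print("HMD plays:", anim))
deck.register(lambda anim: print("phone plays:", anim))
deck.next()   # both devices play "explode_model" together
```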

15:00
Drafting interpretation of Seismic Data through Virtual Reality with Hyperknowledge Base Systems

ABSTRACT. Seismic data are sources of information used by geophysicists and geologists to infer the lithology of a region and look for evidence of possible hydrocarbon deposits. The interpretation of this data is critical for natural resource exploration in industries like oil & gas. However, the data are inherently volumetric, and their interpretation is challenging and time-consuming even for skilled domain specialists. In this work, we present a virtual reality system for exploring seismic data assisted by a knowledge base and AI services. We focus on visualizing and creating 3D annotations: artifacts that highlight regions of interest and characterize structures in the seismic data. A hybrid knowledge base (Hyperknowledge base), which supports multimodal data, integrates these annotations between users and AI services and vice versa. Users can thus employ the system for decision making in immersive environments that preserve the volumetric perspective of the data for better understanding.

15:00
Exploring CNN-based Viewport Prediction For Live Virtual Reality Streaming

ABSTRACT. Live virtual reality streaming (a.k.a. 360-degree video streaming) is gaining popularity with its rapid growth in the consumer market. However, the huge bandwidth required by the 360-degree frame size becomes the bottleneck, keeping this application from wider deployment. Research efforts have been carried out to solve the bandwidth problem by predicting the user's viewport of interest and selectively streaming a part of the whole frame. However, most current viewport prediction approaches cannot address the unique challenges of the live streaming scenario, where there are no historical user or video traces with which to build the prediction model. In this paper, we explore the opportunity of leveraging CNNs to predict the user's viewport in live streaming by modifying the architecture of the CNN application and the training/testing process. The evaluation results reveal that the CNN-based method can achieve high prediction accuracy with low bandwidth usage and low timing overhead.
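
To make the tiled-streaming idea concrete, here is a hedged sketch: a small CNN maps a downscaled 360-degree frame to a per-tile viewport probability, and only the probable tiles would be streamed in high quality. The architecture and the 8x16 tiling are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class ViewportNet(nn.Module):
    def __init__(self, tiles=(8, 16)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(tiles),     # one cell per streaming tile
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, frame):
        return torch.sigmoid(self.features(frame)).squeeze(1)  # (N, 8, 16)

net = ViewportNet()
frame = torch.randn(1, 3, 128, 256)          # downscaled equirectangular frame
tile_probs = net(frame)
stream_mask = tile_probs > 0.5               # tiles to send in high quality
print(tile_probs.shape, int(stream_mask.sum()))
```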

15:00
Extracting Specific Voice from Mixed Audio Source

ABSTRACT. We propose a deep neural network (DNN) for extracting a single speech signal from a mixture of sounds containing other speakers and background noise. To train the proposed DNN, we introduce a new dataset composed of multiple speakers and environmental noises. We conduct a subjective evaluation to measure the source separation quality of the DNN. Additionally, we compare the separation quality of models learned with different amounts of training data. We found no significant difference in separation quality between using 10 and 30 minutes of the target speaker's speech as training data.
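
One common realization of such extraction, sketched below under our own assumptions (a toy mask estimator in place of the paper's DNN): predict a time-frequency mask over the mixture's STFT and resynthesize the target.

```python
import torch
import torch.nn as nn

n_fft, hop = 512, 128
mix = torch.randn(16000)                           # 1 s of mixture at 16 kHz
window = torch.hann_window(n_fft)
spec = torch.stft(mix, n_fft, hop, window=window, return_complex=True)
mag = spec.abs()                                   # (freq, time) magnitudes

mask_net = nn.Sequential(                          # toy, untrained mask estimator
    nn.Linear(n_fft // 2 + 1, 256), nn.ReLU(),
    nn.Linear(256, n_fft // 2 + 1), nn.Sigmoid(),
)
mask = mask_net(mag.T).T                           # per-frame mask in [0, 1]

target_spec = spec * mask                          # keep target speaker's energy
target = torch.istft(target_spec, n_fft, hop, window=window, length=len(mix))
print(target.shape)                                # torch.Size([16000])
```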

15:00
neomento SAD - VR treatment for social anxiety

ABSTRACT. We present the neomento project, a solution for virtual reality (VR) exposure therapy (VRET). In our work we have created specific rendering methods and virtual environments (VEs), designed and published a novel form of behavioural control for virtual agents (VAs), included biophysiological measurements directly in the experience, and created an asymmetric gameplay system to ensure the correct progress of a therapy session. This abstract is a reprint of [A. Streck, P. Stepnicka, J. Klaubert, and T. Wolbers. neomento - Towards Building a Universal Solution for Virtual Reality Exposure Psychotherapy. 2019 IEEE Conference on Games (CoG), IEEE Press, 2019] ©2019 IEEE, with the title, acknowledgement, references, and Fig. 1 updated.

15:00
Towards Method Time Measurement Identification Using Virtual Reality and Gesture Recognition

ABSTRACT. In this paper, we introduce a system for the automatic generation of Methods-Time Measurement (MTM) analyses using only 3D tracking of the head and both hands. Our approach relies on dividing gestures into small elementary movements. We then build a decision tree that aggregates these elementary movements to generate the realized MTM code. The proposed system does not need any pre-learning step, and it can be useful both in virtual environments for training technicians and in real industrial workshops for assisting experts with MTM code identification.
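
For flavor, a hand-written rule sketch in the spirit of such a decision tree (no pre-learning required). The movement labels, distance handling, and resulting codes are simplified assumptions, not the full MTM-1 table.

```python
def mtm_code(movement, distance_cm, target_known):
    """Map one segmented elementary movement to a simplified MTM-1 code."""
    if movement == "reach":
        case = "A" if target_known else "C"      # MTM-1 reach cases
        return f"R{round(distance_cm)}{case}"
    if movement == "move":
        return f"M{round(distance_cm)}B"
    if movement == "grasp":
        return "G1A"
    if movement == "release":
        return "RL1"
    raise ValueError(f"unknown elementary movement: {movement}")

# Elementary movements segmented from head/hand 3D tracking
sequence = [("reach", 30, True), ("grasp", 0, True),
            ("move", 25, True), ("release", 0, True)]
print(" ".join(mtm_code(m, d, k) for m, d, k in sequence))
# -> R30A G1A M25B RL1
```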

15:00
Vibration Feedback Controlled by Intuitive Pressing in Virtual Reality

ABSTRACT. High-fidelity vibrotactile feedback that lets users feel virtual objects is one of the most important ingredients of an immersive VR experience. In this work, we propose a mobile-based vibrotactile feedback system named FingerVIP, which provides an intuitive and efficient way for VR application/game designers to input the proper vibration configuration for each target vibrotactile feedback. Our system uses pressure sensors attached to the fingers as controllers to manipulate the vibration configuration, including amplitude, frequency, and duration. We used the proposed FingerVIP to set three kinds of vibrotactile feedback in a VR sports game and validated that FingerVIP successfully helped game designers reduce the number of iterations and the time needed to configure vibrations.
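
A minimal sketch of the interface idea, with the sensor ranges and the pressure-to-parameter mapping as our own illustrative assumptions rather than the published implementation:

```python
def vibration_config(index_pressure, middle_pressure, ring_pressure):
    """Map normalized finger pressures in [0, 1] to a vibrotactile config."""
    amplitude = index_pressure                     # 0..1 motor amplitude
    frequency = 50 + middle_pressure * 250         # 50..300 Hz
    duration_ms = 20 + ring_pressure * 480         # 20..500 ms
    return {"amplitude": amplitude,
            "frequency_hz": frequency,
            "duration_ms": duration_ms}

print(vibration_config(0.8, 0.5, 0.2))
# {'amplitude': 0.8, 'frequency_hz': 175.0, 'duration_ms': 116.0}
```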

15:00
Extended Abstract: Augmented Reality for Human-Robot Cooperation in Aircraft Assembly

ABSTRACT. This extended abstract and the accompanying demonstration video show how Augmented Reality (AR) can be used in an industrial setting to coordinate a hybrid team consisting of a human worker and two robots in order to rivet stringers and ties to an aircraft hull.

15:00
VRescuer: A Virtual Reality Application for Disaster Response Training

ABSTRACT. With the advancement of modern technologies, Virtual Reality plays an essential role in training rescuers, particularly for disaster responders using simulation training. By being wholly immersed in the virtual environment, rescuers can practice the required skills without risking their lives before experiencing the real-world situation. This paper presents a work-in-progress Virtual Reality application called VRescuer to help trainees get used to various disaster circumstances. A scenario of a city was created with an ambulance rescuer and several rescuees in the scene. The intelligent ambulance rescuer was introduced as a rescuer/guide that automatically searches for the optimal paths to save all rescuees. The trainee can interfere in the rescue process by placing obstacles or adding more rescuees along the way, causing the rescue agent to re-route its paths. VRescuer was implemented in Unity3D with an Oculus Rift device and was assessed by five volunteers to improve the proposed application.
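
The re-routing behavior can be illustrated with a toy grid planner: plan a shortest path, let the trainee drop obstacles, and re-plan. BFS here merely stands in for whatever navigation the Unity3D implementation actually uses.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """BFS shortest path on a 2D grid; 0 = free, 1 = obstacle."""
    queue, came_from = deque([start]), {start: None}
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            while cur is not None:          # walk back to start
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        x, y = cur
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in came_from):
                came_from[nxt] = cur
                queue.append(nxt)
    return None

grid = [[0] * 5 for _ in range(5)]
print(shortest_path(grid, (0, 0), (4, 4)))
grid[2][1] = grid[2][2] = grid[2][3] = 1    # trainee places obstacles
print(shortest_path(grid, (0, 0), (4, 4)))  # agent re-routes around them
```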

15:00
Assessing the Value of 3D Software Experience with Camera Layout in Virtual Reality

ABSTRACT. Preproduction is a critical step in creating 3D animated content for film and TV. The current process is slow, costly, and creatively challenging, forcing the layout director (LD) to interpret and create 3D worlds and camera directions from 2D drawings. Virtual reality (VR) offers the potential to make the process faster, cheaper, and more accessible. We conducted a user study evaluating the effectiveness of VR as a preproduction aid, specifically focusing on prior 3D modeling experience as an independent variable. We compared the performance of experienced Maya professionals to that of participants with no Maya experience. Participants were tasked with laying out the camera to set up a shot for an animated scene. Our results revealed that the Maya Experienced (ME) participants did not significantly outperform their No Maya Experience (NME) counterparts. Overall, our study suggests that VR may provide an effective platform for animation preproduction, “leveling the playing field” for users with limited 3D software experience.

15:00
An Affective Computing in Virtual Reality Environments for Managing Surgical Pain and Anxiety

ABSTRACT. Pain and anxiety are common accompaniments of surgery. About 90% of people indicate elevated levels of anxiety during pre-operative care, and 66% report moderate to high levels of pain immediately after surgery. Currently, opioids are the primary method for pain management during postoperative care, and approximately one in 16 surgical patients prescribed opioids becomes a long-term user. This, along with the current opioid epidemic, calls for alternative pain management mechanisms. This research focuses on utilizing affective computing techniques to develop and deliver an adaptive virtual reality experience, based on the user's physiological response, to reduce pain and anxiety. Biofeedback is integrated with a virtual environment utilizing the user's heart rate variability, respiration, and electrodermal activity. Early results from Total Knee Arthroplasty patients undergoing surgery at Patewood Memorial Hospital in Greenville, SC are promising for the management of pain and anxiety during pre- and post-operative care.
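
A hedged sketch of such a biofeedback loop, with thresholds and adapted scene parameters invented purely for illustration (the clinical system's actual adaptation rules are not described in the abstract):

```python
def adapt_environment(hrv_ms, respiration_bpm, eda_microsiemens):
    """Return scene parameters from simple, illustrative arousal heuristics."""
    aroused = hrv_ms < 40 or respiration_bpm > 18 or eda_microsiemens > 8.0
    if aroused:                      # patient shows stress markers
        return {"scene_pace": "slow", "ambient_volume": 0.3,
                "guided_breathing": True}
    return {"scene_pace": "normal", "ambient_volume": 0.6,
            "guided_breathing": False}

print(adapt_environment(hrv_ms=35, respiration_bpm=20, eda_microsiemens=9.1))
# {'scene_pace': 'slow', 'ambient_volume': 0.3, 'guided_breathing': True}
```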

15:00
Creating an Immersed Sheep and Wool VR/AI Experience

ABSTRACT. The development of a sheep and wool conservancy initiative, “Shave ‘em to Save ‘em,” inspired the creation of “For Ewe,” a proof-of-concept Virtual Reality/Artificial Intelligence game. It aims to support agricultural and sustainable design initiatives. The game development process was guided by Activity Theory, and the game was created using Unity 3D. “For Ewe” was tested on web, Samsung Gear VR, and Oculus Quest virtual reality platforms.

15:00
Towards Semantic Action Analysis via Emergent Language

ABSTRACT. Recent work on unsupervised learning has explored the feasibility of semantic analysis and interpretation via Emergent Language (EL) models. As EL requires some form of numerical embedding, it remains unclear which type is required in order for the EL to properly capture certain semantic concepts associated with a given task. In this paper, we compare different approaches to generating embeddings: unsupervised and supervised. We start by producing a large dataset using a single-agent simulator environment. In these experiments, a purpose-driven agent attempts to accomplish a number of tasks. These tasks are performed in a synthetic cityscape environment, which includes houses, banks, theaters, and restaurants. Given such experiences, specification of the associated goal structure constitutes a narrative. We investigate the feasibility of producing an EL from such data with the hope that such an EL description may allow for the inference of the underlying narrative structure. Our initial experiments show that the supervised approach to constructing the embedding function results in high accuracy with respect to narrative inference, while the unsupervised approach results in greater ambiguity regarding the narrative structure of the data.

15:00
AR Tracking with Hybrid, Agnostic And Browser Based Approach
PRESENTER: Amit Ahire

ABSTRACT. Mobile platform tools are desirable when it comes to practical augmented reality applications. With the convenience and portability that the form factor has to offer, it lays an ideal foundation for feasible use cases in industry and commercial applications. Here, we present a novel approach of using the monocular Simultaneous Localization and Mapping (SLAM) information provided by a Cross-Reality (XR) device to augment the linked 3D CAD models. The main objective is to use the tracking technology for an augmented and mixed reality experience by tracking a 3D model and superimposing its respective 3D CAD model data over the images we receive from the camera feed of the XR device, without any scene preparation (e.g., markers or feature maps). The intent is to conduct visual analyses and evaluations based on the intrinsics and extrinsics of the model in the visualization system. To achieve this, we make use of Apple's ARKit to obtain the images, sensor data, and SLAM heuristics of the client XR device; remote markerless model-based 3D object tracking from monocular RGB image data; and a hybrid client-server architecture. Our approach is agnostic of any SLAM system or Augmented Reality (AR) framework. We make use of Apple's ARKit because of its ease of use, affordability, stability, and maturity as a platform and as an integrated system.

15:00
Binarization Using Morphological Decomposition Followed by cGAN

ABSTRACT. This paper presents a novel binarization algorithm for stained decipherable patterns. First, the input image is downsized, with the reduction ratio determined by iteratively applying the binary morphological Closing operation. Such morphology-driven image downsizing not only saves computation time in subsequent processes but also preserves the key features necessary for successful decoding. Then, high- and low-contrast areas are separated by applying the grayscale morphological Closing and Opening operators to the downsized image and subtracting the two resulting output images from each other. If necessary, these areas are further decomposed to obtain a finer separation of high and low regions. After this preprocessing, two approaches are proposed for the binarization: (1) a GMM is used to estimate a binarization threshold for each region; (2) the binarization problem is treated as an image-translation task, and a deep learning approach based on the conditional generative adversarial network (cGAN) is trained using the high- or low-contrast areas as conditional inputs. Our method avoids the difficulty of choosing a proper preset sampling mask in conventional adaptive thresholding methods. Extensive experimental results show that the binarization algorithm can efficiently improve the decipher success rate over other methods.
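
A hedged OpenCV sketch of the morphological decomposition and the GMM variant (approach 1). The kernel size, the fixed downsizing step, and the input filename are our assumptions; the paper derives the reduction ratio iteratively.

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

img = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)   # illustrative path
img = cv2.resize(img, None, fx=0.5, fy=0.5)             # morphology-driven in paper

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
contrast = cv2.subtract(closed, opened)                 # high where contrast is high
region_mask = contrast > contrast.mean()

binary = np.zeros_like(img)
for mask in (region_mask, ~region_mask):                # threshold each region
    pixels = img[mask].reshape(-1, 1).astype(np.float64)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
    threshold = gmm.means_.mean()                       # midpoint of the two modes
    binary[mask] = (img[mask] > threshold) * 255

cv2.imwrite("binary.png", binary)
```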

15:00
Using CNNs for Users Segmentation in Video See-Through Augmented Virtuality

ABSTRACT. In this paper, we investigate the use of deep learning techniques to integrate the user's own body and other participants into a head-mounted video see-through augmented virtuality scenario. It has been previously shown that seeing users' bodies in such simulations may improve the feeling of both self and social presence in the virtual environment, as well as user performance. We propose to use a convolutional neural network for real-time semantic segmentation of users' bodies in the stereoscopic RGB video streams acquired from the perspective of the user. We describe design issues as well as implementation details of the system and demonstrate the feasibility of using such neural networks for merging users' bodies into an augmented virtuality simulation.
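
As a stand-in illustration of the segmentation step, the sketch below runs an off-the-shelf semantic segmentation CNN on one eye's camera frame and keeps the "person" class as a compositing mask. The paper trains its own real-time network on stereo streams; torchvision's DeepLabV3 here is just a readily available substitute.

```python
import torch
from torchvision import models
from torchvision.models.segmentation import DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

frame = torch.rand(3, 480, 640)                   # one eye's RGB camera frame
with torch.no_grad():
    out = model(preprocess(frame).unsqueeze(0))["out"][0]
person_class = weights.meta["categories"].index("person")
mask = out.argmax(0) == person_class              # True where the user's body is
print(mask.shape, int(mask.sum()))
```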

15:00
Intent Inference of Human Hand Motion for Haptic Feedback Systems

ABSTRACT. A haptic feedback system (HFS) in a virtual cockpit system (VCS) can greatly enhance the sense of immersion. Most HFSs in prior works sacrificed the native advantages of VCSs to achieve haptic interaction. This paper addresses the problem by proposing a novel framework for the HFS, which predicts in advance the most likely interaction target of the human hand. We introduce an HFS with a non-contact visual tracking sensor and a probabilistic inference method based on Bayesian statistics; the resulting HFS is low-cost and offers high generality and flexibility. Simulations show that human intent inference can be computed in real time and that the results meet the requirements of the HFS, providing an important basis for haptic interactions in VCSs.
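
A minimal sketch of Bayesian intent inference over candidate cockpit targets: each target's likelihood grows when the hand velocity points toward it, and the posterior is updated every frame. The von Mises-style observation model is a common choice we assume here; the paper's exact model may differ.

```python
import numpy as np

targets = np.array([[0.3, 0.2, 0.5],        # candidate switch positions (m)
                    [0.0, 0.4, 0.6],
                    [-0.2, 0.1, 0.4]])
posterior = np.ones(len(targets)) / len(targets)

def update(posterior, hand_pos, hand_vel, kappa=5.0):
    """One Bayesian update from the current hand position and velocity."""
    direction = hand_vel / (np.linalg.norm(hand_vel) + 1e-9)
    to_target = targets - hand_pos
    to_target /= np.linalg.norm(to_target, axis=1, keepdims=True)
    likelihood = np.exp(kappa * to_target @ direction)   # von Mises-like
    posterior = posterior * likelihood
    return posterior / posterior.sum()

# Hand moving roughly toward the first target
posterior = update(posterior, np.array([0.0, 0.0, 0.0]),
                   np.array([0.3, 0.2, 0.5]))
print(posterior)   # probability mass shifts to target 0
```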

15:00
DRONEVR: A virtual reality simulator for Drone Operator

ABSTRACT. In recent years, Unmanned Aerial Vehicles (UAVs) have been used extensively in various applications, from entertainment and virtual tourism to construction, mining, and agriculture. Navigation, path planning, and image acquisition are the main tasks in administering these aerial devices. Aircraft crashes are among the most critical issues: uncontrolled environments and signal loss can cause the aerial vehicle to hit buildings while returning. This paper proposes a prototype embedded in a Web-based application called DroneVR to mitigate the aforementioned issues. The virtual reality environment was reconstructed from real-world data (OpenStreetMap), in which path planning and navigation were carried out. Perceived ease of use was investigated with a small sample of users to improve the simulator.

15:00
Creating Virtual Reality and Augmented Reality development in classroom: Is it a hype?

ABSTRACT. The fast-growing number of high-performance computer processors and hand-held devices has paved the way for the development of Virtual Reality and Augmented Reality, especially in the educational sector. The question of whether students can adopt these new technologies is not fully addressed; answering it thus plays an essential role for instructors and course designers. The objectives of this study are: (1) to investigate the feasibility of Virtual Reality/Augmented Reality development for undergraduate students, and (2) to highlight some practical challenges when creating and sharing Virtual Reality and Augmented Reality applications from the students' perspective. The study design for the coursework is described in detail. During a 16-week course, 63 Virtual Reality/Augmented Reality applications were created on a variety of topics with various development tools. 43 survey questions were prepared and administered to students for each phase of the projects to address technical difficulties. An exploratory method was used for data analysis.

15:00
Realtime Behavior-Based Continual Authentication of Users in Virtual Reality Environments
PRESENTER: Ashwin Ajit
15:00
Live Emoji: A Live Storytelling VR System with Programmable Cartoon-style Emotion Embodiment
15:00
An Interactive Demonstration of Collaborative VR for Laparoscopic Liver Surgery Training
PRESENTER: Vuthea Chheang

ABSTRACT. We introduce a collaborative virtual reality (VR) system for planning and simulation in laparoscopic liver surgery training. Patient image data is used for surgical model visualization and simulation. We developed two modes for training in laparoscopic procedures: an exploration mode and a surgery mode. Surgical joysticks are used in surgery mode to provide training for psychomotor skills and cooperation between a camera assistant and an experienced surgeon. Continuous feedback from our clinical partner was an important part of the development. Our evaluation showed that surgeons were positive about the usability and usefulness of the developed system. For further details, please refer to our full article and additional materials.

15:00
Explore Convolutional Neural Networks in Virtual Reality
PRESENTER: Nadine Meissler
15:00
Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video
PRESENTER: Alisha Sharma
15:45-16:15 Coffee break