Tags: 3D World-Building, Collision Detection, Convolutional Neural Networks, Deep Learning, Depth Estimation, Graphics Rendering, Human-in-the-loop, Inverse Projection, Modeling and Virtual Worlds
Abstract:
Three-dimensional (3D) modeling of urban road scenes has attracted well-deserved interest in entertainment, urban planning, and autonomous vehicle simulation. Modeling such realistic scenes remains a largely manual process carried out by 3D artists. Cameras mounted on vehicles can now provide images of road scenes, which can serve as references for automating scene layout. Our goal is to use images captured by a single camera sensor on a moving vehicle to build an approximate 3D virtual world. We propose a workflow that takes the human out of the loop by combining deep learning to generate a dense depth map, an inverse projection to correct for perspective distortion in the image, collision detection, and a rendering engine. The engine loads and displays 3D models of the appropriate object types at accurate relative positions, thus building and rendering a virtual world corresponding to the image. This virtual world can then be edited and animated. When integrated with a modeling tool, our workflow can potentially speed up the modeling of virtual environments significantly. We have tested the efficacy of our 3D virtual world-building and rendering through user studies based on image-to-image similarity and video-to-image correspondences. Even with limited photorealism in the rendering, our user-study results demonstrate that 3D world-building can be done effectively, with minimal human intervention, using our workflow with monocular images from moving vehicles as input.
Title: Building 3D Virtual Worlds from Monocular Images of Urban Road Traffic Scenes
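
For illustration, the inverse projection step described in the abstract can be sketched as follows: given the depth predicted for a pixel by the depth-estimation network and the camera intrinsics, the pixel is back-projected to a 3D position in the camera frame, which can then be used to place a 3D model in the virtual world. This is a minimal sketch assuming a standard pinhole camera model; the function name `unproject` and the KITTI-like intrinsic values are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into 3D camera coordinates.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); depth is the distance along the camera's z-axis.
    """
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example (hypothetical values): an object detected at pixel (640, 420)
# with a predicted depth of 12.5 m, using KITTI-like intrinsics.
fx = fy = 721.5
cx, cy = 609.6, 172.9
position = unproject(640, 420, 12.5, fx, fy, cx, cy)
print(position)  # approximate 3D position relative to the camera
```

In a full pipeline of this kind, such back-projected positions would feed the collision-detection and rendering stages, which place the corresponding 3D models without overlap before the scene is rendered.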