MonoXiver: New AI Algorithm Converts 2D Photos into 3D Maps

baoshi.rao

MonoXiver is an AI-based method, developed by Xianpeng Liu's team at North Carolina State University, for extracting 3D information from 2D images. It requires only a standard monocular camera to build reliable 3D maps of the surrounding environment, which is significant for perception and navigation in autonomous vehicles.

Because photos are 2D representations of the 3D world, they lack depth information: an object's actual size and its distance from the camera cannot be read directly from the image. This makes navigation with 2D cameras alone a significant challenge. A common approach today is to pair cameras with LiDAR, which measures distance by emitting laser beams. However, such systems are costly, and the hardware is difficult to integrate into vehicles.
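Why a single photo cannot tell size from distance follows from the standard pinhole camera model, in which a point at horizontal position X and depth Z lands at pixel coordinate u = f·X/Z. The short sketch below assumes an illustrative focal length of 700 pixels (a hypothetical value, not from the article) and shows that a 2 m wide car at 10 m and a 4 m wide object at 20 m produce exactly the same image width.

```python
def project_x(X: float, Z: float, f: float) -> float:
    """Pinhole projection of a horizontal coordinate X (meters) at depth Z (meters) to pixels."""
    return f * X / Z


def image_width(real_width: float, depth: float, f: float) -> float:
    """Pixel width of an object of real_width meters, centered on the optical axis, at the given depth."""
    left = project_x(-real_width / 2, depth, f)
    right = project_x(real_width / 2, depth, f)
    return right - left


f = 700.0  # assumed focal length in pixels (hypothetical)
near_small = image_width(2.0, 10.0, f)  # 2 m wide car, 10 m away
far_large = image_width(4.0, 20.0, f)   # 4 m wide object, 20 m away
print(near_small, far_large)  # 140.0 140.0
```

Because only the ratio X/Z survives projection, some extra cue (LiDAR, or learned priors about object size and appearance) is needed to recover depth.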

MonoXiver works in stages. First, it uses the images from a monocular camera to make a rough estimate of the 3D geometry of the scene and marks important objects, such as vehicles, with 3D bounding boxes. These boxes represent the scale, aspect ratio, and orientation of each object. Because the initial box positions come from this rough estimate, MonoXiver then treats each box as a starting point, or anchor, reanalyzes the area around it, and generates many new candidate boxes around the anchor to capture details the first pass may have missed.
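The step of generating new candidate boxes from a starting box can be sketched as follows. This is a minimal illustration, not MonoXiver's implementation: the `Box3D` parameterization (center, size, yaw), the function `sample_proposals`, and the grid of depth, lateral, and yaw offsets are all hypothetical choices made for the example.

```python
import itertools
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box: center (x, y, z) in camera coordinates (meters),
    size (l, w, h) in meters, and yaw about the vertical axis in radians."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float


def sample_proposals(anchor: Box3D,
                     depth_offsets=(-1.0, 0.0, 1.0),
                     lateral_offsets=(-0.5, 0.0, 0.5),
                     yaw_offsets=(-math.pi / 12, 0.0, math.pi / 12)) -> list[Box3D]:
    """Generate candidate boxes on a small grid around an anchor box.

    Only position (x, z) and heading (yaw) are perturbed; the size is kept.
    The offset values are illustrative assumptions."""
    proposals = []
    for dz, dx, dyaw in itertools.product(depth_offsets, lateral_offsets, yaw_offsets):
        proposals.append(replace(anchor,
                                 x=anchor.x + dx,
                                 z=anchor.z + dz,
                                 yaw=anchor.yaw + dyaw))
    return proposals


# A rough initial estimate for one car, used as the anchor.
anchor = Box3D(x=1.5, y=1.6, z=12.0, l=4.0, w=1.8, h=1.5, yaw=0.1)
candidates = sample_proposals(anchor)
print(len(candidates))  # 27 (3 x 3 x 3 grid, including the anchor itself)
```

A deterministic grid keeps the example easy to follow; random sampling around the anchor would be an equally plausible way to produce candidates.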

MonoXiver can also distinguish overlapping elements within the boxes. Finally, it checks whether each candidate box is consistent with the original box, comparing shapes as well as appearance cues such as colors and textures, to decide which candidate best fits the object. Tests on large-scale image datasets show that this method can construct 3D maps accurately.
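The final consistency check can be illustrated with a deliberately simple stand-in. MonoXiver relies on a learned model for this comparison; the sketch below instead swaps in a hand-crafted appearance test (color-histogram intersection) only to show the idea of scoring candidates against the original box and keeping the best one. It assumes every box has already been projected to a 2D pixel rectangle, and all names (`color_histogram`, `pick_most_consistent`, the region format) are hypothetical.

```python
from collections import Counter

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]
Region = tuple[int, int, int, int]  # (u0, v0, u1, v1), half-open pixel bounds


def color_histogram(image: Image, region: Region, bins: int = 4) -> dict:
    """Normalized, coarsely quantized RGB histogram of the pixels inside region."""
    u0, v0, u1, v1 = region
    step = 256 // bins
    counts = Counter()
    for v in range(v0, v1):
        for u in range(u0, u1):
            r, g, b = image[v][u]
            counts[(r // step, g // step, b // step)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # empty region: no appearance information
    return {k: c / total for k, c in counts.items()}


def histogram_intersection(h1: dict, h2: dict) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(p, h2.get(k, 0.0)) for k, p in h1.items())


def pick_most_consistent(image: Image, anchor_region: Region,
                         candidate_regions: list[Region]) -> tuple[int, list[float]]:
    """Score each candidate against the anchor's appearance; return best index and all scores."""
    anchor_hist = color_histogram(image, anchor_region)
    scores = [histogram_intersection(anchor_hist, color_histogram(image, r))
              for r in candidate_regions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Synthetic 8x8 image: a red object on the left half, blue background on the right.
image = [[(255, 0, 0) if u < 4 else (0, 0, 255) for u in range(8)] for v in range(8)]
anchor_region = (0, 0, 4, 8)
candidate_regions = [(0, 0, 4, 8), (2, 0, 6, 8), (4, 0, 8, 8)]
best, scores = pick_most_consistent(image, anchor_region, candidate_regions)
print(best, scores)  # 0 [1.0, 0.5, 0.0]
```

A real system would also compare geometric cues and learn the scoring from data; this toy version covers only the color part of the comparison.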

The research team expects this method to transform the ability of AI systems, such as autonomous vehicles, to perceive and navigate 3D spaces. MonoXiver is also highly adaptable and can easily be integrated with different monocular cameras. Beyond autonomous driving, the method could be applied to other fields, such as robotics, environmental monitoring, and medical imaging.