This 3D large language model processes 3D point cloud data and generates structured 3D scene understanding outputs: architectural elements such as walls, doors, and windows, plus oriented object bounding boxes labeled with their semantic categories. Unlike previous methods that require specialized equipment for data collection, it can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors.
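To make the shape of such structured output concrete, here is a minimal Python sketch of how a parsed scene might be represented. The class and field names (`Wall`, `Opening`, `OrientedBox`, `Scene`), the z-up convention, and the yaw-only orientation are all assumptions made for illustration; they are not the model's actual output schema.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Wall:
    """Hypothetical wall: a vertical segment from start to end with a height."""
    start: Point3
    end: Point3
    height: float


@dataclass
class Opening:
    """Hypothetical door or window attached to a wall by index."""
    kind: str          # "door" or "window"
    wall_index: int
    center: Point3
    width: float
    height: float


@dataclass
class OrientedBox:
    """Hypothetical oriented bounding box: rotation about the z axis only."""
    category: str      # semantic label, e.g. "sofa"
    center: Point3
    size: Point3       # full extents along the box's local x, y, z axes
    yaw: float         # radians, counter-clockwise about z (assumed z-up)

    def corners(self) -> List[Point3]:
        """Return the 8 corners of the box in world coordinates."""
        cx, cy, cz = self.center
        sx, sy, sz = self.size
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        result = []
        for dx in (-0.5, 0.5):
            for dy in (-0.5, 0.5):
                for dz in (-0.5, 0.5):
                    x, y = dx * sx, dy * sy
                    # Rotate the local offset about z, then translate.
                    result.append((cx + c * x - s * y,
                                   cy + s * x + c * y,
                                   cz + dz * sz))
        return result


@dataclass
class Scene:
    """Container for everything extracted from one point cloud."""
    walls: List[Wall] = field(default_factory=list)
    openings: List[Opening] = field(default_factory=list)
    objects: List[OrientedBox] = field(default_factory=list)
```

A downstream consumer could, for example, build a `Scene`, append an `OrientedBox("sofa", (0, 0, 0), (2, 4, 6), 0.0)`, and call `corners()` to obtain the box vertices for rendering or collision checks.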
Link: