2.5D Real-Time Detection on Voxelized Indoor Scene Point Clouds

Author(s)
YAHYOZODA, NASRULLOH
Advisor
Yang, Seungjoon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/90956
http://unist.dcollection.net/common/orgView/200000965879
Abstract
Indoor scene object detection from 3D point clouds remains computationally intensive despite significant advances in deep learning architectures. Existing approaches—whether voxel-based, point-based, or transformer-based—face inherent trade-offs between detection accuracy and computational efficiency, limiting their applicability in real-time scenarios such as robotics and augmented reality. This thesis introduces a novel top-view data representation and 2.5D detection framework that achieves substantial computational efficiency gains while maintaining competitive detection accuracy.

The core innovation lies in a carefully designed data representation: a Bird's Eye View (BEV) top-view density projection that compresses 3D voxelized point clouds into 2D density maps encoding vertical occupancy information. This representation preserves essential geometric characteristics for object detection while reducing computational complexity from cubic to quadratic scaling.

Building upon this representation, we adapt the YOLOv11 architecture for processing top-view density maps, achieving 123 frames per second (FPS) inference speed with a minimal GPU memory footprint of 1.61 GB. Furthermore, we demonstrate that training on a unified dataset combining five benchmark indoor scene datasets with balanced sampling substantially improves generalization performance across diverse indoor environments. Experimental results show that our approach achieves a 10× speedup over existing point-based methods and a 5–6× improvement over efficient voxel methods, combined with a 77% reduction in memory consumption, while maintaining competitive detection accuracy.

Keywords: Indoor scene understanding, 3D object detection, point cloud processing, real-time detection, bird's eye view representation, deep learning.
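The BEV density projection described in the abstract can be illustrated with a minimal sketch: voxelize the point cloud, then count occupied voxels along the vertical axis for each (x, y) column to obtain a 2D density map. All function names, grid sizes, and parameters below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def bev_density_map(points, voxel_size=0.05, grid_xy=(256, 256), z_range=(0.0, 3.0)):
    """Project a 3D point cloud (N, 3) to a 2D top-view density map.

    Each output cell counts the occupied vertical voxels in its (x, y)
    column, normalized to [0, 1] — a proxy for vertical occupancy.
    Parameters here are hypothetical defaults for illustration.
    """
    nx, ny = grid_xy
    nz = int(round((z_range[1] - z_range[0]) / voxel_size))

    # Quantize points into voxel indices, clipping to the grid bounds.
    ix = np.clip((points[:, 0] / voxel_size).astype(int), 0, nx - 1)
    iy = np.clip((points[:, 1] / voxel_size).astype(int), 0, ny - 1)
    iz = np.clip(((points[:, 2] - z_range[0]) / voxel_size).astype(int), 0, nz - 1)

    # Mark occupied voxels, then reduce along z: the 3D grid collapses
    # to a 2D map, which is where the cubic-to-quadratic saving comes from.
    occ = np.zeros((nx, ny, nz), dtype=bool)
    occ[ix, iy, iz] = True
    return occ.sum(axis=2).astype(np.float32) / nz
```

A downstream 2D detector (such as the adapted YOLOv11 described above) would then consume this single-channel density map like an ordinary image.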
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Department of Electrical Engineering
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.