2.5D Real-Time Detection on Voxelized Indoor Scene Point Clouds

Author(s)
YAHYOZODA, NASRULLOH
Advisor
Yang, Seungjoon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/90956
http://unist.dcollection.net/common/orgView/200000965879
Abstract
Indoor scene object detection from 3D point clouds remains computationally intensive despite significant advances in deep learning architectures. Existing approaches—whether voxel-based, point-based, or transformer-based—face inherent trade-offs between detection accuracy and computational efficiency, limiting their applicability in real-time scenarios such as robotics and augmented reality. This thesis introduces a novel top-view data representation and 2.5D detection framework that achieves substantial computational efficiency gains while maintaining competitive detection accuracy.

The core innovation lies in a carefully designed data representation: a Bird's Eye View (BEV) top-view density projection that compresses 3D voxelized point clouds into 2D density maps encoding vertical occupancy information. This representation preserves essential geometric characteristics for object detection while reducing computational complexity from cubic to quadratic scaling.

Building upon this representation, we adapt the YOLOv11 architecture for processing top-view density maps, achieving 123 frames per second (FPS) inference speed with a minimal GPU memory footprint of 1.61 GB. Furthermore, we demonstrate that training on a unified dataset combining five benchmark indoor scene datasets with balanced sampling substantially improves generalization performance across diverse indoor environments. Experimental results show that our approach achieves a 10× speedup over existing point-based methods and a 5–6× improvement over efficient voxel methods, combined with a 77% reduction in memory consumption, while maintaining competitive detection accuracy.

Keywords: Indoor scene understanding, 3D object detection, point cloud processing, real-time detection, bird's eye view representation, deep learning.
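The BEV density projection described in the abstract can be illustrated with a minimal sketch: voxelize the point cloud, then count occupied voxels along the vertical axis for each (x, y) column to obtain a 2D density map. All function names, grid sizes, and parameters below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def bev_density_map(points, voxel_size=0.05, grid_xy=(256, 256), z_range=(0.0, 3.0)):
    """Project a 3D point cloud (N, 3) to a 2D top-view density map.

    Each output cell counts the occupied vertical voxels in its (x, y)
    column, normalized to [0, 1] — a proxy for vertical occupancy.
    Parameters here are hypothetical defaults for illustration.
    """
    nx, ny = grid_xy
    nz = int(round((z_range[1] - z_range[0]) / voxel_size))

    # Quantize points into voxel indices, clipping to the grid bounds.
    ix = np.clip((points[:, 0] / voxel_size).astype(int), 0, nx - 1)
    iy = np.clip((points[:, 1] / voxel_size).astype(int), 0, ny - 1)
    iz = np.clip(((points[:, 2] - z_range[0]) / voxel_size).astype(int), 0, nz - 1)

    # Mark occupied voxels, then reduce along z: the 3D grid collapses
    # to a 2D map, which is where the cubic-to-quadratic saving comes from.
    occ = np.zeros((nx, ny, nz), dtype=bool)
    occ[ix, iy, iz] = True
    return occ.sum(axis=2).astype(np.float32) / nz
```

A downstream 2D detector (such as the adapted YOLOv11 described above) would then consume this single-channel density map like an ordinary image.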
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Department of Electrical Engineering
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.