Detailed Information


Full metadata record

DC Field Value Language
dc.contributor.advisor Yang, Seungjoon -
dc.contributor.author YAHYOZODA, NASRULLOH -
dc.date.accessioned 2026-03-26T22:13:55Z -
dc.date.available 2026-03-26T22:13:55Z -
dc.date.issued 2026-02 -
dc.description.abstract Indoor scene object detection from 3D point clouds remains computationally intensive despite significant advances in deep learning architectures. Existing approaches—whether voxel-based, point-based, or transformer-based—face inherent trade-offs between detection accuracy and computational efficiency, limiting their applicability in real-time scenarios such as robotics and augmented reality. This thesis introduces a novel top-view data representation and 2.5D detection framework that achieves substantial computational efficiency gains while maintaining competitive detection accuracy. The core innovation lies in a carefully designed data representation: a Bird’s Eye View (BEV) top-view density projection that compresses 3D voxelized point clouds into 2D density maps encoding vertical occupancy information. This representation preserves essential geometric characteristics for object detection while reducing computational complexity from cubic to quadratic scaling. Building upon this representation, we adapt the YOLOv11 architecture for processing top-view density maps, achieving 123 frames per second (FPS) inference speed with minimal GPU memory footprint of 1.61 GB. Furthermore, we demonstrate that training on a unified dataset combining five benchmark indoor scene datasets with balanced sampling substantially improves generalization performance across diverse indoor environments. Experimental results show that our approach achieves 10× speedup over existing point-based methods and 5–6× improvement over efficient voxel methods, combined with 77% reduction in memory consumption, while maintaining competitive detection accuracy. Keywords: Indoor scene understanding, 3D object detection, point cloud processing, real-time detection, bird’s eye view representation, deep learning. -
dc.description.degree Master -
dc.description Department of Electrical Engineering -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/90956 -
dc.identifier.uri http://unist.dcollection.net/common/orgView/200000965879 -
dc.language ENG -
dc.publisher Ulsan National Institute of Science and Technology -
dc.subject Photoacoustic -
dc.title 2.5D Real-Time Detection on Voxelized Indoor Scene Point Clouds -
dc.type Thesis -
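The abstract above describes the core idea: a voxelized 3D point cloud is collapsed along the vertical axis into a 2D top-view density map that encodes vertical occupancy, reducing the data a detector must process from cubic to quadratic in the grid resolution. The following is a minimal illustrative sketch of such a projection, not code from the thesis itself; the function name, voxel size, and scene bounds are assumptions chosen for the example.

```python
# Illustrative sketch (not from the thesis) of a BEV top-view density
# projection: voxelize a point cloud, then count occupied voxels in each
# vertical column to get a 2D density map encoding vertical occupancy.
# `voxel_size` and `bounds` are assumed parameters for this example.
import numpy as np

def bev_density_map(points, voxel_size=0.05, bounds=((0, 4), (0, 4), (0, 3))):
    """Project an (N, 3) point cloud to a 2D top-view density map.

    Each output cell holds the number of occupied voxels in that vertical
    column, i.e. 3D vertical occupancy compressed into a 2D image.
    """
    (x0, x1), (y0, y1), (z0, z1) = bounds
    shape = (int((x1 - x0) / voxel_size),
             int((y1 - y0) / voxel_size),
             int((z1 - z0) / voxel_size))
    # Keep only points inside the bounding volume.
    mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
            (points[:, 1] >= y0) & (points[:, 1] < y1) &
            (points[:, 2] >= z0) & (points[:, 2] < z1))
    idx = ((points[mask] - np.array([x0, y0, z0])) / voxel_size).astype(int)
    # Mark occupied voxels, then count occupancy along the vertical axis.
    grid = np.zeros(shape, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid.sum(axis=2)  # 2D density map of shape (X, Y)
```

A 2D detector such as YOLOv11 can then run on this density map as an ordinary single-channel image, which is what makes the reported real-time inference speeds plausible.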

