Learning-Based Visual Navigation for Drones: Generalization, Sample-Efficiency, and Sim-to-Real Gap

Alternative Title
Learning-Based Visual Navigation for Drones: Generalization, Sample Efficiency, and the Simulation-to-Real Gap
Author(s)
Kim, Minwoo
Advisor
Oh, Hyondong
Issued Date
2025-02
URI
https://scholarworks.unist.ac.kr/handle/201301/86444
http://unist.dcollection.net/common/orgView/200000865542
Abstract
Small unmanned aerial vehicles (UAVs) are widely used in various mission environments such as delivery, search and rescue, and inspection thanks to their small size and agility, yet they require an appropriate collision avoidance algorithm to operate safely. Various sensors are used for collision avoidance; among them, light detection and ranging (LiDAR), radar, and camera sensors are commonly employed. LiDAR offers accurate depth information and is robust to lighting conditions, making it suitable for both indoor and outdoor environments. Despite these advantages, equipping small UAVs with LiDAR sensors is challenging due to their heavy weight and high power consumption. Radar sensors can acquire environmental information with relatively high accuracy even under harsh conditions such as inclement weather, dust, or fog, where optical sensors may struggle to perform effectively. However, their limited resolution makes navigating complex terrain challenging. Camera sensors provide rich visual information such as color and texture that can be leveraged for a variety of missions. Furthermore, unlike LiDAR and radar sensors, camera sensors consume much less power and are lightweight. Nevertheless, the high-dimensional nature of visual information requires advanced preprocessing algorithms for visual navigation tasks. Although each sensor has its own strengths and weaknesses, camera sensors are predominantly employed on small UAVs due to their low power consumption and light weight.

Vision-based navigation algorithms can be broadly categorized into model-based and learning-based approaches. Model-based algorithms have the advantage of being easier to interpret from an engineering perspective: they are composed of multiple modules, which makes it relatively straightforward to analyze each function. However, the modular structure of model-based approaches inherently makes them more vulnerable to sensor noise and prone to error accumulation, as latencies and inaccuracies introduced by each module can gradually build up across the entire system. In contrast, learning-based methods reduce the number of modules or adopt an end-to-end approach using neural networks, thereby minimizing error accumulation and latency issues. As they do not rely heavily on a modular design, learning-based methods tend to be more robust to sensor noise than model-based approaches.

However, learning-based methods also have drawbacks. First, they exhibit low generalization performance: most are prone to overfitting the training dataset, which significantly reduces performance in unseen environments. Second, learning-based approaches suffer from low sample efficiency, meaning they require large amounts of training data to achieve high generalization performance. This reliance on extensive datasets not only increases the demand for data collection but also leads to longer training times, making the learning process less efficient overall. Finally, learning-based methods are vulnerable to the sim-to-real gap. Most learning-based methods collect their datasets from simulation environments; when the trained models are applied in real-world scenarios, however, they suffer from a domain gap between simulation and real-world environments, such as differing dynamics, noise characteristics, and image feature points. If these discrepancies are not properly addressed, it becomes difficult to guarantee the same level of performance in real-world settings as in the simulation environment.
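To make the contrast drawn above between a modular pipeline and an end-to-end learning-based policy concrete, the following is a minimal PyTorch sketch of the latter. It is an illustration only, not the network proposed in this thesis; the class name `EndToEndVisualPolicy`, the layer sizes, and the use of a depth-image input are all assumptions.

```python
import torch
import torch.nn as nn

class EndToEndVisualPolicy(nn.Module):
    """Illustrative end-to-end policy: raw image in, velocity command out.

    A model-based pipeline would instead chain separate modules
    (depth estimation -> mapping -> planning -> control), each of
    which adds latency and can accumulate error across the system.
    """

    def __init__(self, action_dim: int = 3):
        super().__init__()
        # Small convolutional encoder that embeds the visual input.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Single head mapping the embedding to a continuous action.
        self.head = nn.Linear(64, action_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

# Usage: one forward pass replaces the whole modular chain.
policy = EndToEndVisualPolicy()
depth_image = torch.rand(1, 1, 64, 64)  # e.g., a normalized depth frame
velocity_cmd = policy(depth_image)      # (1, 3) body-frame velocity command
```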
This thesis aims to address three main challenges: generalization, sample efficiency, and the sim-to-real gap. The proposed algorithms are designed to achieve high generalization performance, enable faster learning with fewer samples, and be directly applicable to real-world environments. To improve generalization performance, we propose a reward function that ensures the safety of UAVs while generating rapid collision-avoidance maneuvers. In addition, a neural network that accounts for temporal characteristics and embeds visual information is introduced to improve generalization and maintain stable learning. To address sample efficiency, an inverse reinforcement learning approach is employed, combining supervised and reinforcement learning to accelerate training. Lastly, to tackle the sim-to-real gap, domain-invariant feature extraction, generative neural network-based domain adaptation, and stereo vision algorithms are incorporated. Although the proposed methods are trained on simulated data, their performance is validated in various indoor and outdoor environments. These experimental results in real-world scenarios provide evidence that the proposed methods achieve high generalization performance and address the sim-to-real gap.
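The abstract does not give the exact form of the proposed reward function; as a hedged illustration of the general idea of rewarding progress while penalizing unsafe proximity, a shaping of the following form is common in learning-based collision avoidance. Every term, weight, and default value here (`w_prog`, `w_prox`, `d_safe`, the collision penalty) is a hypothetical placeholder, not a value from the thesis.

```python
def collision_avoidance_reward(
    progress: float,           # distance gained toward the goal this step (m)
    min_obstacle_dist: float,  # closest obstacle range, e.g., from stereo depth (m)
    collided: bool,
    d_safe: float = 1.0,       # hypothetical safety radius (m)
    w_prog: float = 1.0,       # weight on goal progress (encourages rapid maneuvers)
    w_prox: float = 0.5,       # weight on the proximity penalty (encourages safety)
    collision_penalty: float = 100.0,
) -> float:
    """Hypothetical shaped reward: encourage rapid progress toward the goal,
    penalize entering the safety radius, and punish collisions heavily."""
    if collided:
        return -collision_penalty
    # Smooth penalty that grows as the UAV moves inside the safety radius.
    proximity = max(0.0, d_safe - min_obstacle_dist) / d_safe
    return w_prog * progress - w_prox * proximity ** 2
```

A recurrent policy (for example, a convolutional encoder followed by a GRU) trained against such a reward would correspond to the "neural network that accounts for temporal characteristics" described above; the trade-off between the progress and proximity weights is what balances rapid avoidance maneuvers against safety.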
Publisher
Ulsan National Institute of Science and Technology
Degree
Doctor
Major
Department of Mechanical Engineering

