In recent autonomous driving research, there has been growing interest in End-to-End (E2E) methodologies that learn optimal driving behavior directly from raw sensor data. Unlike conventional rule-based modular approaches, E2E methods employ a single integrated model to optimize the driving pipeline, with all parameters jointly updated via a single backward pass. This work presents two independent contributions aimed at real-world E2E driving: (i) enabling embedded, real-time deployment via a lightweight architecture with training-only stability enhancements, and (ii) improving intersection reliability by strengthening signal awareness in the policy.

For real-world applicability, E2E autonomous driving requires models capable of real-time, closed-loop operation on constrained onboard hardware, demanding a balance between high performance and computational efficiency. Recent work, which focuses on performance, has limited deployment feasibility due to model complexity and high latency. Additionally, relying on a pure E2E model's control prediction poses stability challenges, as unaddressed system latency can lead to unsafe, misaligned actions.

To address deployment feasibility and closed-loop stability, we introduce an embedded-friendly, camera-only sensor-to-control framework. Our model learns a spatially explicit scene representation via an auxiliary Bird's-Eye-View (BEV) segmentation task and a lightweight hybrid temporal module. We model planning and control efficiently by proposing a unified decoder rather than a cascaded one. In addition, we introduce a training-only module that enforces plan-control consistency and applies a latency-aware adaptive delay labeling policy, dynamically shifting supervision targets to enhance closed-loop safety.
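The latency-aware adaptive delay labeling policy can be illustrated with a minimal sketch: for each training frame, the supervision target is shifted forward in time by the estimated end-to-end system latency, so the policy learns to emit the action that will be valid when it actually takes effect. The function name, array layout, and nearest-timestamp matching below are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def delay_shifted_targets(controls, timestamps, latency_s):
    """Illustrative delay labeling (assumed implementation).

    For frame i at time t_i, the supervision target becomes the logged
    control whose timestamp is closest to t_i + latency_s, shifting
    labels forward to compensate for system latency in closed loop.
    """
    targets = np.empty_like(controls)
    for i, t in enumerate(timestamps):
        # nearest logged control to the latency-compensated time
        j = int(np.argmin(np.abs(timestamps - (t + latency_s))))
        targets[i] = controls[j]
    return targets

# 10 frames at 10 Hz, controls are frame indices for clarity
ts = np.arange(10) * 0.1
ctrl = np.arange(10, dtype=float)
shifted = delay_shifted_targets(ctrl, ts, latency_s=0.2)
# frame 0 is now supervised by the control logged 0.2 s later;
# frames near the end of the clip saturate at the last sample
```

An "adaptive" variant would pass a per-frame latency estimate instead of a constant, which this sketch accommodates by calling it with the measured value at training time.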
Evaluated in the MORAI simulator, our system runs at 16 Hz on an NVIDIA Jetson AGX Orin, achieving driving scores of 86.67 (in-distribution) and 73.27 (out-of-distribution), outperforming real-time sensor-to-control baselines. Our approach ranked first in the Hyundai Motor Group 2025 E2E Autonomous Driving Challenge.

Although E2E autonomous driving jointly optimizes a sensor-to-control policy, brittleness at signalized intersections remains: few-pixel traffic-light cues gate stop/go decisions, and small misreads can amplify into unsafe closed-loop actions. We propose TLCAM, a Traffic Light Classification Auxiliary Module that injects dense supervision on traffic-signal states so the shared encoder allocates representation capacity to these safety-critical pixels, at only 0.45% additional parameters and 0.02 ms runtime overhead. In MORAI, TLCAM yields safer closed-loop behavior, raising red-light stop success from 89.22% to 94.66% and reducing stopping-distance variance by 53%, while maintaining navigation path accuracy (ADE/FDE) and improving lane keeping (CTE 0.17 vs. 0.20). These results show that a minimal auxiliary task can substantially harden intersection reliability without sacrificing the global-optimality promise of E2E learning.
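The auxiliary-module idea can be sketched as a small classification head attached to the shared encoder and trained jointly with the driving loss. The class name, feature dimension, and four-way state space below are assumptions for illustration; the abstract specifies only that the module adds roughly 0.45% parameters and supervises traffic-signal states.

```python
import torch
import torch.nn as nn

class TLCAMHead(nn.Module):
    """Illustrative auxiliary traffic-light classification head
    (hypothetical design, not the thesis's exact module).

    A tiny head on shared encoder features predicts the signal state;
    its cross-entropy loss is added to the driving loss so the encoder
    keeps representation capacity on the few safety-critical pixels.
    """

    def __init__(self, feat_dim=256, num_states=4):  # e.g. red/yellow/green/none
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B, C, H, W) -> (B, C, 1, 1)
        self.fc = nn.Linear(feat_dim, num_states)

    def forward(self, feats):
        x = self.pool(feats).flatten(1)              # (B, C)
        return self.fc(x)                            # logits over signal states

# joint training objective (lambda_tl is a tunable weight, assumed):
#   loss = driving_loss + lambda_tl * F.cross_entropy(head(feats), signal_label)
```

Because the head is a single pooled linear layer, its parameter and latency overhead stays negligible relative to the encoder, which is the design property the abstract emphasizes.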
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Artificial Intelligence, Graduate School of Artificial Intelligence