Robust Robot Task Planning Through Failure Detection from Multi-View Scene Graphs

Author(s)
Chong, Haechan
Advisor
Joo, Kyungdon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91063
http://unist.dcollection.net/common/orgView/200000965097
Abstract
The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) into robotic task planners for failure detection has shown considerable promise, primarily due to their advanced semantic reasoning. However, a significant limitation of these models is that they typically operate under the assumption of comprehensive environmental understanding. This assumption proves problematic in complex scenarios where an explicit model of object relationships and scene structure is absent, often resulting in unreliable planning and execution. To address this deficiency, this research introduces a novel framework grounded in multi-view scene understanding. The proposed method begins by capturing comprehensive environmental data via multi-view images. From this visual input, local 2D scene graphs are generated, each encoding object identities and their spatial or semantic relations. A graph neural network then aggregates and merges these disparate local 2D scene graphs into a single unified scene graph. This graph serves as the central data structure for verifying the success of task execution and diagnosing the root causes of failures. The failure detection mechanism compares the generated unified scene graph against an expected scene graph produced by the LLM during the initial planning phase of each sub-task. Discrepancies between these two graphs are used to reason about the cause of failure. This diagnostic information is then fed back to the LLM, which uses it to generate a revised plan. This closed-loop process significantly enhances adaptability and mitigates repetitive execution errors. The efficacy and applicability of the proposed framework are validated through empirical evaluation on five real-world benchmark tasks. In addition, a comparative analysis of failure detection and reasoning is conducted against current methods.
The results demonstrate the superior performance of our approach, highlighting the distinct advantages of integrating multi-view perception with explicit graph-based relational reasoning.
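The graph-comparison step described in the abstract can be illustrated with a minimal sketch. It assumes scene graphs are represented as sets of (subject, relation, object) triples; the function name `detect_failure`, the relation labels, and the report format are illustrative assumptions, not the thesis's actual implementation.

```python
def detect_failure(expected: set, observed: set) -> dict:
    """Compare an expected scene graph (from the planner) against the
    observed unified scene graph; report discrepancies as failure evidence.

    Both graphs are sets of (subject, relation, object) triples."""
    missing = expected - observed      # relations the plan required but were not observed
    unexpected = observed - expected   # relations observed but not planned for
    return {
        "success": not missing,        # the sub-task succeeds if no required relation is missing
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
    }

# Illustrative case: after a "place cup on table" sub-task,
# the cup is actually observed on the floor.
expected = {("cup", "on", "table")}
observed = {("cup", "on", "floor")}
report = detect_failure(expected, observed)
# report["success"] is False; the "missing" and "unexpected" entries
# would be serialized and fed back to the LLM for replanning.
```

In this sketch the discrepancy report is what closes the loop: it is the diagnostic information handed back to the LLM so the revised plan can target the specific unmet relation rather than blindly retrying.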
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence, Artificial Intelligence


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.