Robust Robot Task Planning Through Failure Detection from Multi-View Scene Graphs

Author(s)
Chong, Haechan
Advisor
Joo, Kyungdon
Issued Date
2026-02
URI
https://scholarworks.unist.ac.kr/handle/201301/91063
http://unist.dcollection.net/common/orgView/200000965097
Abstract
The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) into robotic task planners for failure detection has shown considerable promise, primarily due to their advanced semantic reasoning. However, a significant limitation of these models is that they typically operate under the assumption of comprehensive environmental understanding. This assumption proves problematic in complex scenarios where an explicit model of object relationships and scene structure is absent, often resulting in unreliable planning and execution. To address this deficiency, this research introduces a novel framework grounded in multi-view scene understanding. The proposed method begins by capturing comprehensive environmental data via multi-view images. From this visual input, local 2D scene graphs are generated, each encoding object identities and their spatial or semantic relations. A graph neural network then aggregates and merges these disparate local 2D scene graphs into a single unified scene graph. This graph serves as the central data structure for verifying the success of task execution and diagnosing the root causes of failures. The failure detection mechanism compares the generated unified scene graph against an expected scene graph produced by the LLM during the initial planning phase of each sub-task. Discrepancies between these two graphs are used to reason about the cause of failure. This diagnostic information is then fed back to the LLM, which uses it to generate a revised plan. This closed-loop process significantly enhances adaptability and mitigates repetitive execution errors. The efficacy and applicability of the proposed framework are validated through empirical evaluation on five real-world benchmark tasks. In addition, a comparative analysis of failure detection and reasoning is conducted against current methods.
The results demonstrate the superior performance of our approach, highlighting the distinct advantages of integrating multi-view perception with explicit graph-based relational reasoning.
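The graph-comparison step described in the abstract can be illustrated with a minimal sketch. It assumes scene graphs are represented as sets of (subject, relation, object) triples; the function name `detect_failure`, the relation labels, and the report format are illustrative assumptions, not the thesis's actual implementation.

```python
def detect_failure(expected: set, observed: set) -> dict:
    """Compare an expected scene graph (from the planner) against the
    observed unified scene graph; report discrepancies as failure evidence.

    Both graphs are sets of (subject, relation, object) triples."""
    missing = expected - observed      # relations the plan required but were not observed
    unexpected = observed - expected   # relations observed but not planned for
    return {
        "success": not missing,        # the sub-task succeeds if no required relation is missing
        "missing": sorted(missing),
        "unexpected": sorted(unexpected),
    }

# Illustrative case: after a "place cup on table" sub-task,
# the cup is actually observed on the floor.
expected = {("cup", "on", "table")}
observed = {("cup", "on", "floor")}
report = detect_failure(expected, observed)
# report["success"] is False; the "missing" and "unexpected" entries
# would be serialized and fed back to the LLM for replanning.
```

In this sketch the discrepancy report is what closes the loop: it is the diagnostic information handed back to the LLM so the revised plan can target the specific unmet relation rather than blindly retrying.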
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence, Artificial Intelligence


Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.