Detailed Information


Assessing Vision-Language Models for Failure Detection in Robotic Manipulation

Author(s)
Chowdhury, Md Sameer Iqbal; Au, Tsz-Chiu
Issued Date
2026-01
DOI
10.1109/MPULS.2026.3659245
URI
https://scholarworks.unist.ac.kr/handle/201301/91611
Citation
IEEE PULSE, v.17, no.1, pp.63 - 65
Abstract
Vision-language models (VLMs) offer transformative potential for robotics, but their deployment is constrained by performance limitations. In safety-critical manipulation, a model must recognize its own limitations to prevent a catastrophic failure. We conduct a systematic study of VLMs for robotic failure detection, evaluating six architectures on real-world trajectories. We put forward a decision-making process that allows a VLM to evaluate whether it can successfully complete a task, and if not, pause its operation and hand over the task to human operators. Our results show that well-calibrated VLMs can be trustworthy partners that know exactly when to ask for help.
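
The handover step described in the abstract can be read as a confidence-gated decision: estimate the probability that the manipulation task will succeed and defer to a human operator when that estimate falls below a calibrated threshold. The sketch below is only an illustration under assumed names (sample_success_prob, n_samples, threshold) and a simple Monte Carlo average over repeated VLM judgments; it is not the paper's published algorithm.

```python
# Hypothetical confidence-gated handover decision; an illustrative sketch,
# not the method from the article. Assumes a callable that returns one
# stochastic success-probability judgment (e.g., one sampled VLM response)
# and a calibrated threshold chosen on held-out trajectories.
from statistics import mean
from typing import Callable, List


def should_hand_over(
    sample_success_prob: Callable[[], float],  # one noisy judgment in [0, 1]
    n_samples: int = 10,                       # Monte Carlo samples to average
    threshold: float = 0.8,                    # assumed calibrated threshold
) -> bool:
    """Return True if the robot should pause and defer to a human operator."""
    estimates: List[float] = [sample_success_prob() for _ in range(n_samples)]
    expected_success = mean(estimates)
    return expected_success < threshold


if __name__ == "__main__":
    import random

    random.seed(0)
    # Dummy stand-in for querying a VLM: pessimistic, noisy estimates.
    noisy_vlm = lambda: random.uniform(0.55, 0.75)
    print(should_hand_over(noisy_vlm))  # True -> pause and hand over the task
```
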
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
ISSN
2154-2287
Keyword (Author)
Foundation models; Robots; Vision language model; Failure analysis; Semantic communication; Visual perception; Safety; Monte Carlo methods; Heuristic algorithms; End effectors; Bayes methods; Calibration; Medical robotics; Computer architecture

