Vision-language models (VLMs) hold considerable promise for robotics, but unreliable performance limits their deployment. In safety-critical manipulation, a model must recognize the limits of its own competence to avoid catastrophic failures. We conduct a systematic study of VLMs for robotic failure detection, evaluating six architectures on real-world trajectories. We propose a decision procedure in which a VLM assesses whether it can successfully complete a task and, if it cannot, pauses and hands the task over to a human operator. Our results show that well-calibrated VLMs can serve as trustworthy partners that recognize when to ask for help.