Related Researcher

Ahn, Hyemin (안혜민)

Detailed Information

VAFOR: Proactive Voice Assistant for Object Retrieval in the Physical World

Author(s)
Satyev, Bekatan; Ahn, Hyemin
Issued Date
2023-08-28
DOI
10.1109/ro-man57019.2023.10309466
URI
https://scholarworks.unist.ac.kr/handle/201301/74573
Citation
IEEE International Conference on Robot and Human Interactive Communication
Abstract
In this paper, we present a proactive robotic voice assistant with a perceive-reason-act loop that carries out pick-and-place operations based on verbal commands. Unlike existing systems, our robot can retrieve a target object not only when the target is explicitly spelled out, but also when given an indirect command that implicitly reflects the human's intention or emotion. For instance, when the verbal command is “I had a busy day, so I didn’t have much to eat.”, the target object would be something that can help with hunger. To successfully estimate the target object from indirect commands, our framework consists of separate modules covering the complete perceive-reason-act loop. First, for perception, it runs an object detector on the robot’s onboard computer to detect all objects in the surroundings and records a verbal command from a microphone. Second, for reasoning, the list of available objects and a transcription of the verbal command are integrated into a prompt for a Large Language Model (LLM) in order to identify the target object in the command. Finally, for action, a TurtleBot3 with a 5-DOF robotic arm finds the target object and brings it to the human. Our experiments show that with a properly designed prompt, the robot can identify the correct target object from implicit commands with up to 97% accuracy. In addition, fine-tuning a language model using the proposed prompt design process improves the performance of the smallest language model by a factor of five. Our data and code are available at https://github.com/bekatan/vafor.
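The reasoning step the abstract describes, folding a list of detected objects and a command transcript into a single LLM prompt, lends itself to a short sketch. The snippet below is a minimal illustration, not the authors' implementation (see the linked repository for that): the client library (openai), the model name, the prompt wording, and the reason function are all assumptions made for the example.

```python
# Minimal sketch of VAFOR's "reason" step, as summarized in the abstract:
# detected objects + transcribed command -> LLM prompt -> target object.
# The OpenAI client, model choice, and prompt wording are illustrative
# assumptions; the actual code is at https://github.com/bekatan/vafor.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def reason(detected_objects: list[str], transcript: str) -> str:
    """Ask an LLM which available object best satisfies the command."""
    prompt = (
        "Available objects: " + ", ".join(detected_objects) + "\n"
        f'Command: "{transcript}"\n'
        "Which single object from the list best satisfies the command? "
        "Answer with the object name only."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# Example: an indirect command that implicitly signals hunger.
objects = ["water bottle", "snack bar", "stapler", "tennis ball"]
command = "I had a busy day, so I didn't have much to eat."
print(reason(objects, command))  # expected answer: "snack bar"
```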
Publisher
IEEE
