Related Researcher

Ahn, Hyemin (안혜민)

Detailed Information

VAFOR: Proactive Voice Assistant for Object Retrieval in the Physical World

Author(s)
Satyev, Bekatan; Ahn, Hyemin
Issued Date
2023-08-28
DOI
10.1109/ro-man57019.2023.10309466
URI
https://scholarworks.unist.ac.kr/handle/201301/74573
Citation
IEEE International Conference on Robot and Human Interactive Communication
Abstract
In this paper, we present a proactive robotic voice assistant with a perceive-reason-act loop that carries out pick-and-place operations based on verbal commands. Unlike existing systems, our robot can retrieve a target object not only when the target is explicitly spelled out, but also when given an indirect command that implicitly reflects the human's intention or emotion. For instance, when the verbal command is “I had a busy day, so I didn’t have much to eat.”, the target object would be something that can help with hunger. To successfully estimate the target object from indirect commands, our framework consists of separate modules covering the complete perceive-reason-act loop. First, for perception, it runs an object detector on the robot’s onboard computer to detect all objects in the surroundings and records a verbal command from a microphone. Second, for reasoning, the list of available objects and a transcription of the verbal command are integrated into a prompt for a Large Language Model (LLM) in order to identify the target object in the command. Finally, for action, a TurtleBot3 with a 5-DOF robotic arm finds the target object and brings it to the human. Our experiments show that with a properly designed prompt, the robot can identify the correct target object from implicit commands with up to 97% accuracy. In addition, fine-tuning a language model using the proposed prompt design process improves the performance of the smallest language model by a factor of five. Our data and code are available at https://github.com/bekatan/vafor.
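The reasoning step the abstract describes, folding a list of detected objects and a command transcript into a single LLM prompt, lends itself to a short sketch. The snippet below is a minimal illustration, not the authors' implementation (see the linked repository for that): the client library (openai), the model name, the prompt wording, and the reason function are all assumptions made for the example.

```python
# Minimal sketch of VAFOR's "reason" step, as summarized in the abstract:
# detected objects + transcribed command -> LLM prompt -> target object.
# The OpenAI client, model choice, and prompt wording are illustrative
# assumptions; the actual code is at https://github.com/bekatan/vafor.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def reason(detected_objects: list[str], transcript: str) -> str:
    """Ask an LLM which available object best satisfies the command."""
    prompt = (
        "Available objects: " + ", ".join(detected_objects) + "\n"
        f'Command: "{transcript}"\n'
        "Which single object from the list best satisfies the command? "
        "Answer with the object name only."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# Example: an indirect command that implicitly signals hunger.
objects = ["water bottle", "snack bar", "stapler", "tennis ball"]
command = "I had a busy day, so I didn't have much to eat."
print(reason(objects, command))  # expected answer: "snack bar"
```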
Publisher
IEEE
