Related Researcher

Ahn, Hyemin (안혜민)

Full metadata record

DC Field Value Language
dc.citation.conferencePlace KO -
dc.citation.conferencePlace Busan, Korea, Republic of -
dc.citation.title IEEE International Conference on Robot and Human Interactive Communication -
dc.contributor.author Satyev, Bekatan -
dc.contributor.author Ahn, Hyemin -
dc.date.accessioned 2024-01-31T18:36:17Z -
dc.date.available 2024-01-31T18:36:17Z -
dc.date.created 2023-12-01 -
dc.date.issued 2023-08-28 -
dc.description.abstract In this paper, we present a proactive robotic voice assistant with a perceive-reason-act loop that carries out pick-and-place operations based on verbal commands. Unlike existing systems, our robot can retrieve a target object not only when the target is explicitly spelled out, but also given an indirect command that implicitly reflects the human intention or emotion. For instance, when the verbal command is “I had a busy day, so I didn’t have much to eat.”, the target object would be something that can help with hunger. To successfully estimate the target object from indirect commands, our framework consists of separate modules for the complete perceive-reason-act loop as follows. First, for perception, it runs an object detector on the robot’s onboard computer to detect all objects in the surroundings and records a verbal command from a microphone. Second, for reasoning, a list of available objects as well as a transcription of the verbal command are integrated into a prompt for a Large Language Model (LLM) in order to identify the target object in the command. Finally, for action, a TurtleBot3 with a 5 DOF robotic arm finds the target object and brings it to the human. Our experiments show that with a properly designed prompt, the robot can identify the correct target object from implicit commands with up to 97% accuracy. In addition, it is shown that the technique of fine-tuning a language model based on the proposed prompt designing process amplifies the performance of the smallest language model by a factor of five. Our data and code are available at https://github.com/bekatan/vafor -
dc.identifier.bibliographicCitation IEEE International Conference on Robot and Human Interactive Communication -
dc.identifier.doi 10.1109/ro-man57019.2023.10309466 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/74573 -
dc.language English -
dc.publisher IEEE -
dc.title VAFOR: Proactive Voice Assistant for Object Retrieval in the Physical World -
dc.type Conference Paper -
dc.date.conferenceDate 2023-08-28 -
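
The abstract above describes a perceive-reason-act pipeline whose reasoning step combines the detected object labels and the transcribed verbal command into a prompt for a Large Language Model, which then names the target object. The Python sketch below illustrates that reasoning step only, under assumptions of ours: the prompt wording, the query_llm placeholder, and the substring-matching heuristic are hypothetical and are not the authors' implementation (their code is available at https://github.com/bekatan/vafor).

    # Hypothetical sketch of the "reason" step described in the abstract:
    # build an LLM prompt from detected object labels and a verbal command,
    # then map the model's free-text answer back to a detected object.
    from typing import List, Optional

    def build_prompt(detected_objects: List[str], command: str) -> str:
        """Ask the model to pick one object from the visible scene."""
        object_list = ", ".join(detected_objects)
        return (
            "You are a robot assistant. The objects you can see are: "
            f"{object_list}.\n"
            f'A person says: "{command}"\n'
            "Which single object from the list should you bring? "
            "Answer with the object name only."
        )

    def query_llm(prompt: str) -> str:
        """Placeholder for an actual LLM call (hosted or local model)."""
        raise NotImplementedError("plug in an LLM client here")

    def select_target(detected_objects: List[str], command: str) -> Optional[str]:
        """Return the detected object whose label appears in the model's answer."""
        answer = query_llm(build_prompt(detected_objects, command)).lower()
        for obj in detected_objects:
            if obj.lower() in answer:
                return obj
        return None  # no detected object matched the answer

    # Example with an indirect command, as in the abstract:
    # select_target(["banana", "book", "cup"],
    #               "I had a busy day, so I didn't have much to eat.")

On the real system described in the abstract, the selected object would then be handed to the action module that drives the TurtleBot3 and its 5 DOF arm.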

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.