
Boosting Small Language Models in Robotics Task Planning via LLMs as a Data Generator

Author(s)
Choi, Gawon
Advisor
Ahn, Hye-Min
Issued Date
2025-02
URI
https://scholarworks.unist.ac.kr/handle/201301/86453
http://unist.dcollection.net/common/orgView/200000865605
Abstract
In robotics, Large Language Models (LLMs) have been noted as task planners for robots due to their advanced perception and reasoning capabilities based on Chain-of-Thought (CoT). However, we claim that LLMs are over-specified for robotic task planning for two reasons. First, the language commands given to robots have much lower linguistic complexity than what LLMs can handle. Second, modern robots in practice are primarily tailored to a specific environment (e.g., tabletop, kitchen), as opposed to LLMs, which are domain-agnostic. We further believe that within a specific environment, small language models (LMs) have the potential to be effective robot task planners. To demonstrate this, we introduce a comprehensive framework that covers the entire workflow of training small LMs as environment-specific task planners, from generating datasets for task planning to fine-tuning small LMs on these datasets, based on knowledge distillation [1]. We refer to the synthetic dataset generated from this framework as the COmmand-STeps Dataset (COST), which contains commands to robots and the corresponding actionable plans to execute those commands. In this framework, both data collection via LLMs and post-processing are automated, allowing anyone to build their own COST dataset for any environment. We generate COST datasets for the kitchen and tabletop environments as examples, and evaluate their effectiveness by comparing the task planning performance of LLMs against small LMs fine-tuned on the COST datasets. As a result, we find that fine-tuned GPT-2-medium performs comparably to GPT-3.5 in both environments.
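To make the command-steps idea concrete, the sketch below shows one plausible way a COST-style record (a natural-language command paired with an ordered list of executable steps) could be serialized into a single training string for causal-LM fine-tuning of a small model such as GPT-2. The field names (`command`, `steps`), the prompt template, and the end-of-plan marker are illustrative assumptions, not the thesis's actual data format.

```python
# Hypothetical serialization of a COST-style example into one training
# string for causal-LM fine-tuning. The "Command:/Plan:" template and the
# <|endofplan|> marker are assumptions for illustration only.

def serialize_cost_example(example: dict) -> str:
    """Flatten a command + step list into a single prompt/target string."""
    numbered_steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(example["steps"], start=1)
    )
    return f"Command: {example['command']}\nPlan:\n{numbered_steps}\n<|endofplan|>"

# Example record for a tabletop environment.
sample = {
    "command": "Put the apple in the bowl.",
    "steps": ["pick up apple", "move to bowl", "place apple in bowl"],
}

text = serialize_cost_example(sample)
print(text)
```

Strings of this shape could then be tokenized and fed to a standard language-modeling fine-tuning loop; at inference time the model is prompted with the `Command:`/`Plan:` prefix and generation is stopped at the end-of-plan marker.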
Publisher
Ulsan National Institute of Science and Technology
Degree
Master
Major
Graduate School of Artificial Intelligence
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.