| dc.description.abstract |
In robotics, Large Language Models (LLMs) have been adopted as task planners for robots due to their advanced perception and reasoning capabilities based on Chain-of-Thought (CoT) prompting. However, we argue that LLMs are over-specified for robotic task planning for two reasons. First, the language commands given to robots have far lower linguistic complexity than what LLMs can handle. Second, modern robots in practice are typically tailored to a specific environment (e.g., tabletop, kitchen), as opposed to LLMs, which are domain-agnostic. We therefore believe that, within a specific environment, small language models (LMs) have the potential to be effective robot task planners. To demonstrate this, we introduce a comprehensive framework that covers the entire workflow of training small LMs as environment-specific task planners, from generating task-planning datasets to fine-tuning small LMs on them, based on knowledge distillation [1]. We refer to the synthetic dataset generated by this framework as the COmmand-STeps Dataset (COST), which contains commands to robots and the corresponding actionable plans to execute them. In this framework, both data collection via LLMs and post-processing are automated, allowing anyone to build their own COST dataset for any environment. As examples, we generate COST datasets for the kitchen and tabletop environments and evaluate their effectiveness by comparing the task-planning performance of LLMs with that of small LMs fine-tuned on the COST datasets. We find that a fine-tuned GPT2-medium performs comparably to GPT-3.5 in both environments. |
- |
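As a minimal sketch of the command-steps pairing the abstract describes, the snippet below serializes one robot command and its ordered action steps into a single training string for causal-LM fine-tuning. The tag names, layout, and the `format_example` helper are assumptions for illustration, not the paper's actual COST format.

```python
# Hypothetical sketch of a COST-style training example: one natural-language
# command paired with the ordered, actionable steps that execute it.
# The serialization format below is an assumption, not the paper's own.

def format_example(command, steps):
    """Serialize a (command, steps) pair into one fine-tuning string."""
    lines = [f"<command> {command}"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"<step {i}> {step}")
    return "\n".join(lines)

example = format_example(
    "Make a cup of coffee",
    ["Pick up the mug", "Place the mug under the machine", "Press the brew button"],
)
print(example)
```

A small LM such as GPT2-medium could then be fine-tuned on many such strings, with the steps distilled from an LLM's plans for the target environment.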