Related Researcher

Gong, Taesik (공태식)
Ubiquitous AI Lab


Full metadata record

DC Field Value
dc.citation.conferencePlace US
dc.citation.title Empirical Methods in Natural Language Processing
dc.contributor.author Yoon, Hyungjun
dc.contributor.author Tolera, Biniyam Aschalew
dc.contributor.author Gong, Taesik
dc.contributor.author Lee, Kimin
dc.contributor.author Lee, Sung-Ju
dc.date.accessioned 2024-12-02T12:05:06Z
dc.date.available 2024-12-02T12:05:06Z
dc.date.created 2024-11-30
dc.date.issued 2024-11-12
dc.description.abstract Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging, as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside descriptions of the target sensory task. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts and reducing token costs by 15.8×. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks. The source code is available at https://github.com/diamond264/ByMyEyes.
dc.identifier.bibliographicCitation Empirical Methods in Natural Language Processing
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/84658
dc.publisher EMNLP
dc.title By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
dc.type Conference Paper
dc.date.conferenceDate 2024-11-12
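
The abstract describes a two-part pipeline: render a window of raw sensor readings as a plot, then prompt a multimodal LLM with that image plus a natural-language description of the sensory task. The following is a minimal sketch of that idea, assuming synthetic accelerometer data and an OpenAI-style multimodal chat API; the model name, prompt wording, and plotting choices are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt
import numpy as np
from openai import OpenAI  # official openai client, v1+ API


def plot_sensor_window(window: np.ndarray, sample_rate_hz: int) -> str:
    """Render a (samples, channels) sensor window as a base64-encoded PNG."""
    t = np.arange(window.shape[0]) / sample_rate_hz
    fig, ax = plt.subplots(figsize=(6, 3))
    for ch in range(window.shape[1]):
        ax.plot(t, window[:, ch], label=f"axis {ch}")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("acceleration (m/s^2)")
    ax.legend(loc="upper right")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode("ascii")


# Hypothetical 2 s window: 100 samples at 50 Hz, 3 accelerometer axes.
window = np.random.randn(100, 3)
image_b64 = plot_sensor_window(window, sample_rate_hz=50)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any MLLM endpoint that accepts images
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": ("The image shows a 2-second, 3-axis accelerometer "
                      "recording sampled at 50 Hz. Classify the activity "
                      "as one of: walking, running, sitting. "
                      "Answer with a single word.")},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Sending the window as a single image rather than as serialized numbers is what underlies the token savings the abstract reports: one fixed-size image stands in for hundreds of floating-point values rendered as text.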

