Related Researcher

백승렬 (Baek, Seungryul)
UNIST VISION AND LEARNING LAB.

Detailed Information

Transformer-based Unified Recognition of Two Hands Manipulating Objects

Author(s)
Cho, Hoseong; Kim, Chanwoo; Kim, Jihyeon; Lee, Seongyeong; Elkhan, Ismayilzada; Baek, Seungryul
Issued Date
2023-06-20
URI
https://scholarworks.unist.ac.kr/handle/201301/74693
Citation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Abstract
Understanding hand-object interactions from egocentric video has received great attention recently. So far, most approaches have been based on convolutional neural network (CNN) features combined with temporal encoding via a long short-term memory (LSTM) network or a graph convolution network (GCN) to provide a unified understanding of two hands, an object, and their interactions. In this paper, we propose a Transformer-based unified framework that provides a better understanding of two hands manipulating objects. In our framework, we feed the whole image, depicting two hands, an object, and their interactions, as input and jointly estimate three kinds of information from each frame: the poses of the two hands, the pose of the object, and the object type. Afterwards, the action class defined by the hand-object interactions is predicted from the entire video, based on the estimated information combined with a contact map that encodes the interaction between the two hands and the object. Experiments are conducted on the H2O and FPHA benchmark datasets, and we demonstrate the superiority of our method, which achieves state-of-the-art accuracy. Ablative studies further demonstrate the effectiveness of each proposed module.
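
To make the per-frame part of the abstract concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the joint-estimation idea: image tokens go through a Transformer encoder, and separate heads produce the three outputs named in the abstract (two-hand poses, object pose, object type). The module name, layer choices, and dimensions are illustrative assumptions, not the authors' released implementation, and the video-level action stage with the contact map is omitted.

    # Minimal sketch of the per-frame joint-estimation idea (assumed layout,
    # not the paper's released code).
    import torch
    import torch.nn as nn

    class UnifiedHandObjectFrameModel(nn.Module):
        """Image tokens -> Transformer -> hand poses, object pose, object type."""

        def __init__(self, num_joints=21, num_obj_classes=8, dim=256, num_tokens=196):
            super().__init__()
            # Patch embedding stands in for any CNN/ViT-style tokenizer (assumed).
            self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
            self.pos_embed = nn.Parameter(torch.zeros(1, num_tokens, dim))
            encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
            # One head per output named in the abstract (head layouts are assumptions).
            self.left_hand_head = nn.Linear(dim, num_joints * 3)   # 3D joints, left hand
            self.right_hand_head = nn.Linear(dim, num_joints * 3)  # 3D joints, right hand
            self.obj_pose_head = nn.Linear(dim, 6)                 # e.g. a 6-DoF object pose
            self.obj_class_head = nn.Linear(dim, num_obj_classes)  # object-type logits

        def forward(self, img):  # img: (B, 3, 224, 224)
            tokens = self.patch_embed(img).flatten(2).transpose(1, 2)  # (B, 196, dim)
            feat = self.encoder(tokens + self.pos_embed).mean(dim=1)   # pooled frame feature
            return {
                "left_hand": self.left_hand_head(feat),
                "right_hand": self.right_hand_head(feat),
                "obj_pose": self.obj_pose_head(feat),
                "obj_class": self.obj_class_head(feat),
            }

    # Usage example on a dummy batch of two frames:
    model = UnifiedHandObjectFrameModel()
    outputs = model(torch.randn(2, 3, 224, 224))  # outputs["obj_class"]: (2, 8)

In the paper's pipeline, these per-frame estimates, combined with the contact map, would then feed a classifier over the entire video to predict the action class; that stage is not sketched here.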
Publisher
Institute of Electrical and Electronics Engineers Inc.

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.