File Download

There are no files associated with this item.

Related Researcher

안혜민

Ahn, Hyemin

Detailed Information


Full metadata record

DC Field Value Language
dc.citation.conferencePlace ZZ -
dc.citation.conferencePlace Virtual, Online -
dc.citation.endPage 16290 -
dc.citation.startPage 16282 -
dc.citation.title IEEE International Conference on Computer Vision -
dc.contributor.author Ahn, Hyemin -
dc.contributor.author Lee, D. -
dc.date.accessioned 2024-01-31T21:36:18Z -
dc.date.available 2024-01-31T21:36:18Z -
dc.date.created 2022-06-08 -
dc.date.issued 2021-10-11 -
dc.description.abstract In this paper, we propose Hierarchical Action Segmentation Refiner (HASR), which can refine temporal action segmentation results from various models by understanding the overall context of a given video in a hierarchical way. When a backbone model for action segmentation estimates how the given video can be segmented, our model extracts segment-level representations based on frame-level features, and extracts a video-level representation based on the segment-level representations. Based on these hierarchical representations, our model can refer to the overall context of the entire video, and predict how the segment labels that are out of context should be corrected. Our HASR can be plugged into various action segmentation models (MS-TCN, SSTDA, ASRF), and improve the performance of state-of-the-art models on three challenging datasets (GTEA, 50Salads, and Breakfast). For example, on the 50Salads dataset, the segmental edit score improves from 67.9% to 77.4% (MS-TCN), from 75.8% to 77.3% (SSTDA), and from 79.3% to 81.0% (ASRF). In addition, our model can refine the segmentation result from an unseen backbone model, which was not referred to when training HASR. This generalization performance would make HASR an effective tool for boosting existing approaches for temporal action segmentation. Our code is available at https://github.com/cotton-ahn/HASR_iccv2021. © 2021 IEEE -
dc.identifier.bibliographicCitation IEEE International Conference on Computer Vision, pp.16282 - 16290 -
dc.identifier.doi 10.1109/ICCV48922.2021.01599 -
dc.identifier.issn 1550-5499 -
dc.identifier.scopusid 2-s2.0-85126767233 -
dc.identifier.uri https://scholarworks.unist.ac.kr/handle/201301/76945 -
dc.language English -
dc.publisher Institute of Electrical and Electronics Engineers Inc. -
dc.title Refining Action Segmentation with Hierarchical Video Representations -
dc.type Conference Paper -
dc.date.conferenceDate 2021-10-11 -
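
The abstract above describes a two-level hierarchy: segment-level embeddings pooled from frame-level features, a video-level embedding pooled from those segment embeddings, and a refinement step that re-predicts each segment label using the video-level context. The following is a minimal PyTorch sketch of that idea only; all module names, dimensions, and encoder choices (GRU pooling, a default of 19 classes as in 50Salads) are illustrative assumptions, not the authors' implementation, which is available in the linked repository.

# Minimal sketch of the hierarchical refinement idea described in the abstract.
# All names, dimensions, and pooling choices are illustrative assumptions,
# not the authors' implementation (see https://github.com/cotton-ahn/HASR_iccv2021).
import torch
import torch.nn as nn


class HierarchicalRefinerSketch(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=256, num_classes=19):
        super().__init__()
        # Segment-level encoder: summarizes the frames inside one segment.
        self.segment_encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Video-level encoder: summarizes the sequence of segment embeddings.
        self.video_encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Re-predicts each segment's label from its own embedding
        # concatenated with the video-level context.
        self.classifier = nn.Linear(hidden_dim * 2, num_classes)

    def forward(self, frame_feats, segment_boundaries):
        # frame_feats: (T, feat_dim) frame-level features for one video.
        # segment_boundaries: list of (start, end) frame indices produced by
        # the backbone's initial segmentation.
        seg_embs = []
        for start, end in segment_boundaries:
            seg_frames = frame_feats[start:end].unsqueeze(0)   # (1, L, feat_dim)
            _, h = self.segment_encoder(seg_frames)            # h: (1, 1, hidden)
            seg_embs.append(h[-1, 0])
        seg_embs = torch.stack(seg_embs).unsqueeze(0)          # (1, S, hidden)
        _, video_h = self.video_encoder(seg_embs)              # (1, 1, hidden)
        video_emb = video_h[-1, 0].expand(seg_embs.size(1), -1)
        # Refined label logits for each segment, conditioned on video context.
        return self.classifier(torch.cat([seg_embs[0], video_emb], dim=-1))


# Usage: refine the label of each backbone-predicted segment.
refiner = HierarchicalRefinerSketch()
feats = torch.randn(300, 2048)                 # e.g. precomputed frame features
segments = [(0, 120), (120, 210), (210, 300)]  # backbone's initial segments
refined_logits = refiner(feats, segments)      # (num_segments, num_classes)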

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.