Feature structure distillation with Centered Kernel Alignment in BERT transferring

Jung, Hee-Jun; Kim, Doyeon; Na, Seung-Hoon; Kim, Kangil

doi:10.1016/j.eswa.2023.120980

Scholarworks@UNIST

Related Researcher

Na, Seung-Hoon (나승훈): Natural Language Processing Lab

Full metadata record

DC Field	Value	Language
dc.citation.startPage	120980	-
dc.citation.title	EXPERT SYSTEMS WITH APPLICATIONS	-
dc.citation.volume	234	-
dc.contributor.author	Jung, Hee-Jun	-
dc.contributor.author	Kim, Doyeon	-
dc.contributor.author	Na, Seung-Hoon	-
dc.contributor.author	Kim, Kangil	-
dc.date.accessioned	2025-04-25T15:10:52Z	-
dc.date.available	2025-04-25T15:10:52Z	-
dc.date.created	2025-04-08	-
dc.date.issued	2023-12	-
dc.description.abstract	Knowledge distillation transfers information on representations from a teacher to a student by reducing the difference between them. A challenge of this approach is that it reduces the flexibility of the student's representations, which induces inaccurate learning of the teacher's knowledge. To resolve this problem, we propose a novel method, feature structure distillation, that divides information on the structure of features into three types for transfer and implements them based on Centered Kernel Alignment (CKA). Specifically, the method divides the feature information into intra-feature, local inter-feature, and global inter-feature structures, so that diverse aspects of the structure are transferred. The global inter-feature structure is proposed to transfer structure beyond the mini-batch, and a memory-augmented transfer method with clustering is implemented for it. We adopt CKA because it provides a more accurate similarity metric than other metrics between two different models or between representations in different spaces. The methods are empirically analyzed on the nine language understanding tasks of the GLUE dataset with Bidirectional Encoder Representations from Transformers (BERT), a representative neural language model. The results show that the proposed methods effectively transfer the three types of structures and improve performance compared to state-of-the-art distillation methods; for example, ours achieves 66.61% accuracy compared to the baseline (65.55%) on the RTE dataset. The code for the methods is available at https://github.com/maroo-sky/FSD.	-
dc.identifier.bibliographicCitation	EXPERT SYSTEMS WITH APPLICATIONS, v.234, pp.120980	-
dc.identifier.doi	10.1016/j.eswa.2023.120980	-
dc.identifier.issn	0957-4174	-
dc.identifier.scopusid	2-s2.0-85166359505	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/86772	-
dc.identifier.wosid	001120981200001	-
dc.language	English	-
dc.publisher	PERGAMON-ELSEVIER SCIENCE LTD	-
dc.title	Feature structure distillation with Centered Kernel Alignment in BERT transferring	-
dc.type	Article	-
dc.description.isOpenAccess	FALSE	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science	-
dc.relation.journalResearchArea	Computer Science; Engineering; Operations Research & Management Science	-
dc.type.docType	Article	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	Knowledge distillation	-
dc.subject.keywordAuthor	BERT	-
dc.subject.keywordAuthor	Centered Kernel Alignment	-
dc.subject.keywordAuthor	Natural language processing	-
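
Illustrative note (not part of the repository record): the abstract relies on Centered Kernel Alignment as a similarity metric between teacher and student representations that may live in spaces of different dimension. The sketch below shows the widely used linear form of CKA in NumPy, assuming features are arranged as (examples, dimensions) matrices. The function names linear_cka and cka_loss, and the choice of 1 - CKA as a distillation loss, are assumptions made here for illustration; the authors' actual implementation is the one at the GitHub URL given in the abstract and may differ.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two feature matrices with the same number of rows.

    x has shape (n, d1) and y has shape (n, d2); d1 and d2 may differ.
    """
    # Center every feature dimension over the n examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_loss(teacher, student):
    """Hypothetical distillation loss: small when the structures align."""
    return 1.0 - linear_cka(teacher, student)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(32, 768))  # e.g. a mini-batch of teacher features
    student = rng.normal(size=(32, 312))  # student features, smaller hidden size
    print("CKA(teacher, teacher):", linear_cka(teacher, teacher))
    print("CKA(teacher, student):", linear_cka(teacher, student))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either input, which is one reason it can compare a teacher and a student whose hidden sizes differ without an extra projection layer.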

ScholarWorks@UNIST was established as an OAK Project for the National Library of Korea.