A hybrid similarity measure based on binary and decimal data for data mining

Jeong, Soyeong

doi:10.1145/3330482.3330520

Scholarworks@UNIST

UNIST Library

Full metadata record

DC Field	Value	Language
dc.citation.conferencePlace	IO	-
dc.citation.conferencePlace	Bali	-
dc.citation.endPage	77	-
dc.citation.startPage	72	-
dc.citation.title	5th International Conference on Computing and Artificial Intelligence, ICCAI 2019	-
dc.contributor.author	Jeong, Soyeong	-
dc.date.accessioned	2024-02-01T00:36:40Z	-
dc.date.available	2024-02-01T00:36:40Z	-
dc.date.created	2019-09-06	-
dc.date.issued	2019-04-19	-
dc.description.abstract	We propose a new similarity measure to improve the quality of data mining, especially for recommender systems. Similarity measures are widely used for classification, clustering, anomaly detection, and related tasks. Many recommender systems predict unrated scores by clustering similar users, a widely used approach known as collaborative filtering (CF). In CF, how the similarity measure is defined is a major concern. Conventional measures based on the Pearson Correlation Coefficient (PCC) struggle to reflect implicit and explicit information at the same time. We propose a hybrid similarity measure, BD PCC, a type of PCC named after the first letters of the ‘Binary’ and ‘Decimal’ data types. As its name suggests, BD PCC is defined by concatenating two PCCs computed on two different types of data. Whereas other hybrid measures combine values in different ranges and therefore need additional processing before concatenation, BD PCC consists only of PCCs and is free from scale issues. Because the PCC for binary data can be defined whenever the user has bought at least one item, BD PCC also relieves data sparsity. We tested the proposed similarity measure in recommender systems, and prediction accuracy improved on the real data sets MovieLens 100K [8], MovieLens 1M [8], MovieLens latest small [8], and FilmTrust 35K [9]. © 2019 Association for Computing Machinery.	-
dc.identifier.bibliographicCitation	5th International Conference on Computing and Artificial Intelligence, ICCAI 2019, pp.72 - 77	-
dc.identifier.doi	10.1145/3330482.3330520	-
dc.identifier.issn	0000-0000	-
dc.identifier.scopusid	2-s2.0-85071121648	-
dc.identifier.uri	https://scholarworks.unist.ac.kr/handle/201301/79975	-
dc.identifier.url	https://dl.acm.org/citation.cfm?doid=3330482.3330520	-
dc.language	English	-
dc.publisher	Association for Computing Machinery	-
dc.title	A hybrid similarity measure based on binary and decimal data for data mining	-
dc.type	Conference Paper	-
dc.date.conferenceDate	2019-04-19	-
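
The abstract describes BD PCC as a combination of two PCCs, one on binary (bought or not) data and one on decimal (rating) data, but this record does not give the exact formula. The following is only a minimal sketch of that idea, not the paper's definition: the function names, the `weight` parameter, the weighted-average combination rule, and the fallback when one PCC is undefined are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    if n < 2:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def bd_pcc_sketch(ratings_u, ratings_v, items, weight=0.5):
    """Hypothetical hybrid of a binary PCC and a decimal PCC.

    ratings_u, ratings_v: dicts mapping item ID -> rating for two users.
    items: list of all item IDs in the catalogue.
    weight: assumed mixing weight for the binary PCC (not from the paper).
    """
    # Binary view: 1 if the user rated (bought) the item, 0 otherwise.
    bin_u = [1 if i in ratings_u else 0 for i in items]
    bin_v = [1 if i in ratings_v else 0 for i in items]
    pcc_binary = pearson(bin_u, bin_v)

    # Decimal view: ratings restricted to items both users rated.
    common = [i for i in items if i in ratings_u and i in ratings_v]
    pcc_decimal = pearson([ratings_u[i] for i in common],
                          [ratings_v[i] for i in common])

    # Assumed combination: weighted average, falling back to whichever
    # PCC is defined, and to 0.0 when neither is.
    if pcc_binary is not None and pcc_decimal is not None:
        return weight * pcc_binary + (1 - weight) * pcc_decimal
    if pcc_binary is not None:
        return pcc_binary
    if pcc_decimal is not None:
        return pcc_decimal
    return 0.0
```

In this sketch both components lie in [-1, 1], so no rescaling is needed before combining them, and two users with no co-rated items can still receive a similarity from the binary component alone, which is one way to read the abstract's remark about sparsity.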
