A hybrid similarity measure based on binary and decimal data for data mining

Author(s)
Jeong, Soyeong
Issued Date
2019-04-19
DOI
10.1145/3330482.3330520
URI
https://scholarworks.unist.ac.kr/handle/201301/79975
Fulltext
https://dl.acm.org/citation.cfm?doid=3330482.3330520
Citation
5th International Conference on Computing and Artificial Intelligence, ICCAI 2019, pp. 72-77
Abstract
We propose a new similarity measure to improve the quality of data mining, particularly for recommender systems. Similarity measures are widely used for classification, clustering, anomaly detection, and related tasks. Many recommender systems predict unrated scores by clustering similar users, a widely used approach known as collaborative filtering (CF). In CF, how to define the similarity measure is a major concern. Conventional measures based on the Pearson correlation coefficient (PCC) have difficulty reflecting implicit and explicit information at the same time. We propose a hybrid similarity measure, BD PCC, a type of PCC named after the first letters of the ‘Binary’ and ‘Decimal’ data types. As the name suggests, BD PCC is defined by concatenating two PCCs computed on two different types of data. Other hybrid measures consist of values in different ranges and therefore need extra processing before their components can be combined; BD PCC is free from this scale issue because both of its components are PCCs. Since the PCC for binary data can be defined whenever the user has bought at least one item, BD PCC mitigates data sparsity. We tested the proposed similarity measure in recommender systems, and prediction accuracy improved on real data sets: MovieLens 100K [8], MovieLens 1M [8], MovieLens latest small [8], and FilmTrust 35K [9]. © 2019 Association for Computing Machinery.
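
The abstract does not spell out how the two PCCs are combined, so the following is only an illustrative sketch of the general idea and not the authors' definition. It assumes that the binary vector marks whether a user rated (bought) each item, that the decimal PCC is taken over co-rated items, and that the two values are merged by a weighted average. The function names `pcc` and `bd_pcc` and the `weight` parameter are hypothetical.

```python
from math import sqrt


def pcc(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def bd_pcc(ratings_u, ratings_v, items, weight=0.5):
    """Hybrid similarity from a binary PCC and a decimal PCC.

    ratings_u, ratings_v: dicts mapping item id -> numeric rating.
    items: the full item catalogue (any iterable of item ids).
    weight: ASSUMED mixing weight for the binary part; the combination
            rule is an assumption of this sketch, not taken from the paper.
    """
    items = list(items)

    # Binary (implicit) part: 1 if the user rated the item, else 0.
    bin_u = [1 if i in ratings_u else 0 for i in items]
    bin_v = [1 if i in ratings_v else 0 for i in items]
    binary = pcc(bin_u, bin_v)

    # Decimal (explicit) part: ratings on items both users rated.
    common = [i for i in items if i in ratings_u and i in ratings_v]
    decimal = pcc([ratings_u[i] for i in common],
                  [ratings_v[i] for i in common])

    # Both parts lie in [-1, 1], so no rescaling is needed before mixing.
    if binary is None:
        return decimal
    if decimal is None:
        return binary
    return weight * binary + (1 - weight) * decimal
```

In this sketch, two users who share only one co-rated item have no defined decimal PCC, yet `bd_pcc` still returns the binary PCC, which illustrates how the binary component can ease sparsity.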
Publisher
Association for Computing Machinery
ISSN
0000-0000

Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.