21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2017, pp. 644-654
Abstract
This paper addresses the mining of frequent itemsets from a transactional database, with the aim of extracting hidden patterns from the database. Two major limitations of the Apriori algorithm are tackled: (i) the scan of the entire database at each pass to calculate the support of all generated itemsets, and (ii) its high sensitivity to variations in the user-defined minimum support threshold. To deal with these limitations, a novel approach is proposed, called Single Scan Frequent Itemsets Mining (SS-FIM), which requires only a single scan of the transactional database to extract the frequent itemsets. Its distinctive feature is that it generates a fixed number of candidate itemsets, independently of the minimum support threshold, which intuitively reduces the runtime cost for large databases. SS-FIM is compared with Apriori on several standard databases. The results confirm the scalability of SS-FIM and show that it outperforms Apriori on medium and large databases.
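The single-scan idea can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes that candidates are obtained by enumerating every non-empty subset of each transaction and counting them in a hash table, and the function names `count_itemsets` and `filter_frequent` are hypothetical. Under that assumption, the set of counted candidates depends only on the data, not on the minimum support threshold, which is applied afterwards as a simple filter.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions):
    """Count every non-empty itemset in one pass over the transactions.

    Assumption (not from the source): candidates are all non-empty
    subsets of each transaction, stored as sorted tuples.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def filter_frequent(counts, num_transactions, min_support):
    """Keep itemsets whose relative support reaches min_support (0..1)."""
    threshold = min_support * num_transactions
    return {itemset: c for itemset, c in counts.items() if c >= threshold}


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
counts = count_itemsets(transactions)  # the only scan of the database
frequent = filter_frequent(counts, len(transactions), min_support=0.5)
```

Changing `min_support` only changes the result of `filter_frequent`; the table built by `count_itemsets` stays the same. Note that this naive enumeration produces up to 2^n - 1 subsets for a transaction with n items, so the sketch is only practical when transactions are short.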