21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2017, vol. 10526 LNAI, pp. 138-148
Abstract
Finding frequent itemsets is a popular data mining problem that aims to extract hidden patterns from a transactional database. Several bio-inspired approaches have been proposed to overcome the poor runtime performance of exact algorithms such as Apriori and FPGrowth. Approaches based on genetic algorithms are among the most efficient in terms of runtime, but they remain weak in terms of solution quality, i.e., the number of frequent itemsets discovered. To deal with this issue, we propose GA-Apriori, a new genetic algorithm for finding frequent itemsets in which the crossover and mutation operators are defined by taking into account the Apriori heuristic principle. The results of our evaluation show that GA-Apriori outperforms other genetic-algorithm-based approaches to frequent itemset mining, especially on large instances. The experiments also show that GA-Apriori is competitive with exact approaches in terms of the number of frequent itemsets discovered.
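The abstract does not specify how the GA-Apriori operators are built, so the following is only a minimal sketch of the two notions they rest on: the support of an itemset and the Apriori principle (an itemset can be frequent only if all of its subsets are frequent). The function names, the toy database, and the 0.5 threshold are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support reaches the threshold."""
    return support(itemset, transactions) >= min_support


def passes_apriori_check(candidate, known_frequent):
    """Apriori pruning: a k-itemset is worth evaluating only if all of its
    (k-1)-subsets are already known to be frequent."""
    candidate = frozenset(candidate)
    if len(candidate) <= 1:
        return True
    return all(
        frozenset(subset) in known_frequent
        for subset in combinations(candidate, len(candidate) - 1)
    )


# Toy transactional database (hypothetical data).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter"}),
]
min_support = 0.5  # assumed threshold for this example

frequent_singletons = {
    frozenset({item})
    for item in {"bread", "butter", "milk"}
    if is_frequent({item}, transactions, min_support)
}
```

A genetic operator guided by this principle could, for instance, discard or repair offspring for which `passes_apriori_check` returns `False` before paying for a full support count; whether GA-Apriori does exactly this is not stated in the abstract.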
Publisher
21st Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2017