FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters

P.Shunmugapriya
R.Maragatham

Abstract

Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of
computing nodes. We start this study by discovering a serious performance problem of the existing parallel Frequent Itemset Mining
algorithms. Given a large dataset, data partitioning strategies in the existing solutions suffer from high communication and mining overhead
induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning
approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the
performance of parallel Frequent Itemset Mining on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram-based data
partitioning technique, which exploits correlations among transactions. Incorporating a similarity metric and the Locality-Sensitive
Hashing technique, FiDoop-DP places highly similar transactions into the same data partition to improve locality without creating an
excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of
datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP reduces
network and computing loads by eliminating redundant transactions on Hadoop nodes. FiDoop-DP significantly
improves the performance of the existing parallel frequent-pattern scheme by up to 31%, with an average of 18%.
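
The partitioning idea described above can be illustrated with a minimal single-machine sketch. It is not the authors' MapReduce implementation. It assumes Jaccard similarity as the similarity metric and MinHash as the Locality-Sensitive Hashing scheme, neither of which the abstract specifies. The pivots are picked by hand, and all function names (jaccard, minhash_signature, assign_to_pivots) are hypothetical.

```python
import hashlib
from typing import Dict, FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def jaccard(a: Transaction, b: Transaction) -> float:
    """Jaccard similarity |a & b| / |a | b| (assumed metric, not from the abstract)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(t: Transaction, num_hashes: int = 4) -> Tuple[int, ...]:
    """MinHash signature of a non-empty transaction.

    For each seed, every item is hashed and the minimum hash is kept.
    Two transactions agree on one component with probability equal to
    their Jaccard similarity, so similar transactions tend to share
    signature components. hashlib is used so results are stable across runs.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(
            min(
                int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
                for item in t
            )
        )
    return tuple(signature)


def assign_to_pivots(
    transactions: List[Transaction], pivots: List[Transaction]
) -> Dict[int, List[Transaction]]:
    """Voronoi-style assignment: each transaction goes to the cell of the
    pivot it is most similar to (ties go to the lowest pivot index)."""
    partitions: Dict[int, List[Transaction]] = {i: [] for i in range(len(pivots))}
    for t in transactions:
        best = max(range(len(pivots)), key=lambda i: jaccard(t, pivots[i]))
        partitions[best].append(t)
    return partitions


# Toy market-basket data (made up for illustration only).
t1 = frozenset({"bread", "milk"})
t2 = frozenset({"bread", "milk", "butter"})
t3 = frozenset({"beer", "chips"})
t4 = frozenset({"beer", "chips", "salsa"})

# Hand-picked pivots; a real system would select them from the data.
pivots = [t1, t3]
partitions = assign_to_pivots([t1, t2, t3, t4], pivots)
```

In this toy run the two dairy baskets land in one partition and the two snack baskets in the other, so each partition can be mined locally without copying transactions across partitions.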

How to Cite
P.Shunmugapriya, & R.Maragatham. (2022). Fidoop-Dp: Data Partitioning In Frequent Itemset Mining On Hadoop Clust. IIRJET, 2(Special Issue ICEIET). Retrieved from https://iirjet.org/index.php/home/article/view/203 (Original work published June 13, 2022)