Class FPGrowth

java.lang.Object
smile.association.FPGrowth
All Implemented Interfaces:
Iterable<ItemSet>

public class FPGrowth extends Object implements Iterable<ItemSet>
Frequent item set mining based on the FP-growth (frequent pattern growth) algorithm, which employs an extended prefix-tree (FP-tree) structure to store the database in a compressed form. The FP-growth algorithm is currently one of the fastest approaches to discover frequent item sets. FP-growth adopts a divide-and-conquer approach to decompose both the mining tasks and the databases. It uses a pattern fragment growth method to avoid the costly process of candidate generation and testing used by Apriori.

The basic idea of the FP-growth algorithm can be described as a recursive elimination scheme: in a preprocessing step delete all items from the transactions that are not frequent individually, i.e., do not appear in a user-specified minimum number of transactions. Then select all transactions that contain the least frequent item (least frequent among those that are frequent) and delete this item from them. Recurse to process the obtained reduced (also known as projected) database, remembering that the item sets found in the recursion share the deleted item as a prefix. On return, remove the processed item from the database of all transactions and start over, i.e., process the second frequent item etc. In these processing steps the prefix tree, which is enhanced by links between the branches, is exploited to quickly find the transactions containing a given item and also to remove this item from the transactions after it has been processed.

References

  1. Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. Mining frequent patterns without candidate generation. Data Mining and Knowledge Discovery 8:53-87, 2004.
  2. Gosta Grahne and Jianfei Zhu. Fast algorithms for frequent itemset mining using FP-trees. IEEE TRANS. ON KNOWLEDGE AND DATA ENGINEERING 17(10):1347-1362, 2005.
  3. Christian Borgelt. An Implementation of the FP-growth Algorithm. OSDM, 1-5, 2005.
  • Method Details

    • size

      public int size()
      Returns the number transactions in the database.
      Returns:
      the number transactions in the database
    • iterator

      public Iterator<ItemSet> iterator()
      Specified by:
      iterator in interface Iterable<ItemSet>
    • apply

      public static Stream<ItemSet> apply(FPTree tree)
      Mines the frequent item sets.
      Parameters:
      tree - the FP-tree of item sets.
      Returns:
      the stream of frequent item sets.