spark-issues mailing list archives

From "Sean Owen (JIRA)"
Subject [jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib
Date Tue, 21 Oct 2014 00:24:33 GMT


Sean Owen commented on SPARK-4001:

FWIW I do perceive Apriori to be *the* basic frequent itemset algorithm. I think this is the
original paper -- at least it was on Wikipedia and looks like the right time / author:
 It is very simple, and probably what you'd cook up if you invented a solution to the problem:

Frequent itemset mining is not quite the same as a frequent item algorithm. Given a collection
of sets of items, it tries to determine which subsets occur frequently across those sets.
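
As a rough illustration of the problem described above, here is a minimal Apriori-style sketch in plain Python. It is not Spark code and not the implementation proposed in this issue; the function name `apriori`, the `min_support` parameter (an absolute transaction count), and the sample baskets are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that appears in at
    least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up sample data, purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
result = apriori(transactions, min_support=3)
print(result[frozenset({"bread", "milk"})])  # 3
```

With these baskets, all four single items are frequent, but {bread, milk} is the only pair that reaches the threshold, so the search stops after the pair level.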

FP-Growth is the only other itemset algorithm I have ever heard of. It's more sophisticated. I
don't have a paper reference.

If you're going to implement frequent itemsets, I think these are the two to start with. That
said, I perceive frequent itemsets to be kind of "90s" and I have never had to use them myself.
That is not to say they don't have uses, and hey, they're simple. I suppose my problem with
this type of technique is that it's not really telling you whether the set occurred unusually
frequently, just that it did in absolute terms. There is no probabilistic element to these.
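
To make the "absolute versus unusual" distinction concrete, here is a small Python sketch that compares raw support with lift, one commonly used measure of how much more often two itemsets co-occur than they would if they were independent. Lift is not mentioned in the comment above; it is brought in here only as an illustration, and the helper names `support` and `lift` and the sample baskets are assumptions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def lift(transactions, a, b):
    """Observed joint support divided by the support expected if `a`
    and `b` were independent. Assumes both occur at least once."""
    joint = support(transactions, frozenset(a) | frozenset(b))
    return joint / (support(transactions, a) * support(transactions, b))


# Made-up sample data, purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

# {bread, milk} is the most frequent pair in absolute terms...
bread_milk_support = support(baskets, {"bread", "milk"})    # 3 of 5 baskets
# ...but its lift is below 1 (about 0.94), so it is not unusually frequent.
bread_milk_lift = lift(baskets, {"bread"}, {"milk"})
# {diapers, beer} is rarer (2 of 5 baskets) yet has lift above 1 (about 1.11).
diapers_beer_lift = lift(baskets, {"diapers"}, {"beer"})
print(bread_milk_support, bread_milk_lift, diapers_beer_lift)
```

A lift near 1 means the items co-occur about as often as chance would predict, regardless of how high the raw count is.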

> Add Apriori algorithm to Spark MLlib
> ------------------------------------
>                 Key: SPARK-4001
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jacky Li
>            Assignee: Jacky Li
> Apriori is the classic algorithm for frequent itemset mining in a transactional data
> set. It would be useful to add the Apriori algorithm to MLlib in Spark.
