mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Neal Richter <nrich...@gmail.com>
Subject Re: FPGrowth maybe not producing the right rules
Date Thu, 15 Apr 2010 02:57:04 GMT
Here are some itemset miners to test against:

http://www.borgelt.net/fpgrowth.html
http://www.borgelt.net/eclat.html
http://www.borgelt.net/apriori.html

If you run his fpgrowth against your toy data with a min support of 2
(of course you can do this on paper as you have done).

1  (50.0%/3)
1 5  (33.3%/2)
1 5 2  (33.3%/2)
1 5 2 3  (33.3%/2)
1 5 3  (33.3%/2)
1 2  (33.3%/2)
1 2 3  (33.3%/2)
1 3  (50.0%/3)
5  (66.7%/4)
5 2  (66.7%/4)
5 2 3  (50.0%/3)
5 3  (50.0%/3)
2  (66.7%/4)
2 3  (50.0%/3)
3  (66.7%/4)

So yes, it looks like your mahout code has an issue...

- Neal

On Wed, Apr 14, 2010 at 4:33 PM, Robert Neumayer <neumayer@idi.ntnu.no> wrote:
> Hi.
>
> I was trying to test the fpgrowth algorithm on a toy example and I kind of
> fail
> to get the correct results ... so it'd be nice if someone could take a look at
> my (admittedly very ad-hoc) code and tell me whether there's something
> wrong with it. I'm running the mahout svn version.
>
> I tried to run the following example (input):
>
> 1 3 4
> 2 3 5
> 1 2 3 5
> 2 5
> 1 2 3 5
>
> With the following code:
>  public static void main(String[] args) {
>        int minSupport = 2;
>        int maxHeapSize = 100;
>
>        String input = "inputfile";
>        String output = "output";
>
>        FPGrowth<String> fp = new FPGrowth<String>();
>        FileSystem fs = new RawLocalFileSystem();
>        Configuration conf = new Configuration();
>
>        String pattern = "[\\ ]";
>        try {
>            fs = FileSystem.get(URI.create(output), conf);
>
>            SequenceFile.Writer writer = null;
>            writer = new SequenceFile.Writer(fs, conf, new Path(output),
> Text.class, TopKStringPatterns.class);
>
>            Charset encoding = Charset.forName("UTF-8");
>
>            List<Pair<String, Long>> generateFList = null;
>            generateFList = fp.generateFList(new StringRecordIterator(new
> FileLineIterable(new File(input), encoding, false),
>                    pattern), minSupport);
>
>            StringRecordIterator transactions = null;
>
>            transactions = new StringRecordIterator(new FileLineIterable(new
> File(input), encoding, false), pattern);
>
>
>            List<Text> keyList = new LinkedList<Text>();
>            List<TopKStringPatterns> valueList = new
> LinkedList<TopKStringPatterns>();
>
>            StringOutputCollector<Text, TopKStringPatterns> collector = new
> StringOutputCollector<Text, TopKStringPatterns>(
>                    keyList, valueList);
>            StringOutputConverter soc = new StringOutputConverter(collector);
>
>            StatusUpdater updater = new StatusUpdater() {
>                @Override
>                public void update(String status) {
>                    System.out.println(status);
>                }
>            };
>
>            fp.generateTopKFrequentPatterns(transactions, generateFList,
> minSupport, maxHeapSize, null, soc, updater);
>            writer.close();
>            fs.close();
>
>            System.out.println("list.size: " + valueList.size());
>            HashSet<List<String>> unique = new HashSet<List<String>>();
>            for (int i = 0; i < valueList.size(); i++) {
>
>                System.out.println(keyList.get(i) + " / " + valueList.get(i));
>                Iterator<Pair<List<String>, Long>> iterator
=
> valueList.get(i).iterator();
>                while (iterator.hasNext()) {
>
>                    unique.add(iterator.next().getFirst());
>                }
>            }
>            // print a bit more clearly
>            Iterator<List<String>> iterator = unique.iterator();
>            while (iterator.hasNext()) {
>                System.out.println(iterator.next());
>            }
>
>        } catch (IOException e) {
>            e.printStackTrace();
>        }
>    }
>
> I wrote another collector class (I know, that was probably not necessary but I
> wanted to use the API a bit):
>
> public class StringOutputCollector<K extends Writable, V extends Writable>
> implements OutputCollector<K, V> {
>
>    private final List<K> keyList;
>
>
>    private final List<V> valueList;
>
>    public StringOutputCollector(List<K> keyList, List<V> valueList) {
>        this.keyList = keyList;
>        this.valueList = valueList;
>    }
>
>    @Override
>    public final void collect(K key, V value) throws IOException {
>        keyList.add(key);
>        valueList.add(value);
>    }
>
> }
>
> The output I get is the following:
> 1 / ([1, 3],3), ([1, 2, 3, 5],2)
> 5 / ([2, 5],4), ([2, 3, 5],3), ([1, 2, 3, 5],2)
> 3 / ([3],4), ([2, 3, 5],3), ([1, 3],3), ([1, 2, 3, 5],2)
> 2 / ([2, 5],4), ([2, 3, 5],3), ([1, 2, 3, 5],2)
>
> But I think there's more frequent itemsets to find in these transactions (with
> min support 2):
>
> {1}     3
> {2}     4
> {3}     4
> {5}     4
> {1, 2}  2
> {1, 3}  3
> {1, 5}  2
> {2, 3}  3
> {2, 5}  4
> {3, 5}  3
> {1, 2, 3}       2
> {1, 2, 5}       3
> {1, 3, 5}       2
> {2, 3, 5}       3
> {1, 2, 3, 5}    2
>
> Also, some of them are duplicates with the mahout code. I'm not sure whether
> it's supposed to be like that.
>
> I also tried another textbook example and that didn't produce the same results
> either.
>
> So am I doing something wrong with the topk selection (that's what I maybe
> could think of as a stupid mistake of mine)?
>
> Any hints would be appreciated.
>
> R
>

Mime
View raw message