mahout-user mailing list archives

From Tom Pierce <...@cloudera.com>
Subject Re: Frequent itemset mining
Date Fri, 02 Dec 2011 19:30:05 GMT
These programs are actually exposed through the main mahout program; if you run:

$MAHOUT_HOME/bin/mahout fpg

it will run the Frequent Pattern Growth algorithm (aka frequent itemset mining).

Running the command above will show you what parameters are
required/available, including a switch to run in mapreduce or
sequential (i.e. single machine) mode.  Most params should be
straightforward, but this info may be helpful:

The input is expected to be plain text with one itemset per line.

The splitterPattern/regex will be used to split a line into items;
it defaults to a 'comma with optional whitespace' pattern.
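
To make the input format concrete, here is a rough sketch of a small input file and of how a 'comma with optional whitespace' split behaves. The file name transactions.txt, the sample items, and the split_items helper are made up for illustration, and the sed expression only approximates the default pattern as described above; it is not Mahout's actual regex.

```shell
# Hypothetical sample input: plain text, one itemset (transaction) per line.
cat > transactions.txt <<'EOF'
milk, bread, butter
bread ,butter
milk,eggs
EOF

# split_items (hypothetical helper): replace every comma, together with any
# whitespace around it, by '|' so the item boundaries are easy to see.
# This only approximates a 'comma with optional whitespace' splitter.
split_items() {
  sed -E 's/[[:space:]]*,[[:space:]]*/|/g'
}

# Show how each line of the sample file would be broken into items.
split_items < transactions.txt
```

If your data uses a different delimiter (tabs or spaces, say), you would presumably pass a matching pattern through the splitter option instead of relying on the default.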

If you run in mapreduce mode, the "output" directory will have several
subdirs. You'll want to look in the frequentpatterns subdir and run the
following on each of the "part*" files in that directory to see the
frequent patterns:

$MAHOUT_HOME/bin/mahout seqdumper -s ${OUT}/frequentpatterns/part-r-00000
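
Running seqdumper on every part file can be scripted. The sketch below assumes the output directory is reachable as an ordinary filesystem path (for example after a local-mode run or after copying it out of HDFS); a shell glob will not see files that live only in HDFS. The dump_patterns function and the MAHOUT_CMD override are hypothetical names introduced here, not part of Mahout.

```shell
# dump_patterns DIR (hypothetical helper): run seqdumper on every part* file
# under DIR/frequentpatterns, using the same "-s" switch as the command above.
# MAHOUT_CMD is an assumed override hook, useful for a dry run (MAHOUT_CMD=echo).
dump_patterns() {
  out_dir=$1
  mahout_cmd=${MAHOUT_CMD:-$MAHOUT_HOME/bin/mahout}
  for part in "$out_dir"/frequentpatterns/part*; do
    # Skip the literal, unexpanded glob when no part files exist.
    [ -e "$part" ] || continue
    "$mahout_cmd" seqdumper -s "$part"
  done
}
```

Calling dump_patterns "$OUT" would then dump each part file in turn; setting MAHOUT_CMD=echo first prints the commands instead of running them.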

-tom

2011/12/2 戴清灏 <rogerdai16@gmail.com>:
> For a sequential implementation, FPGrowth.java might be the first place to look.
> For a parallel implementation, PFPGrowth.java might be.
> There are 5 steps in total, and 4 of them are MapReduce jobs.
>
> Sent from my mobile phone
> On 2011-12-2 at 12:48 PM, "Dave Fry" <dfry@upstreamsoftware.com> wrote:
>
>> That would be fantastic, thank you!
>>
>> In the meantime, can you direct me to where in the source I should start
>> looking?  (ie, which class would be the entry point I'm looking for?)
>>
>> 2011/12/1 戴清灏 <rogerdai16@gmail.com>
>>
>> > There is actually a lack of documentation for frequent pattern mining usage.
>> > In fact, you are not the first one to ask for it.
>> > I would be pleased to write some for that usage, since I've read almost
>> > all of the source code for it.
>> >
>> > On Friday, December 2, 2011, Dave Fry wrote:
>> >
>> > > Hi!  I apologize for the newbie question, I'm just getting started with
>> > > Mahout.
>> > >
>> > > On the "Overview" page on Mahout's website:
>> > > https://cwiki.apache.org/confluence/display/MAHOUT/Overview
>> > >
>> > > It mentions these as the four primary targeted use cases for Mahout:
>> > > 1) Recommendation mining takes users' behavior and from that tries to
>> > > find items users might like.
>> > > 2) Clustering takes e.g. text documents and groups them into groups of
>> > > topically related documents.
>> > > 3) Classification learns from existing categorized documents what
>> > > documents of a specific category look like and is able to assign
>> > > unlabelled documents to the (hopefully) correct category.
>> > > 4) Frequent itemset mining takes a set of item groups (terms in a query
>> > > session, shopping cart content) and identifies which individual items
>> > > usually appear together.
>> > >
>> > > But, based on the Mahout documentation that I've read through, I can't
>> > > seem to find a clear mapping from that use case description to where in
>> > > the Mahout distribution I should be looking.  I've found several leads
>> > > for use case #1, but #4 seems to be a bit of a mystery (and searches for
>> > > "frequent itemset mining" don't seem to lead me to where I need to go.)
>> > >
>> > > Basically, I'm looking for the answer to the question "Which items
>> > > appear most often with item X in browse histories and shopping carts?".
>> > > (As opposed to "Based on what I know about your preferences, here are
>> > > the items that I predict you would be most likely to browse/add to your
>> > > cart".)
>> > >
>> > > Any help is appreciated!
>> > > Thanks,
>> > > Dave
>> > >
>> >
>> >
>> > --
>> > Regards,
>> > Q
>> >
>>
