mahout-user mailing list archives

From tom pierce <...@apache.org>
Subject Re: Frequent itemset mining
Date Wed, 06 Jun 2012 22:55:52 GMT
I found some very non-intuitive performance behavior in FPG when I was
trying it out, though I never quite tracked down why it was happening. 

I actually wound up contributing an alternate implementation; would you
re-try your example and add the "-2" flag, which selects the other
implementation?  I'd be curious to hear if that resolves your issue.
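
For concreteness, here is a rough sketch of that re-run: the mapreduce
command quoted below with "-2" appended. It assumes the same working
directory, input file and options as the quoted command. The MAHOUT
variable and the fallback echo are illustrative additions of this sketch,
not part of Mahout.

```shell
# Sketch only: the quoted fpg run with "-2" appended.
# Assumes it is run from the Mahout source root, like the quoted command.
MAHOUT="${MAHOUT:-bin/mahout}"   # hypothetical override; defaults to the quoted path

# Same arguments as the quoted command; only the final "-2" is new.
set -- fpg \
    -i core/src/test/resources/retail.dat \
    -o patterns \
    -k 50 \
    -method mapreduce \
    -regex '[\ ]' \
    -s 2 \
    -2

if [ -x "$MAHOUT" ]; then
    "$MAHOUT" "$@"
else
    # Not inside a Mahout checkout: just show what would be run.
    echo "would run: $MAHOUT $*"
fi
```

Comparing the wall-clock time of this run against the same command without
"-2" should show whether the alternate implementation makes a difference.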

Thanks,
-tom

On 06/06/2012 02:00 AM, Sean Owen wrote:
> It wouldn't surprise me, though I don't know this implementation or
> your setup. Locally, you're not really running Hadoop -- it's all
> local, and there is no HDFS to replicate and such. You are saving the
> big overhead of shuffling data across machines, and the overhead of
> starting new workers. For small input, the overhead can indeed be most
> of the run time.
>
> On Wed, Jun 6, 2012 at 3:19 AM, Alex Kozlov <alexvk@cloudera.com> wrote:
>> The documentation says:
>>
>> Running parallel FPGrowth is as easy as changing the flag to -method
>> mapreduce and adding the number-of-groups parameter, e.g. -g 20 for 20
>> groups. First, let's run the above sample test in map-reduce mode:
>>
>> bin/mahout fpg \
>>     -i core/src/test/resources/retail.dat \
>>     -o patterns \
>>     -k 50 \
>>     -method mapreduce \
>>     -regex '[\ ]' \
>>     -s 2
>>
>> The above test took 102 seconds on a dual-core laptop vs. 609 seconds in
>> sequential mode (with 5 gigs of RAM allocated). In a separate test,
>> the first 1000 lines of retail.dat took 20 seconds in map/reduce vs. 30
>> seconds in sequential mode.
>>
>> Running the example above, I get times more like hours (with both the
>> sequential and mapreduce methods) on a 48GB box.  Am I doing something
>> wrong?  Should it be minutes instead of seconds?
>> --
>> Alex K
>>
>> On Mon, Dec 5, 2011 at 12:50 PM, Isabel Drost <isabel@apache.org> wrote:
>>
>>> On 02.12.2011 Tom Pierce wrote:
>>>> These programs are actually exposed through the main mahout program; if
>>>> you run:
>>>>
>>>> $MAHOUT_HOME/bin/mahout fpg
>>>>
>>>> it will run the Frequent Pattern Growth algorithm (aka frequent itemset
>>>> mining).
>>> Also there is quite some documentation on the wiki:
>>>
>>> https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html
>>> (also includes a link to the original research publication).
>>>
>>> Isabel
>>>
>>>

