mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Frequent itemset mining
Date Wed, 06 Jun 2012 06:00:35 GMT
It wouldn't surprise me, though I don't know this implementation or
your setup. Locally, you're not really running Hadoop -- everything runs
on one machine, and there is no HDFS replication and such. You are saving the
big overhead of shuffling data across machines, and the overhead of
starting new workers. For small input, the overhead can indeed be most
of the run time.
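
To make that concrete, here is a rough toy model of the trade-off. It is only a sketch: the function names and every number in it (per-record cost, 1 second of local startup, 60 seconds of cluster overhead, 20 workers) are made-up assumptions for illustration, not measurements of Mahout or Hadoop.

```python
# Toy cost model: wall time = fixed overhead + evenly divided per-record work.
# All constants below are illustrative assumptions, not measured values.

def run_time(records, per_record_s, fixed_overhead_s, workers=1):
    """Estimated wall time in seconds for a job over `records` records."""
    return fixed_overhead_s + records * per_record_s / workers


def overhead_share(records, per_record_s, fixed_overhead_s, workers=1):
    """Fraction of the estimated wall time spent on fixed overhead."""
    total = run_time(records, per_record_s, fixed_overhead_s, workers)
    return fixed_overhead_s / total


if __name__ == "__main__":
    per_record = 1e-4  # assumed seconds of real work per record
    for n in (1_000, 100_000, 10_000_000):
        local = run_time(n, per_record, fixed_overhead_s=1.0)
        cluster = run_time(n, per_record, fixed_overhead_s=60.0, workers=20)
        share = overhead_share(n, per_record, 60.0, workers=20)
        print(f"{n:>10} records: local {local:8.1f}s, "
              f"cluster {cluster:8.1f}s, cluster overhead {share:.0%}")
```

Under these assumed numbers the local run wins easily on small input, and the distributed run only pulls ahead once the real work is large enough to outweigh the fixed cost.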

On Wed, Jun 6, 2012 at 3:19 AM, Alex Kozlov <alexvk@cloudera.com> wrote:
> The documentation says:
>
> Running parallel FPGrowth is as easy as changing the flag to -method
> mapreduce and adding the number of groups parameter, e.g. -g 20 for 20
> groups. First, let's run the above sample test in map-reduce mode:
>
> bin/mahout fpg \
>     -i core/src/test/resources/retail.dat \
>     -o patterns \
>     -k 50 \
>     -method mapreduce \
>     -regex '[\ ]' \
>     -s 2
>
> The above test took 102 seconds on a dual-core laptop, vs. 609 seconds in
> the sequential mode (with 5 gigs of RAM allocated). In a separate test,
> the first 1000 lines of retail.dat took 20 seconds in map/reduce vs. 30
> seconds in sequential mode.
>
> Running the example above I get times more like hours (with both the
> sequential and mapreduce methods) on 48GB boxes.  Am I doing something
> wrong?  Should it be minutes instead of seconds?
> --
> Alex K
>
> On Mon, Dec 5, 2011 at 12:50 PM, Isabel Drost <isabel@apache.org> wrote:
>
>> On 02.12.2011 Tom Pierce wrote:
>> > These programs are actually exposed through the main mahout program; if
>> > you run:
>> >
>> > $MAHOUT_HOME/bin/mahout fpg
>> >
>> > it will run the Frequent Pattern Growth algorithm (aka frequent itemset
>> > mining).
>>
>> Also there is quite some documentation on the wiki:
>>
>> https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html
>> (also includes a link to the original research publication).
>>
>> Isabel
>>
>>
