mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: PFP Growth
Date Sat, 18 Sep 2010 18:30:24 GMT
In order to encourage your excellent practice of reposting, I will repeat my
(non)-answer here.

-------------------------------------------
I don't know the answer to this, but previously this kind of problem was
caused by highly skewed statistics in the input data.

If there are things that cooccur with everything, then you will have
problems with the speed of the algorithm.

Can you say something about the distribution of your data?  Can you post a
frequency by rank table?
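
A frequency-by-rank table of this kind can be produced with a short script. The following is a minimal sketch in Python and is not part of Mahout. It assumes the input has one transaction per line with items separated by tabs, matching the `-regex [\\t]` option in the quoted command. The function name `frequency_by_rank` and its parameters are hypothetical.

```python
import re
from collections import Counter


def frequency_by_rank(lines, splitter=r"[\t]"):
    """Return (rank, item, count, fraction) rows, most frequent item first.

    `lines` is any iterable of transaction strings; `splitter` is the
    regular expression that separates items within a transaction
    (assumed here to be a tab).
    """
    pattern = re.compile(splitter)
    counts = Counter()
    n_transactions = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        n_transactions += 1
        # Count each item at most once per transaction.
        counts.update({item for item in pattern.split(line) if item})
    return [
        (rank, item, count, count / n_transactions)
        for rank, (item, count) in enumerate(counts.most_common(), start=1)
    ]
```

Calling it on an open file handle for the input file and printing the first few dozen rows gives the table. Items whose fraction is close to 1.0 occur in nearly every transaction and are the likely source of the skew described above.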

On Sat, Sep 18, 2010 at 10:37 AM, Mark <static.void.dev@gmail.com> wrote:

>  I am trying to run FPGrowth:
>
> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t] -s 5 -g
> 17500 -k 50
>
> However, the 3rd task, "Processing FPTree: Bottom Up FP Growth > reduce",
> will not finish. It's basically stuck at 85% and hasn't budged in over an
> hour. The output of the first task reported about 37K features,
> so I set -g to 17500. Does anyone know what's going on and how I can speed
> this up?
>
> Thanks
>
