mahout-user mailing list archives

From Stanley Xu <wenhao...@gmail.com>
Subject Re: how to get input in parallel FPGrowth
Date Tue, 24 May 2011 14:20:28 GMT
1. A killed task is normal behavior. By default, Hadoop enables speculative
execution, which means it launches two attempts of the same mapper and,
once one attempt finishes, simply kills the one that has not.
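
If the killed attempts bother you, you can turn speculative execution off.
Here is a minimal sketch, assuming the 0.20-era JobConf API that Mahout
builds on; the property names in the comments are the ones you would pass
with -D on the command line instead:

import org.apache.hadoop.mapred.JobConf;

public class DisableSpeculation {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // mapred.map.tasks.speculative.execution=false
    conf.setMapSpeculativeExecution(false);
    // mapred.reduce.tasks.speculative.execution=false
    conf.setReduceSpeculativeExecution(false);
  }
}

Note that you then lose the backup attempts that cover genuinely slow nodes.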

2. There are many possible reasons why one mapper takes much longer than
the others. Maybe its input split is much larger, the data it handles
consumes more CPU, or the cluster node running it is under heavy load. It
is hard to name the root cause without more context. You could check the
inputs to narrow it down, or simply re-run the job and see whether the
same task is slow again.
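
A quick way to check for skewed inputs is to list the file sizes in the
job's input directory, either with "hadoop fs -du <input-dir>" or with a
small sketch like the one below (the input path argument is just a
placeholder for your own directory):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputSizes {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // One file far larger than its siblings often explains one slow mapper.
    for (FileStatus status : fs.listStatus(new Path(args[0]))) {
      System.out.println(status.getLen() + "\t" + status.getPath().getName());
    }
  }
}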

BTW, posting the question on the Mahout mailing list will probably get you
more feedback, and it might help others who hit the same problem, compared
to sending it directly to me. :-)

Best wishes,
Stanley Xu



On Tue, May 24, 2011 at 3:15 PM, nn hust <nzjemail@gmail.com> wrote:

> Hi, when I use pfp-growth, I ran into a question: the first map takes
> much more time than the others, and then a task gets killed. I don't find
> any error info in the log file. Do you know the cause?
>
> You can see the picture from the Hadoop web tools that I sent to you.
>
>
> Thanks.
>
