mahout-user mailing list archives

From Danny Bickson <danny.bick...@gmail.com>
Subject Re: LDA on single node is much faster than 20 nodes
Date Tue, 06 Sep 2011 08:04:32 GMT
I suggest taking a look at my blog post:
http://bickson.blogspot.com/2011/03/tunning-hadoop-configuration-for-high.html
There could be many potential reasons, among them:
* Premature timeouts, which make tasks fail just before they would finish
* A bad configuration of the number of mappers/reducers: too few or too many
may significantly slow things down.
* Using compression may speed up disk access
I think you will have to "get your hands dirty" by analyzing the logs and
finding out what slows you down.
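As a rough sketch of where those knobs live, the snippet below renders a few Hadoop 0.20-era properties as generic `-D` options for a `hadoop jar` command line. It assumes the Mahout LDA driver is launched through Hadoop's `ToolRunner`, so generic options are honored. The helper `hadoop_generic_options` is hypothetical, and the values are illustrative placeholders, not recommendations. The number of mappers is mostly driven by input splits rather than a single property, so it is left out here.

```python
# Hypothetical helper: render Hadoop properties as generic "-Dkey=value" options.
def hadoop_generic_options(props):
    return [f"-D{key}={value}" for key, value in props.items()]


tuning = {
    # Task timeout in milliseconds (Hadoop's default is 600000, i.e. 10 minutes).
    # Raising it avoids killing slow but healthy tasks just before they finish.
    "mapred.task.timeout": 1800000,
    # Number of reduce tasks; illustrative value, tune it to the cluster size.
    "mapred.reduce.tasks": 20,
    # Compress intermediate map output to reduce disk and network I/O.
    "mapred.compress.map.output": "true",
}

print(" ".join(hadoop_generic_options(tuning)))
```

The printed options would be placed right after the driver class name on the `hadoop jar` command line; the job logs then show whether tasks are still timing out or spilling heavily to disk.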

On Tue, Sep 6, 2011 at 10:35 AM, Chris Lu <clu@atypon.com> wrote:

> Hi,
>
> I am running LDA on 18k documents; each document has 5k terms, 300k terms
> in total. The number of topics is set to 100.
>
> Running LDA on a single-node Hadoop configuration takes about 5 hours per
> stage, so 20 stages would take 100 hours.
>
> However, with 20 machines on Amazon EMR, it is actually much, much
> slower: it takes 1000 minutes per stage (about 10 minutes for each 1% of
> mapping progress). Reducing is much faster; it takes only seconds and is
> almost negligible.
>
> Has anyone had a similar experience, or is my setup wrong?
>
> Chris
>
>
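The timings quoted above can be put side by side with a little arithmetic. This is a sketch that uses only the figures from the message, and it assumes the map phase accounts for essentially all of the per-stage time, since the reduce phase is reported to take only seconds.

```python
# Figures quoted in the message above
single_node_hours_per_stage = 5
stages = 20
emr_minutes_per_percent = 10  # "about 10 minutes for 1% mapping progress"

# Assumption: the map phase dominates, so reduce time is ignored.
emr_minutes_per_stage = emr_minutes_per_percent * 100  # 1000 minutes
emr_hours_per_stage = emr_minutes_per_stage / 60       # about 16.7 hours

single_node_total = single_node_hours_per_stage * stages      # 100 hours
emr_total = emr_hours_per_stage * stages                      # about 333 hours
slowdown = emr_hours_per_stage / single_node_hours_per_stage  # about 3.3x

print(f"EMR per stage: {emr_hours_per_stage:.1f} h, total: {emr_total:.0f} h")
print(f"Single node total: {single_node_total} h, slowdown: {slowdown:.1f}x")
```

In other words, the 20-node run is roughly 3.3 times slower per stage than the single node, instead of being faster.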
