mahout-user mailing list archives

From Alan Gardner <gard...@pythian.com>
Subject Re: LDA/CVB Performance
Date Thu, 13 Jun 2013 20:09:07 GMT
Ted: Because the threads were all stuck on one core, there are 8 copies of
the table (1 per core). We have enough RAM to allocate one table per core,
but it limits our ability to scale to larger numbers of topics or terms
because we allocate an 8GB heap per mapper.
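
As a rough sanity check on the memory figures in this thread, here is a
minimal sketch of the arithmetic in plain Java. It assumes the numbers quoted
below (200 topics, a 1.6M-term dictionary, 2 term-topic matrices, 8-byte
doubles, 8 mappers per node). The class and method names are hypothetical and
are not part of Mahout.

```java
public class TableMemoryEstimate {
    // Bytes needed for the term-topic matrices:
    // matrixCopies * numTopics * numTerms * 8 bytes per double.
    static long tableBytes(int matrixCopies, int numTopics, long numTerms) {
        return (long) matrixCopies * numTopics * numTerms * Double.BYTES;
    }

    // Total bytes on one node when tablesPerNode separate copies are held.
    static long nodeBytes(long bytesPerTable, int tablesPerNode) {
        return bytesPerTable * tablesPerNode;
    }

    public static void main(String[] args) {
        // Figures taken from the thread: 2 matrices, 200 topics, 1.6M terms.
        long perTable = tableBytes(2, 200, 1_600_000L);
        System.out.printf("one table: %.2f GB%n", perTable / 1e9);
        // One copy per single-threaded mapper, 8 mappers per node.
        System.out.printf("8 copies per node: %.2f GB%n", nodeBytes(perTable, 8) / 1e9);
        // One copy shared by the threads of a single multithreaded mapper.
        System.out.printf("1 shared copy per node: %.2f GB%n", nodeBytes(perTable, 1) / 1e9);
    }
}
```

With one copy per mapper the per-node total grows linearly with the number
of mappers; with a shared copy it stays at the size of a single table.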

Sebastian: This mapper is supposed to be multi-threaded. In practice, on my
cluster each JVM only ever kept one core busy, so I turned off the
multi-threading and split the work into more map tasks. If multi-threading
will alleviate memory pressure, that is perfect for my application. Can you
advise whether any special JVM configuration is needed to make this work?
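
To make the idea of sharing one read-only table between threads concrete,
here is a minimal sketch in plain Java. It is not the Mahout or Hadoop code:
the class SharedTableSketch, its methods, and the dummy table contents are
all hypothetical, and a fixed thread pool stands in for the mapper's worker
threads. The table is cached in a static field, so every thread in the JVM
reads the same copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedTableSketch {
    // Hypothetical stand-in for the read-only term-topic table.
    // Cached in a static field so it is built once per JVM.
    private static volatile double[][] table;

    // Builds the table on first use; later calls return the cached instance
    // (the dimensions passed on later calls are ignored in this sketch).
    static double[][] getTable(int numTopics, int numTerms) {
        double[][] t = table;
        if (t == null) {
            synchronized (SharedTableSketch.class) {
                t = table;
                if (t == null) {
                    t = new double[numTopics][numTerms];
                    for (double[] row : t) {
                        Arrays.fill(row, 1.0); // dummy values for illustration
                    }
                    table = t;
                }
            }
        }
        return t;
    }

    // Each worker reads a disjoint slice of terms from the same shared table.
    static double sumAll(int numThreads, int numTopics, int numTerms) {
        final double[][] shared = getTable(numTopics, numTerms);
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            for (int w = 0; w < numThreads; w++) {
                final int worker = w;
                parts.add(pool.submit(() -> {
                    double sum = 0.0;
                    for (int term = worker; term < numTerms; term += numThreads) {
                        for (int topic = 0; topic < numTopics; topic++) {
                            sum += shared[topic][term];
                        }
                    }
                    return sum;
                }));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the table is never written after it is built, the workers need no
locking while reading it. Whether the threads actually spread across cores
on a given cluster is a separate question that this sketch does not address.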


On Thu, Jun 13, 2013 at 4:00 PM, Sebastian Schelter <ssc@apache.org> wrote:

> This table is readonly, right? We could try to apply the trick from our
> ALS code: Instead of running one mapper per core (and thus having one
> copy of the table per core), run a multithreaded mapper and share the
> table between its threads. Works very well for ALS. We can also cache
> the table in a static variable and make Hadoop reuse JVMs, which
> increases performance if the number of blocks to process is larger than
> the number of map slots.
>
> -sebastian
>
> On 13.06.2013 21:56, Ted Dunning wrote:
> > On Thu, Jun 13, 2013 at 6:50 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> >
> >> Andy, note that he said he's running with a 1.6M-term dictionary. That's
> >> going to be 2 * 200 * 1.6M * 8B = 5.1GB for just the term-topic matrices.
> >> Still not hitting 8GB, but getting closer.
> >>
> >
> > It will likely be even worse unless this table is shared between mappers.
> > With 8 mappers per node, this goes to 41GB. The OP didn't mention machine
> > configuration, but this could easily cause swapping.
> >
>
>


-- 
Alan Gardner
Solutions Architect - CTO Office

gardner@pythian.com | LinkedIn:
http://www.linkedin.com/profile/view?id=65508699 |
@alanctgardner<https://twitter.com/alanctgardner>
Tel: +1 613 565 8696 x1218
Mobile: +1 613 897 5655
