mahout-user mailing list archives

From Andy Schlaikjer <andrew.schlaik...@gmail.com>
Subject Re: LDA/CVB Performance
Date Thu, 13 Jun 2013 20:35:59 GMT
Sebastian, there is one read-only topic x term matrix and another copy
which receives updates. Certainly, sharing the read-only matrix would be
beneficial.
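
The layout described above (one read-only topic x term matrix shared by all workers, plus a separate copy that receives updates) can be sketched roughly as follows. This is a minimal illustration in plain Python threads, not Mahout or Hadoop code: the tiny dimensions, the `worker` function, the `blocks` input, and the additive update rule are all hypothetical placeholders, and giving each thread its own update table is an assumption rather than how the CVB code is known to behave.

```python
import threading

NUM_TOPICS, NUM_TERMS = 3, 5  # toy sizes, chosen only for illustration

# Read-only topic x term table: built once and shared by every worker thread.
# Nested tuples make accidental writes fail loudly.
shared_model = tuple(tuple(0.5 for _ in range(NUM_TERMS)) for _ in range(NUM_TOPICS))


def worker(docs, model, updates):
    """Read from the shared model; write only to this worker's own update table."""
    for doc in docs:
        for term in doc:
            for topic in range(NUM_TOPICS):
                # Placeholder update, standing in for the real CVB step.
                updates[topic][term] += model[topic][term]


# Hypothetical input: one block per thread; each document is a list of term ids.
blocks = [[[0, 1, 1]], [[2, 3]], [[4, 4, 0]]]

# Assumption: every thread gets a private update table, so no locking is needed.
per_thread_updates = [
    [[0.0] * NUM_TERMS for _ in range(NUM_TOPICS)] for _ in blocks
]

threads = [
    threading.Thread(target=worker, args=(block, shared_model, updates))
    for block, updates in zip(blocks, per_thread_updates)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge the per-thread update tables into a single result.
merged = [
    [sum(u[topic][term] for u in per_thread_updates) for term in range(NUM_TERMS)]
    for topic in range(NUM_TOPICS)
]
print(merged[0])
```

Only the read-only table is shared here; whether the update copy can also be shared depends on how the updates are synchronized.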


On Thu, Jun 13, 2013 at 1:00 PM, Sebastian Schelter <ssc@apache.org> wrote:

> This table is readonly, right? We could try to apply the trick from our
> ALS code: Instead of running one mapper per core (and thus having one
> copy of the table per core), run a multithreaded mapper and share the
> table between its threads. Works very well for ALS. We can also cache
> the table in a static variable and make Hadoop reuse JVMs, which
> increases performance if the number of blocks to process is larger than
> the number of map slots.
>
> -sebastian
>
> On 13.06.2013 21:56, Ted Dunning wrote:
> > On Thu, Jun 13, 2013 at 6:50 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> >
> >> Andy, note that he said he's running with a 1.6M-term dictionary. That's
> >> going to be 2 * 200 * 1.6M * 8B = 5.1GB for just the term-topic matrices.
> >> Still not hitting 8GB, but getting closer.
> >>
> >
> > It will likely be even worse unless this table is shared between mappers.
> > With 8 mappers per node, this goes to 41GB. The OP didn't mention machine
> > configuration, but this could easily cause swapping.
> >
>
>
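
The memory estimates quoted above can be reproduced with a few lines of arithmetic. This sketch reads the quoted product as 2 matrix copies x 200 topics x 1.6M terms x 8 bytes per entry; treating 200 as the topic count and 8B as one double per entry is an interpretation of that expression. The helper name `term_topic_bytes` is made up for illustration, and the last scenario (one shared read-only copy plus a private update copy per worker) is an assumption, not something measured in the thread.

```python
def term_topic_bytes(num_terms, num_topics, bytes_per_value=8, copies=2):
    """Bytes needed for `copies` dense topic x term matrices."""
    return copies * num_topics * num_terms * bytes_per_value


NUM_TERMS = 1_600_000
NUM_TOPICS = 200
MAPPERS_PER_NODE = 8

# Read-only copy plus update copy, held by a single mapper.
per_mapper = term_topic_bytes(NUM_TERMS, NUM_TOPICS)

# Every mapper on the node holds both copies: nothing is shared.
per_node_unshared = MAPPERS_PER_NODE * per_mapper

# Assumed alternative: one shared read-only copy, one update copy per worker.
one_copy = term_topic_bytes(NUM_TERMS, NUM_TOPICS, copies=1)
per_node_shared_read_only = one_copy + MAPPERS_PER_NODE * one_copy

print(f"per mapper:              {per_mapper / 1e9:.2f} GB")                 # 5.12 GB
print(f"8 mappers, unshared:     {per_node_unshared / 1e9:.2f} GB")          # 40.96 GB
print(f"8 workers, shared model: {per_node_shared_read_only / 1e9:.2f} GB")  # 23.04 GB
```

The first two figures match the 5.1GB and 41GB mentioned in the quoted messages, using decimal gigabytes.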
