mahout-user mailing list archives

From: Jake Mannix
Subject: Re: LDA/CVB Performance
Date: Thu, 13 Jun 2013 21:29:54 GMT

I'm not sure that the multithreaded trainer actually works very well,
sadly.  In my tests, I've never really been able to get it to churn through
much faster than with just one thread, probably because the message passing
is not done properly/efficiently (my own fault, I wrote it).

It certainly deserves a JIRA ticket, but frankly, the main reason I never
dug into it too much is because of what Andy said originally: it's really
pretty poor behavior on a shared cluster to hog lots of threads in one
mapper - it tricks the task tracker into thinking the machine has
resources to spare (only a few map slots being used!) but in reality lots
of cores are pinned with these extra threads.

In practice, my approach to speeding this up / handling memory constraints
was to:

  a) aim toward a sparser representation of the topic-term matrix (easy to
do for the read-only copy, a little harder to do for the updates)
  b) do online learning (currently implemented, but not well tested): if you
notice the checks in the code regarding "if (modelWeight == 1.0) {
writeModel = readModel; ...", they allow your updates to be aggregated
directly onto your "read-only" topic-term matrix.  In practice this ends up
being similar to Hoffman et al.'s online LDA, but of course it's distributed,
so the merging process has to deal with the fact that different mappers have
drifted apart, and if this drift is too high... it doesn't converge well.
Like I said - not too well tested; there are some kinks to work out to get
that to work correctly in all cases.
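
To make (a) and (b) concrete, here is a rough sketch in Java. It is not the
Mahout code: the class and method names are made up for illustration, the
sparsification cutoff is an assumed heuristic, the merge is assumed to be a
plain element-wise average of per-mapper matrices, and the modelWeight check
simply follows the condition as quoted above (the real code may differ).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these names come from Mahout.
final class CvbSketch {

    // (a) Sparser read-only row: keep only entries above an assumed cutoff.
    static Map<Integer, Double> sparsify(double[] denseRow, double cutoff) {
        Map<Integer, Double> sparse = new HashMap<>();
        for (int term = 0; term < denseRow.length; term++) {
            if (denseRow[term] > cutoff) {
                sparse.put(term, denseRow[term]);
            }
        }
        return sparse;
    }

    // (b) Online mode: with modelWeight == 1.0 the write model is the same
    // object as the read model, so updates land on the "read-only" matrix.
    static double[][] chooseWriteModel(double[][] readModel, double modelWeight) {
        if (modelWeight == 1.0) {
            return readModel;
        }
        return new double[readModel.length][readModel[0].length];
    }

    // Assumed merge step: element-wise mean of the per-mapper matrices.
    static double[][] average(double[][][] mapperModels) {
        int topics = mapperModels[0].length;
        int terms = mapperModels[0][0].length;
        double[][] merged = new double[topics][terms];
        for (double[][] model : mapperModels) {
            for (int k = 0; k < topics; k++) {
                for (int t = 0; t < terms; t++) {
                    merged[k][t] += model[k][t] / mapperModels.length;
                }
            }
        }
        return merged;
    }

    // Drift: largest L1 distance between any mapper's model and the merge.
    static double maxDrift(double[][][] mapperModels, double[][] merged) {
        double worst = 0.0;
        for (double[][] model : mapperModels) {
            double l1 = 0.0;
            for (int k = 0; k < merged.length; k++) {
                for (int t = 0; t < merged[k].length; t++) {
                    l1 += Math.abs(model[k][t] - merged[k][t]);
                }
            }
            worst = Math.max(worst, l1);
        }
        return worst;
    }
}
```

Watching something like maxDrift across iterations is one possible way to
notice when the mappers have wandered too far apart for the merge to
converge; what threshold counts as "too high" is not something this sketch
answers.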

I'm not too much of a fan of stealing control of the whole box - my local
hadoop admin would really not like me. :)  The real golden implementation
would be not hugely memory constrained (so you could run it with only 3GB
per mapper), and not using more than one thread per mapper, yet let you run
with hundreds or thousands of mappers, even with millions of terms and
hundreds of topics (and of course as many documents as you had time to
throw at them).

The current implementation isn't *quite* there for my desired goals as
mentioned, however.

On Thu, Jun 13, 2013 at 1:45 PM, Sebastian Schelter wrote:

> I looked into the LDA code, it uses a multithreaded trainer, so we
> shouldn't need the trick I described.
> Have you tried playing with the "num_train_threads" option?
> -sebastian
> On 13.06.2013 22:35, Andy Schlaikjer wrote:
> > Sebastian, there is one read-only topic x term matrix and another copy
> > which receives updates. Certainly, sharing the read-only matrix would be
> > beneficial.
> >
> >
> > On Thu, Jun 13, 2013 at 1:00 PM, Sebastian Schelter wrote:
> >
> >> This table is readonly, right? We could try to apply the trick from our
> >> ALS code: Instead of running one mapper per core (and thus having one
> >> copy of the table per core), run a multithreaded mapper and share the
> >> table between its threads. Works very well for ALS. We can also cache
> >> the table in a static variable and make Hadoop reuse JVMs, which
> >> increases performance if the number of blocks to process is larger than
> >> the number of map slots.
> >>
> >> -sebastian
> >>
> >> On 13.06.2013 21:56, Ted Dunning wrote:
> >>> On Thu, Jun 13, 2013 at 6:50 PM, Jake Mannix wrote:
> >>>
> >>>> Andy, note that he said he's running with a 1.6M-term dictionary.  That's
> >>>> going to be 2 * 200 * 1.6M * 8B = 5.1GB for just the term-topic matrices.
> >>>> Still not hitting 8GB, but getting closer.
> >>>>
> >>>
> >>> It will likely be even worse unless this table is shared between mappers.
> >>> With 8 mappers per node, this goes to 41GB.  The OP didn't mention machine
> >>> configuration, but this could easily cause swapping.
> >>>
> >>
> >>
> >


