mahout-user mailing list archives

From Alan Gardner <gard...@pythian.com>
Subject Re: LDA/CVB Performance
Date Thu, 13 Jun 2013 16:59:20 GMT
Andy, Jake,

Thanks for the quick reply!

I can definitely understand the concern about multi-core Map tasks, and I agree
with your assessment. I was mostly curious why Mahout would default to so many
threads for the training pool if they're all going to contend for a single
core.
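
For reference, the sizing rule I have in mind is just capping the pool at the
cores the task can actually use; a toy sketch in plain Java (nothing
Mahout-specific, and the cap of 4 is an arbitrary number for illustration):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TrainPoolSizing {
        public static void main(String[] args) {
            // If the task attempt effectively gets one core, a large training
            // pool only adds context switching for CPU-bound work.
            int cores = Runtime.getRuntime().availableProcessors();
            int trainThreads = Math.min(cores, 4); // arbitrary cap for the example
            ExecutorService pool = Executors.newFixedThreadPool(trainThreads);
            System.out.println("sizing training pool to " + trainThreads + " threads");
            pool.shutdown();
        }
    }

(Note that availableProcessors() reports the node's cores, not what the slot is
actually entitled to, so on a fully packed node this is still optimistic.)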

I've been sizing the splits so they create ~48 map tasks, but I can see how
those splits might be imbalanced. Looking at the time per map task, some
took 4 times as long as others. I'll try to offset this by creating 4 or 5
times more splits so the cluster stays more evenly utilized.
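
The knob I'm planning to use for that is the max split size. A rough sketch of
the arithmetic; the 48 GB input figure is made up for illustration, and the
property name is the Hadoop 1.x one (MRv2 renames it to
mapreduce.input.fileinputformat.split.maxsize):

    import org.apache.hadoop.conf.Configuration;

    public class SplitSizing {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            long inputBytes = 48L * 1024 * 1024 * 1024; // hypothetical matrix size
            int targetMaps = 48 * 5;                    // ~5x the 48 map slots
            long maxSplitBytes = inputBytes / targetMaps;
            // Smaller max split size -> more splits -> more, shorter map tasks.
            conf.setLong("mapred.max.split.size", maxSplitBytes);
            System.out.println("max split size: " + maxSplitBytes
                + " bytes -> ~" + targetMaps + " map tasks");
        }
    }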

I'll try dropping some low frequency terms as well and see how it performs.
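
Part of the motivation is the memory point below: if the model is on the order
of topics x terms doubles, pruning the vocabulary shrinks it linearly. A
back-of-envelope check (the 2M-term vocabulary is a hypothetical figure, and
the real in-memory layout may differ):

    public class ModelMemoryEstimate {
        public static void main(String[] args) {
            int numTopics = 200;
            long numTerms = 2000000L; // hypothetical vocabulary size before pruning
            long bytes = (long) numTopics * numTerms * 8; // dense doubles assumed
            System.out.printf("~%.1f GB per topic-term model copy%n", bytes / 1e9);
            // 200 x 2M doubles is ~3.2 GB, so a couple of copies plus working
            // set can plausibly exhaust an 8 GB heap; halving the vocabulary
            // halves the model.
        }
    }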



On Thu, Jun 13, 2013 at 12:43 PM, Andy Schlaikjer <andrew.schlaikjer@gmail.com> wrote:

> Hi Alan,
>
> On Thu, Jun 13, 2013 at 8:54 AM, Alan Gardner <gardner@pythian.com> wrote:
>
> > The weirdest behaviour I'm seeing is that the multithreaded training Map
> > task only utilizes one core on an eight core node. I'm not sure if this
> > is configurable in the JVM parameters or the job config. In the meantime
> > I've set the input split very small, so that I can run 8 parallel 1-thread
> > training mappers per node. Should I be configuring this differently?
> >
>
> At my office it's generally frowned upon to run MR tasks that try to make
> use of lots of cores on a multicore system, due to a cluster configuration
> that forces the number of map / reduce slots to sum to the number of cores.
> If multiple multi-threaded task attempts run on the same node, CPU load may
> spike and negatively affect the performance of all task attempts on the
> node.
>
>
> > I also wanted to check in and verify that the performance I'm seeing is
> > typical:
> >
> > - on a six-node cluster (48 map slots, 8 cores per node) running full
> > tilt, each iteration takes about 7 hours. I assume the problem is just
> > that our cluster is far too small, and that the performance will scale
> > if I make the splits even smaller and distribute the job across more
> > nodes.
> >
>
> How many input splits are generated for your input doc-term matrix? In each
> task attempt, how many rows are processed? Make sure input is balanced
> across all map tasks.
>
>
> > - with an 8GB heap size I can't exceed about 200 topics before running
> > out of heap space. I tried making the Map input smaller, but that didn't
> > seem to help. Can someone describe how memory usage scales per mapper in
> > terms of topics, documents and terms?
> >
>
> The tasks need memory proportional to num topics x num terms. Do you have a
> full 8 GB heap for each task slot?
>
> Cheers,
> Andy
>
> Twitter, Inc.
>
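
(On the heap question above: the knob I know of for the per-slot heap is the
child JVM opts; a minimal sketch, using the Hadoop 1.x property name, since
MRv2 splits it into separate map and reduce settings:)

    import org.apache.hadoop.conf.Configuration;

    public class TaskHeap {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Gives each spawned task JVM an 8 GB heap; the node needs enough
            // physical memory for (slots per node) x 8 GB.
            conf.set("mapred.child.java.opts", "-Xmx8g");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }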



-- 
Alan Gardner
Solutions Architect - CTO Office

gardner@pythian.com | LinkedIn: http://www.linkedin.com/profile/view?id=65508699
@alanctgardner <https://twitter.com/alanctgardner>
Tel: +1 613 565 8696 x1218
Mobile: +1 613 897 5655
