mahout-user mailing list archives

From David Starina <david.star...@gmail.com>
Subject Re: LDA - help me understand
Date Thu, 10 Mar 2016 13:24:35 GMT
How does the memory requirement grow with the number of topics? A little
experimentation shows me that the number of documents doesn't matter as much
as the number of topics ... Does the memory requirement grow exponentially
with the number of topics?

--David
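
A back-of-envelope sketch for the question above, assuming the model is held as a dense numTopics x numTerms matrix of doubles (an assumption about the implementation, not something confirmed in this thread). The class and method names are hypothetical. Under that assumption one copy of the model grows linearly with the number of topics, not exponentially: with 60,000 terms it is 9,600,000 bytes at 20 topics and 48,000,000 bytes at 100 topics. This is only a lower bound per copy; JVM overhead and any additional copies are not counted.

```java
public class TopicModelMemory {
    // Bytes for one dense numTopics x numTerms matrix of 8-byte doubles.
    // Assumption: dense storage; object headers and extra copies are ignored.
    public static long denseModelBytes(int numTopics, int numTerms) {
        return (long) numTopics * numTerms * Double.BYTES;
    }

    public static void main(String[] args) {
        int numTerms = 60_000; // vocabulary size quoted in the message below
        for (int numTopics : new int[] {20, 100}) {
            long bytes = denseModelBytes(numTopics, numTerms);
            System.out.println(numTopics + " topics: " + bytes + " bytes per model copy");
        }
    }
}
```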

On Thu, Mar 10, 2016 at 11:43 AM, David Starina <david.starina@gmail.com>
wrote:

> Hi,
>
> I realize MapReduce algorithms are not the "hot new stuff" anymore, but I
> am playing around with LDA. I am having some memory problems; can you
> suggest how to set the parameters to make this work?
>
> I am running on a virtual cluster on my laptop - two nodes with 3 GB of
> memory each - just to prepare before I try this on a physical cluster with
> a much larger data set. I am using a data set of 500 documents, averaging
> around 120 kB each, with roughly 60,000 terms. Running this with 20 topics
> works fine - but when running with 100 topics, I run out of memory (on the
> mappers). Can you suggest how to set the parameters so that it runs
> more mappers that each consume less memory?
>
> The error I get: Task Id : attempt_1457214584155_0074_m_000000_1, Status :
> FAILED
> *Container*
> [pid=26283,containerID=container_1457214584155_0074_01_000003] *is
> running beyond physical memory limits. Current usage: 1.0 GB of 1 GB
> physical memory used*; 1.7 GB of 2.1 GB virtual memory used. Killing
> container.
>
> These are the parameters I set for CVB0Driver:
>
> static int numTopics = 100;
> static double doc_topic_smoothening = 0.5;
> static double term_topic_smoothening = 0.5;
>
> static int maxIter = 3;
> static int iteration_block_size = 10;
> static double convergenceDelta = 0;
> static float testFraction = 0.0f;
> static int numTrainThreads = 4;
> static int numUpdateThreads = 1;
> static int maxItersPerDoc = 3;
> static int numReduceTasks = 10;
> static boolean backfillPerplexity = false;
>
> Any suggestions? Should I enlarge the container size on Hadoop, or can I
> fix this with LDA parameters?
>
> Cheers,
> David
>
>
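
On the container-size question in the quoted message: the "running beyond physical memory limits" error refers to the 1 GB limit of the map container. A minimal sketch of how the two standard Hadoop properties `mapreduce.map.memory.mb` and `mapreduce.map.java.opts` could be derived for a larger container follows. The class name, the 2048 MB figure, and the 0.8 heap-to-container ratio are illustrative assumptions, not values from this thread; the right size depends on what the nodes can actually spare.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapperMemorySettings {
    // Fraction of the container given to the JVM heap, leaving headroom for
    // non-heap memory. 0.8 is a rule-of-thumb assumption, not a thread value.
    static final double HEAP_FRACTION = 0.8;

    // Builds the two map-side memory properties for a given container size in MB.
    public static Map<String, String> forContainerMb(int containerMb) {
        int heapMb = (int) (containerMb * HEAP_FRACTION);
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.map.memory.mb", Integer.toString(containerMb));
        props.put("mapreduce.map.java.opts", "-Xmx" + heapMb + "m");
        return props;
    }

    public static void main(String[] args) {
        // Print the properties as -D options for a hypothetical 2 GB container.
        forContainerMb(2048).forEach((k, v) -> System.out.println("-D" + k + "=" + v));
    }
}
```

The resulting key/value pairs could be passed as `-D` options on the command line or set on the job's Hadoop configuration before the driver is run.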
