mahout-user mailing list archives

From Sean Owen
Subject Re: Java Heap Error: ItemSimilarityJob
Date Wed, 06 Jun 2012 09:50:54 GMT
You need to increase the heap size of the child task JVMs. The child
JVM options can be set to -Xmx4g, for example. This is usually put in
mapred-site.xml.
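
As a rough sketch of what that could look like in mapred-site.xml: the property name used below, mapred.child.java.opts, is an assumption here (it is the Hadoop 1.x setting for child task JVM options; the name is not spelled out above), and 4g is only the example value, so size it to what each worker can actually spare per task slot.

```xml
<!-- mapred-site.xml (fragment): assumes the Hadoop 1.x property name
     mapred.child.java.opts, which the message above does not spell out -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- maximum heap for each map/reduce child JVM; 4g is just the example value -->
  <value>-Xmx4g</value>
</property>
```

The fragment belongs inside the file's existing <configuration> element.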

Sampling does decrease the size of the intermediate outputs; probably
not the final output so much. But this is not your problem. You are
running out of heap on the workers.

You should definitely use more than one reducer! Hadoop leaves it up
to you to specify this: use -Dmapred.reduce.tasks=10 or whatever is
appropriate.
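
The -D form sets the reducer count for a single run. If you would rather have a default so the flag is not needed every time, the same property can typically be set in the submitting client's mapred-site.xml as well. This is only a sketch assuming Hadoop 1.x property names, and 10 is just the example figure from above, not a recommendation for your cluster.

```xml
<!-- mapred-site.xml (fragment): default reducer count for submitted jobs;
     assumes Hadoop 1.x property names, and 10 is only an example value -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>10</value>
</property>
```

A -D value given on the command line should still override this default for that one job.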

The names of the jobs more or less say what they do, and the javadoc
says a little more. If you have specific questions I bet people can explain.


On Wed, Jun 6, 2012 at 7:39 AM, Something Something wrote:
> Hello,
> I am running this job with a file containing 791,732,411  lines.
> Step 1 (PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer)  completed in
> 3 minutes.
> Step 2 (PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer) took 2 hours
> but completed successfully.  It used only 1 Reducer so I am assuming the
> output is sorted, right?
> Step 3 (PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer) failed
> after running for 54 minutes with an 'Error: Java heap space' error, and
> it was all downhill from there.
> Question: Are there any configuration parameters I can use to cut down
> the size of the output? I noticed this in ToItemVectorsMapper:
> public static final String SAMPLE_SIZE = ToItemVectorsMapper.class +
> ".sampleSize";
> How do I cut down this sample size?
> Also, is there any documentation available that shows what each of these
> steps does?  If not, I will just debug.  Please let me know.  Thanks.
