mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Java Heap Error: ItemSimilarityJob
Date Wed, 06 Jun 2012 11:23:37 GMT
This should be baked in by default. I don't think people use less than 4g
these days.
On Jun 6, 2012 12:24 PM, "Vinod Singh" <vinod@vinodsingh.com> wrote:

> Child heap size can be increased by passing command line options as well.
> See the example below:
>
> -Dmapred.map.child.java.opts=-Xmx6100m
> -Dmapred.reduce.child.java.opts=-Xmx6100m
>
> Thanks,
> Vinod
>
> http://blog.vinodsingh.com/
>
> On Wed, Jun 6, 2012 at 3:20 PM, Sean Owen <srowen@gmail.com> wrote:
>
> > You need to increase the size of the children's heap.
> > mapred.child.java.opts can be set to -Xmx4g for example. This is
> > usually put in mapred-site.xml.
> >
> > Sampling does decrease the size of the intermediate outputs; probably
> > not the final output so much. But this is not your problem. You are
> > running out of heap on the workers.
> >
> > You should definitely use more than one reducer! Hadoop leaves it up
> > to you to specify this; use -Dmapred.reduce.tasks=10 or
> > whatever is appropriate.
> >
> > The names of the jobs kind of say what they do, and the javadoc says a
> > little more. If you have specific questions I bet people can explain
> > here.
> >
> > Sean
> >
> >
> > On Wed, Jun 6, 2012 at 7:39 AM, Something Something
> > <mailinglists19@gmail.com> wrote:
> > > Hello,
> > >
> > > > I am running this job with a file containing 791,732,411 lines.
> > >
> > > > Step 1 (PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer) completed in
> > > > 3 minutes.
> > >
> > > > Step 2 (PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer) took 2 hours
> > > > but completed successfully.  It used only 1 Reducer so I am assuming the
> > > > output is sorted, right?
> > >
> > > > Step 3 (PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer) failed
> > > > after running for 54 minutes with 'Error: Java heap space' error & it was
> > > > all downhill from there.
> > >
> > >
> > > > Question:  Are there any configuration parameters I can use to cut down
> > > > the size of the output?  I noticed this in ToItemVectorsMapper:
> > >
> > > public static final String SAMPLE_SIZE = ToItemVectorsMapper.class +
> > > ".sampleSize";
> > >
> > > How do I cut down this sample size?
> > >
> > > > Also, is there any documentation available that shows what each of these
> > > > steps does?  If not, I will just debug.  Please let me know.  Thanks.
> >
>
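
A note on the SAMPLE_SIZE constant quoted in the original question: the key is built by concatenating a Class object with a string, so it picks up the "class " prefix that Class.toString() adds. The sketch below shows this with a hypothetical stand-in class. SampleSizeKeyDemo and its nested ToItemVectorsMapper are made up for illustration and are not Mahout's code; only the form of the constant is copied from the thread.

```java
public class SampleSizeKeyDemo {

    // Hypothetical stand-in for Mahout's ToItemVectorsMapper.
    // Only the way the constant is built matters here.
    static class ToItemVectorsMapper {
        // Same form as the line quoted in the thread: Class + String.
        // Class.toString() yields "class " followed by the binary class name.
        public static final String SAMPLE_SIZE =
            ToItemVectorsMapper.class + ".sampleSize";
    }

    public static void main(String[] args) {
        // Prints the configuration key exactly as a job would look it up.
        System.out.println(ToItemVectorsMapper.SAMPLE_SIZE);
        // Shows that the key contains a space because of the "class " prefix.
        System.out.println(ToItemVectorsMapper.SAMPLE_SIZE.contains(" "));
    }
}
```

With the real Mahout class the key should follow the same pattern, using Mahout's fully qualified class name. Because the key contains a space, it is probably easier to set it through whatever option the job itself exposes than through a raw -D flag; that is an assumption worth checking against the job's help output for your version.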
