mahout-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Java Heap Error: ItemSimilarityJob
Date Wed, 06 Jun 2012 06:39:50 GMT
Hello,

I am running ItemSimilarityJob on an input file containing 791,732,411 lines.

Step 1 (PreparePreferenceMatrixJob-ItemIDIndexMapper-Reducer) completed in
3 minutes.

Step 2 (PreparePreferenceMatrixJob-ToItemPrefsMapper-Reducer) took 2 hours
but completed successfully. It used only one reducer, so I assume its
output is sorted. Is that right?

Step 3 (PreparePreferenceMatrixJob-ToItemVectorsMapper-Reducer) failed
after running for 54 minutes with 'Error: Java heap space', and it was
all downhill from there.


Question: Are there any configuration parameters I can use to reduce the
size of the output? I noticed this in ToItemVectorsMapper:

public static final String SAMPLE_SIZE = ToItemVectorsMapper.class +
".sampleSize";

How do I set a smaller sample size through this property?
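
For reference, here is a minimal sketch of what that constant evaluates to.
The nested class below is a hypothetical stand-in for Mahout's
ToItemVectorsMapper (the real class lives in a Mahout package, so its key
will contain the fully qualified name instead). The point is only that
concatenating a Class object with a string uses Class.toString(), so the
resulting key starts with "class " and contains a space:

```java
public class SampleSizeKeyDemo {

    // Hypothetical stand-in for Mahout's ToItemVectorsMapper.
    // Only the way the key string is built matters here.
    static class ToItemVectorsMapper {
        // Same expression as the constant quoted above.
        public static final String SAMPLE_SIZE =
            ToItemVectorsMapper.class + ".sampleSize";
    }

    public static void main(String[] args) {
        // Class.toString() yields "class " + the binary class name,
        // so this prints:
        // class SampleSizeKeyDemo$ToItemVectorsMapper.sampleSize
        System.out.println(ToItemVectorsMapper.SAMPLE_SIZE);
    }
}
```

So if this key has to be set by hand in the Hadoop configuration, it
would presumably need the "class " prefix included. Whether the job
actually honours a value set that way is an assumption I have not
confirmed.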

Also, is there any documentation that describes what each of these
steps does? If not, I will just debug. Please let me know. Thanks.
