mahout-user mailing list archives

From Ryan Rosario <uclamath...@gmail.com>
Subject Heap Space Issues with Complementary Naive Bayes
Date Fri, 04 May 2012 23:49:07 GMT
Hi,

I am trying to follow along with the 20 Newsgroups example, but using
my own data. I am running the examples on a server with 24GB of RAM
and 24 cores. When I get to the "Computing TF-IDF" stage, the whole
process fails with the exception below. I have 14,000 documents and
2 classes. The lexicon consists of 2,705,284 trigrams, which I created
myself. I set the -ng parameter to 1 since I had already
tokenized the words myself.
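
For a sense of scale, here is a rough estimate of the heap needed just to hold the 2,705,284 trigrams as Java String objects. The average trigram length (20 characters) and the per-String cost (about 40 bytes of object and array overhead plus 2 bytes per UTF-16 character, typical of a pre-Java-9 64-bit JVM) are assumptions, not measurements. The estimate also ignores map entries and any serialized copies of the data, so it only gives an order of magnitude to compare against the 4g heap cap.

```java
/**
 * Back-of-the-envelope estimate of the heap needed to hold the trigram
 * lexicon as java.lang.String objects. The average length and per-object
 * sizes below are ASSUMPTIONS, not measured values.
 */
public class LexiconHeapEstimate {

    // Number of distinct trigrams, taken from the message above.
    static final long FEATURES = 2_705_284L;

    // Assumed (hypothetical): average trigram length in characters.
    static final int AVG_CHARS = 20;

    // Assumed: pre-Java-9 String layout on a 64-bit JVM, roughly 40 bytes of
    // String object + char[] header overhead, plus 2 bytes per UTF-16 char.
    static final int STRING_OVERHEAD_BYTES = 40;
    static final int BYTES_PER_CHAR = 2;

    /** Estimated bytes for `features` strings of `avgChars` characters each. */
    static long estimateBytes(long features, int avgChars) {
        return features * (STRING_OVERHEAD_BYTES + (long) BYTES_PER_CHAR * avgChars);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(FEATURES, AVG_CHARS);
        // Integer division: whole megabytes (MiB), rounded down.
        System.out.printf("~%d MB for the lexicon strings alone%n", bytes / (1024 * 1024));
    }
}
```

With these assumed values it prints `~206 MB for the lexicon strings alone`; adjust AVG_CHARS to match the real data.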

At its peak, the system has only used 4-5GB in total. I have set
MAHOUT_OPTIONS=-Xmx4g, MAHOUT_HEAPSIZE=24000 and
mapred.map.child.java.opts=-Xmx24g just to see if I could get Mahout
to acknowledge the increase in heap space, but this does not seem to
help at all.
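
The stack trace below reports the error in thread "main", launched through org.apache.hadoop.util.RunJar, so the limit that matters appears to be the -Xmx received by the client-side driver JVM rather than by the task JVMs that mapred.*.java.opts properties configure. A minimal way to see what heap limit a JVM actually received is to print Runtime.maxMemory(); the class name ShowMaxHeap is hypothetical.

```java
/**
 * Prints the maximum heap this JVM will attempt to use, to confirm whether
 * an -Xmx setting actually reached the JVM in question.
 */
public class ShowMaxHeap {

    /** Max heap in MiB as reported by Runtime.maxMemory(). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

For example, `java -Xmx4g ShowMaxHeap` should report a value close to 4096 MB (the JVM may report slightly less than the -Xmx value). Alternatively, running `jps -v` while the job is in progress lists the JVM arguments of each running Java process, which shows whether the driver picked up the intended -Xmx.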

What else can I try to get past this problem? The system has plenty of RAM.

Thanks,
Ryan

./bin/mahout trainclassifier -i /user/ryan/pageclass-train -o pageclass-out -type cbayes -ng 1 -source

....
12/05/04 15:52:43 INFO cbayes.CBayesDriver: Calculating Tf-Idf...
12/05/04 15:52:46 INFO common.BayesTfIdfDriver: Counts of documents in Each Label
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
       at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
       at java.lang.StringCoding.encode(StringCoding.java:272)
       at java.lang.String.getBytes(String.java:946)
       at org.apache.hadoop.io.DefaultStringifier.fromString(DefaultStringifier.java:73)
       at org.apache.mahout.classifier.bayes.mapreduce.common.BayesTfIdfDriver.runJob(BayesTfIdfDriver.java:88)
       at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:51)
       at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:58)
       at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:151)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


--
RRR
