Hi all,
I am having a problem running the 20 newsgroups example on a Hadoop cluster.
The trainclassifier step worked fine, but testclassifier fails with an
"OutOfMemoryError: Java heap space" error.
The following is the configuration of the hadoop cluster.
Physical machines: 4 nodes, each with 6GB memory.
Hadoop: 0.20.2, HADOOP_HEAPSIZE=3200 in hadoop-env.sh,
mapred.child.java.opts=-Xmx1024M in mapred-site.xml.
Mahout: tried both release 0.4 and the latest trunk source; same problem with each.
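For completeness, this is how the mapred.child.java.opts property is set in
mapred-site.xml (standard Hadoop property block, value as above):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024M</value>
    </property>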
Command line arguments used:
$MAHOUT_HOME/bin/mahout testclassifier \
-m newsmodel \
-d 20news-input \
-type bayes \
-ng 3 \
-source hdfs \
-method mapreduce
Any ideas?
Thanks!