mahout-user mailing list archives

From Juan Manuel Tirado <juanmanuel.tir...@gmail.com>
Subject Random forest testing fails
Date Thu, 10 Oct 2013 13:17:30 GMT
Hi there,

I'm following the steps from the Mahout wiki to run a classifier using
random forests
(https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation).
My data set contains 5M observations. Each observation has two attributes to
be ignored, the label, and 15 numeric variables. I'm using the following
commands to create the data set descriptor, train the forest and test it.

hadoop jar $MAHOUT_HOME/core/target/mahout-core-0.9-SNAPSHOT-job.jar \
    org.apache.mahout.classifier.df.tools.Describe \
    -p $INPUT_FILE -f $DESCRIPTOR -d I I L 15 N

hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar \
    org.apache.mahout.classifier.df.mapreduce.BuildForest \
    -Dmapred.max.split.size=1874231 -d $INPUT_FILE -ds $DESCRIPTOR -sl 5 \
    -p -t 150 -o $MODEL_FOLDER

hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-SNAPSHOT-job.jar \
    org.apache.mahout.classifier.df.mapreduce.TestForest \
    -i $INPUT_FILE -ds $DESCRIPTOR -m $MODEL_FOLDER -a -mr -o $PREDICTIONS_FOLDER
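
The descriptor argument "I I L 15 N" is a compact description of each row.
The sketch below is a minimal, hypothetical helper (not part of Mahout)
that expands that shorthand, assuming the usual Mahout meanings: I =
ignored, L = label, N = numerical, C = categorical, and a number repeats
the token that follows it. It shows that every observation is expected to
have 2 + 1 + 15 = 18 columns, which is worth checking against the input file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Mahout's API): expands a descriptor shorthand
 * such as "I I L 15 N" into one token per attribute.
 * Assumed token meanings: I = ignored, L = label, N = numerical,
 * C = categorical; an integer prefix repeats the token that follows it.
 */
public class DescriptorSketch {

    static List<String> expand(String descriptor) {
        List<String> attributes = new ArrayList<>();
        int repeat = 1; // how many times to emit the next attribute token
        for (String token : descriptor.trim().split("\\s+")) {
            if (token.matches("\\d+")) {
                repeat = Integer.parseInt(token);
                continue;
            }
            for (int i = 0; i < repeat; i++) {
                attributes.add(token);
            }
            repeat = 1; // a count applies only to the token right after it
        }
        return attributes;
    }

    public static void main(String[] args) {
        List<String> attrs = expand("I I L 15 N");
        System.out.println(attrs.size());        // 18 columns per observation
        System.out.println(attrs.subList(0, 3)); // [I, I, L]
    }
}
```

If a row in $INPUT_FILE has a different number of columns, the descriptor
and the data no longer line up.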

The descriptor is created and the forest is trained. However, when I test
the generated model, the execution fails:

Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.classifier.df.mapreduce.Classifier.run(Classifier.java:127)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.mapreduce(TestForest.java:188)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.testForest(TestForest.java:174)
 ...

The model has been generated and is available in its folder. What seems
weird to me is that the forest training produces this output:

13/10/10 11:18:17 INFO mapreduce.BuildForest: Forest num Nodes: 150
13/10/10 11:18:17 INFO mapreduce.BuildForest: Forest mean num Nodes: 1
13/10/10 11:18:17 INFO mapreduce.BuildForest: Forest mean max Depth: 1
13/10/10 11:18:17 INFO mapreduce.BuildForest: Storing the forest in: kdd_forest/forest.seq

Are these numbers correct? A forest with depth 1? Am I missing something
important?
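
A quick arithmetic check of those numbers, as a minimal sketch. It assumes
that "Forest num Nodes" is the total over all trees and that the forest has
150 trees because of "-t 150". The class and method names are hypothetical,
not Mahout code. Because every tree has at least one node, a total equal to
the number of trees means each tree is a single leaf, which predicts the
same label for every input.

```java
/**
 * Hypothetical check (not Mahout code): mean nodes per tree is
 * totalNodes / numTrees. Since every tree has at least one node,
 * totalNodes == numTrees means every tree is a lone leaf (depth 1).
 */
public class ForestStatsSketch {

    static double meanNodesPerTree(long totalNodes, int numTrees) {
        if (numTrees <= 0) {
            throw new IllegalArgumentException("numTrees must be positive");
        }
        return (double) totalNodes / numTrees;
    }

    static boolean allTreesAreSingleLeaves(long totalNodes, int numTrees) {
        // Each tree contributes at least one node, so equality forces exactly one each.
        return totalNodes == numTrees;
    }

    public static void main(String[] args) {
        int numTrees = 150;    // assumed from -t 150 on the BuildForest command line
        long totalNodes = 150; // "Forest num Nodes: 150" from the log
        System.out.println("mean nodes per tree = " + meanNodesPerTree(totalNodes, numTrees));
        System.out.println("all trees are single leaves: "
                + allTreesAreSingleLeaves(totalNodes, numTrees));
    }
}
```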

I would appreciate any comments!

Cheers,

Juan
