mahout-user mailing list archives

From Nick Jordan
Subject Decision Forest/Partial Implementation TestForest Error
Date Wed, 05 Sep 2012 22:00:24 GMT
Hello All,

I'm playing around with decision forests using the partial
implementation and my own data set.  I am getting an error with
TestForest, but only for certain forests that I build with
BuildForest.  Using the same descriptor and the same build and test
data sets, I get no error if I set mapred.max.split.size=1890528,
which is roughly 1/100th the size of the build data set: I can build
the forest, test the remaining data, and get the results with no
problem.  When I change the split size to 18905280, building the
forest still appears to work fine, but when I then try to test the
remaining data I get the error below.
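
To make the two settings concrete, here is a rough sketch of how many
splits (and so partitions) each value would produce.  It assumes the
build data set is exactly 100 x 1890528 bytes, which is only the
"roughly 1/100th" figure above turned into a number, and it ignores
HDFS block boundaries and any slack Hadoop applies to the last split.
The class and method names are made up for illustration.

```java
// Rough estimate of how many input splits a given mapred.max.split.size
// yields. Assumes one input file and ignores HDFS block boundaries, so
// the result is an approximation, not what Hadoop is guaranteed to do.
class SplitEstimate {

    // Ceiling division: number of chunks of at most maxSplitBytes
    // needed to cover inputBytes.
    static long estimateSplits(long inputBytes, long maxSplitBytes) {
        if (inputBytes < 0 || maxSplitBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (inputBytes + maxSplitBytes - 1) / maxSplitBytes;
    }

    public static void main(String[] args) {
        // Assumption: the build set is about 100 times the small split size.
        long inputBytes = 100L * 1_890_528L;

        System.out.println(estimateSplits(inputBytes, 1_890_528L));   // 100
        System.out.println(estimateSplits(inputBytes, 18_905_280L));  // 10
    }
}
```

Under that assumption the working run uses about 100 partitions and
the failing run about 10.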

I've dug around the code a little, but nothing stood out as to why the
array index would go out of bounds at that specific value.  One
obvious solution is not to create partitions that large, but if the
problem were that I was running out of memory, I would have expected
an out of memory error, not an index past the bounds of an array.  I'd
prefer larger partitions, and thus fewer of them, and I can move this
job to something like EMR, which should give me more memory, but I
want to understand the nature of the error.

For what it is worth, I'm running this on hadoop-1.0.3 and mahout-0.8-SNAPSHOT.



12/09/05 17:52:09 INFO mapred.JobClient: Task Id :
attempt_201209031756_0008_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 946827879
        at org.apache.mahout.classifier.df.DecisionForest.readFields(
        at org.apache.mahout.classifier.df.DecisionForest.load(
        at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(
        at org.apache.hadoop.mapred.MapTask.runNewMapper(
        at org.apache.hadoop.mapred.Child$
        at Method)
        at org.apache.hadoop.mapred.Child.main(
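
One cheap check on an index this large is to look at it as the four
bytes a big-endian readInt() would have consumed.  The sketch below
does only that; the class and method names are made up for
illustration.  The result is an observation about the number, not a
confirmed diagnosis of what DecisionForest.readFields was reading.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Decodes an int into the four bytes that a big-endian read (the order
// java.io.DataInput.readInt uses) would have produced it from.
class IndexBytes {

    static String asAscii(int value) {
        // ByteBuffer is big-endian by default, matching DataInput.
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int index = 946827879; // value reported in the exception

        System.out.println(Integer.toHexString(index)); // 386f7267
        System.out.println(asAscii(index));             // 8org
    }
}
```

The four bytes are all printable ASCII ("8org"), which may mean the
value came from text in the file rather than from a real array length
or node count.  That would point toward the content or layout of the
file being loaded rather than toward memory, but it is a guess.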
