mahout-user mailing list archives

From Abdelhakim Deneche <adene...@gmail.com>
Subject Re: Decision Forest/Partial Implementation TestForest Error
Date Thu, 06 Sep 2012 06:22:46 GMT
Hi Nick,

This is not a memory problem. The classifier tries to load the trained forest but it is reading
some unexpected values. This problem has never occurred before! Could the forest files be
corrupted?
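
One way to look at the "unexpected values" is to decode the failing index from the stack trace (946827879) as the four big-endian bytes that a DataInput.readInt() call would have consumed. The sketch below does only that; the class and method names (IndexBytes, asAscii) are made up for illustration and are not part of Mahout. If the bytes turn out to be printable text, that would be consistent with the loader reading from the wrong position or from a file that is not a serialized forest, but it does not prove which.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Mahout: shows which raw bytes produce a given int.
class IndexBytes {

    // Interprets the int as 4 big-endian bytes (the byte order used by
    // java.io.DataInput.readInt) and renders them as ASCII characters.
    static String asAscii(int value) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array(); // ByteBuffer is big-endian by default
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 946827879 is the index reported by the ArrayIndexOutOfBoundsException.
        System.out.println(asAscii(946827879)); // prints "8org"
    }
}
```

The value is 0x386F7267, i.e. the characters '8', 'o', 'r', 'g'. That looks more like a fragment of text (for example the start of a package name such as "org....") than a node type, which fits the idea that the data being read is not what the loader expects.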

Try training the forest once again, and this time use the sequential classifier (don't use
the -mr parameter) and see if the problem still occurs.


On 5 Sep 2012, at 23:00, Nick Jordan <nick@influen.se> wrote:

> Hello All,
> 
> I'm playing around with decision forests using the partial
> implementation and my own data set.  I am getting an error with
> TestForest, but only for certain forests that I'm building with
> BuildForest.  Using the same descriptor and same build and test data
> sets I get no error if I set mapred.max.split.size=1890528 which is
> roughly 1/100th the size of the build data set.  I can build the
> forest and test the remaining data and get the results with no
> problem.  When I change the split size to 18905280, everything still
> appears to work fine when building the forest, but when I then try to
> test the remaining data I get the error below.
> 
> I've dug around the code a little, but nothing stood out as to why the
> array index would go out of bounds at that specific value.  One obvious
> solution is to not create partitions that large, but if the problem
> were me running out of memory I would have expected an out-of-memory
> error and not an index past the bounds of an array.  I'd obviously
> prefer larger partitions, and thus fewer of them, and can move
> this job to something like EMR, which should allow me to have
> more memory, but I want to understand the nature of the error.
> 
> For what it is worth, I'm running this on hadoop-1.0.3 and mahout-0.8-SNAPSHOT.
> 
> Thanks.
> 
> --
> 
> 12/09/05 17:52:09 INFO mapred.JobClient: Task Id :
> attempt_201209031756_0008_m_000000_0, Status : FAILED
> java.lang.ArrayIndexOutOfBoundsException: 946827879
>        at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
>        at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
>        at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
>        at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
>        at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:212)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:416)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>        at org.apache.hadoop.mapred.Child.main(Child.java:249)
