mahout-user mailing list archives

From Nick Jordan <>
Subject Re: Decision Forest/Partial Implementation TestForest Error
Date Fri, 07 Sep 2012 22:10:07 GMT
Any thoughts here?

On Thu, Sep 6, 2012 at 7:00 AM, Nick Jordan <> wrote:
> Same problem with the sequential classifier.  My guess is that this
> "corruption" is happening because of the split-size setting, since it
> is the only thing I'm changing, but I have no idea how to investigate
> further.
> Nick
> On Thu, Sep 6, 2012 at 2:22 AM, Abdelhakim Deneche <> wrote:
>> Hi Nick,
>> This is not a memory problem: the classifier tries to load the trained
>> forest but is getting some unexpected values. This problem has never
>> occurred before! Could the forest files be corrupted?
>> Try training the forest again, and this time use the sequential
>> classifier (don't pass the -mr parameter) and see if the problem still
>> occurs.
>> On 5 sept. 2012, at 23:00, Nick Jordan <> wrote:
>>> Hello All,
>>> I'm playing around with decision forests using the partial
>>> implementation and my own data set.  I am getting an error with
>>> TestForest, but only for certain forests that I'm building with
>>> BuildForest.  Using the same descriptor and the same build and test
>>> data sets, I get no error if I set mapred.max.split.size=1890528,
>>> which is roughly 1/100th the size of the build data set.  I can
>>> build the forest, test the remaining data, and get the results with
>>> no problem.  When I change the split size to 18905280, everything
>>> still appears to work fine when building the forest, but when I then
>>> try to test the remaining data I get the error below.
>>> I've dug around the code a little, but nothing stood out as to why
>>> the array would go out of bounds at that specific value.  One obvious
>>> workaround is to not create partitions that large, but if the problem
>>> were me running out of memory I would have expected an out-of-memory
>>> error, not an index past the bounds of an array.  I'd prefer larger
>>> partitions, and thus fewer of them, and I can move this job to
>>> something like EMR, which should give me more memory, but I want to
>>> understand the nature of the error.
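The OOM-vs-index distinction is a good instinct. A toy illustration of the usual mechanism (this is not Mahout's actual on-disk forest format — just a generic DataInput example): a Writable-style readFields() trusts the first int it reads as a count, so if the stream is corrupt or misaligned, that int is garbage and later shows up as a huge array index.

```java
import java.io.*;

// Toy demo (hypothetical format): why a corrupt/misaligned forest file
// yields a huge bogus index like 946827879 rather than an OOM.
public class MisalignedReadDemo {
    public static void main(String[] args) throws IOException {
        // Serialize a tiny record: an int count, then some doubles.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(3);          // the "number of trees"
        out.writeDouble(1.0);     // payload
        out.writeDouble(2.0);
        out.writeDouble(3.0);
        byte[] bytes = bos.toByteArray();

        // Read the same bytes shifted by 3, simulating corruption:
        // the reader now interprets unrelated bytes as the count.
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(bytes, 3, bytes.length - 3));
        int bogusCount = in.readInt(); // 0x033FF000 = 54521856, not 3
        System.out.println("count read from corrupt stream: " + bogusCount);
        // Indexing or sizing an array with such a value throws
        // ArrayIndexOutOfBoundsException — the symptom in the trace below.
    }
}
```

That would fit Abdelhakim's corruption theory: the forest file written with the larger split size may be malformed, and the failure only surfaces when the classifier deserializes it.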
>>> For what it's worth, I'm running this on hadoop-1.0.3 and mahout-0.8-SNAPSHOT.
>>> Thanks.
>>> --
>>> 12/09/05 17:52:09 INFO mapred.JobClient: Task Id :
>>> attempt_201209031756_0008_m_000000_0, Status : FAILED
>>> java.lang.ArrayIndexOutOfBoundsException: 946827879
>>>        at org.apache.mahout.classifier.df.DecisionForest.readFields(
>>>        at org.apache.mahout.classifier.df.DecisionForest.load(
>>>        at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(
>>>        at org.apache.hadoop.mapred.Child$
>>>        at org.apache.hadoop.mapred.Child.main(
