mahout-user mailing list archives

From Adam Baron
Subject Run more than one mapper for TestForest?
Date Fri, 05 Jul 2013 21:00:30 GMT
I'm attempting to run org.apache.mahout.classifier.df.mapreduce.TestForest
on a CSV with 200,000 rows and 500,000 features per row. However,
TestForest is running extremely slowly, most likely because only 1 mapper
was assigned to the job. This seems strange because the
org.apache.mahout.classifier.df.mapreduce.BuildForest step on the same
data used 1772 mappers and took about 6 minutes. (BTW: I know I
*shouldn't* use the same data set for the training and testing steps;
this is purely a technical experiment to see whether Mahout's Random Forest
can handle the data sizes we typically deal with.)

Any ideas on how to get org.apache.mahout.classifier.df.mapreduce.TestForest
to use more mappers? Glancing at the code (and thinking intuitively about
what is happening), classification should be ripe for parallelization,
since each row can be classified independently of the others.
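
For reference, here is a rough sketch of how I understand Hadoop to pick the
number of map tasks for a file-based job. The split-size formula mirrors
FileInputFormat.computeSplitSize; the class name, the 64 GiB file size, and
the 128 MiB block size are made up for illustration and are not measurements
from my cluster. Whether TestForest's input format is actually splittable is
an assumption I have not verified in the Mahout source.

```java
// Hypothetical helper, not part of Mahout or Hadoop.
public class SplitEstimate {

    // Same formula as Hadoop's FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of map tasks for a single input file.
    // A non-splittable file always goes to exactly one mapper.
    // Hadoop also lets the last split run about 10% over, so the
    // real count can be one lower than this ceiling.
    static long estimateMappers(long fileBytes, long splitSize, boolean splittable) {
        if (!splittable) {
            return 1;
        }
        long ceiling = (fileBytes + splitSize - 1) / splitSize;
        return Math.max(1, ceiling);
    }

    public static void main(String[] args) {
        long fileBytes = 64L << 30;   // assumed 64 GiB input, illustration only
        long blockSize = 128L << 20;  // assumed 128 MiB HDFS block size
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);

        System.out.println("splittable:     " + estimateMappers(fileBytes, splitSize, true));
        System.out.println("non-splittable: " + estimateMappers(fileBytes, splitSize, false));
    }
}
```

If the input turns out to be non-splittable, lowering the maximum split size
(mapred.max.split.size on Hadoop 1, or
mapreduce.input.fileinputformat.split.maxsize on Hadoop 2) would not help,
and breaking the CSV into several smaller files might be the only way to get
one mapper per file.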