mahout-user mailing list archives

From: Prasanna <pdlaling...@gmail.com>
Subject: Re: Mahout-0.7 BuildForest execution error
Date: Thu, 10 Jul 2014 09:24:24 GMT

Lorenz Knies <lorenz.knies <at> metrigo.de> writes:

> 
> I think the oob option is gone.
> 
> On Jan 21, 2013, at 1:52 PM, Stuti Awasthi <stutiawasthi <at> hcl.com> wrote:
> 
> > Hi,
> > 
> > I have downloaded Mahout and tried to execute the Partial Implementation. When I try to run it, I get the following parsing error:
> > 
> > $HADOOP_HOME/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.7-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -oob -d /testdata/KDDTrain+.arff -ds /testdata/KDDTrain+.info -sl 5 -p -t 100 -o /testdata/nsl-forest
> > 
> > 13/01/21 18:16:24 ERROR mapreduce.BuildForest: Exception
> > org.apache.commons.cli2.OptionException: Unexpected /testdata/nsl-forest while processing Options
> >        at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
> >        at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:139)
> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >        at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:253)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >        at java.lang.reflect.Method.invoke(Method.java:616)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > Usage:
> >  [--data <path> --dataset <dataset> --selection <m> --no-complete --minsplit
> >   <minsplit> --minprop <minprop> --seed <seed> --partial --nbtrees <nbtrees>
> >   --output <path> --help]
> > Options
> >  --data (-d) path             Data path
> >  --dataset (-ds) dataset      Dataset path
> >  --selection (-sl) m          Optional, Number of variables to select randomly
> >                               at each tree-node.
> >                               For classification problem, the default is
> >                               square root of the number of explanatory
> >                               variables.
> >                               For regression problem, the default is 1/3 of
> >                               the number of explanatory variables.
> >  --no-complete (-nc)          Optional, The tree is not complemented
> >  --minsplit (-ms) minsplit    Optional, The tree-node is not divided, if the
> >                               branching data size is smaller than this value.
> >                               The default is 2.
> >  --minprop (-mp) minprop      Optional, The tree-node is not divided, if the
> >                               proportion of the variance of branching data is
> >                               smaller than this value.
> >                               In the case of a regression problem, this value
> >                               is used. The default is 1/1000 (0.001).
> >  --seed (-sd) seed            Optional, seed value used to initialise the
> >                               Random number generator
> >  --partial (-p)               Optional, use the Partial Data implementation
> >  --nbtrees (-t) nbtrees       Number of trees to grow
> >  --output (-o) path           Output path, will contain the Decision Forest
> >  --help (-h)                  Print out help
> > 
> > 
> > If I try to run with Mahout-0.5, it works fine and generates /testdata/nsl-forest/forest.seq in HDFS.
> > Is this a bug in Mahout-0.7, or am I doing something wrong?
> > 
> > Please suggest.
> > 
> > Thanks
> > Stuti Awasthi




So how can I calculate the out-of-bag (oob) error rate in Mahout 0.9?
Does the code support it? If yes, what arguments do I need to pass, or what changes do I need to make?
Also, which versions of Mahout support the -oob option?
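
For reference, a quick way to check whether a given build still accepts -oob is to ask BuildForest for its option list, using the --help flag shown in the usage output above (just a sketch against my local 0.9 job jar; adjust the jar path for your installation):

hadoop jar mahout-examples-0.9-job.jar \
  org.apache.mahout.classifier.df.mapreduce.BuildForest --help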


Following is the trace of my execution of the example:

hduser@ubuntu:/home/prasanna/Downloads/mahout-distribution-0.9$ hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -p -t 10 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.

14/07/10 02:17:43 INFO mapreduce.BuildForest: Partial Mapred implementation
14/07/10 02:17:43 INFO mapreduce.BuildForest: Building the forest...
14/07/10 02:17:44 INFO input.FileInputFormat: Total input paths to process : 1
14/07/10 02:17:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/07/10 02:17:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/10 02:17:44 INFO mapred.JobClient: Running job: job_201407092351_0003
14/07/10 02:17:45 INFO mapred.JobClient:  map 0% reduce 0%
14/07/10 02:18:07 INFO mapred.JobClient:  map 20% reduce 0%
14/07/10 02:18:19 INFO mapred.JobClient:  map 30% reduce 0%
14/07/10 02:18:22 INFO mapred.JobClient:  map 40% reduce 0%
14/07/10 02:18:31 INFO mapred.JobClient:  map 60% reduce 0%
14/07/10 02:18:43 INFO mapred.JobClient:  map 80% reduce 0%
14/07/10 02:18:55 INFO mapred.JobClient:  map 90% reduce 0%
14/07/10 02:18:58 INFO mapred.JobClient:  map 100% reduce 0%
14/07/10 02:19:03 INFO mapred.JobClient: Job complete: job_201407092351_0003
14/07/10 02:19:03 INFO mapred.JobClient: Counters: 20
14/07/10 02:19:03 INFO mapred.JobClient:   Job Counters 
14/07/10 02:19:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=129677
14/07/10 02:19:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/10 02:19:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/07/10 02:19:03 INFO mapred.JobClient:     Launched map tasks=10
14/07/10 02:19:03 INFO mapred.JobClient:     Data-local map tasks=10
14/07/10 02:19:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/07/10 02:19:03 INFO mapred.JobClient:   File Output Format Counters 
14/07/10 02:19:03 INFO mapred.JobClient:     Bytes Written=75424
14/07/10 02:19:03 INFO mapred.JobClient:   FileSystemCounters
14/07/10 02:19:03 INFO mapred.JobClient:     FILE_BYTES_READ=28270
14/07/10 02:19:03 INFO mapred.JobClient:     HDFS_BYTES_READ=18759170
14/07/10 02:19:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=227620
14/07/10 02:19:03 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=75424
14/07/10 02:19:03 INFO mapred.JobClient:   File Input Format Counters 
14/07/10 02:19:03 INFO mapred.JobClient:     Bytes Read=18757940
14/07/10 02:19:03 INFO mapred.JobClient:   Map-Reduce Framework
14/07/10 02:19:03 INFO mapred.JobClient:     Map input records=125973
14/07/10 02:19:03 INFO mapred.JobClient:     Physical memory (bytes) snapshot=775561216
14/07/10 02:19:03 INFO mapred.JobClient:     Spilled Records=0
14/07/10 02:19:03 INFO mapred.JobClient:     CPU time spent (ms)=20080
14/07/10 02:19:03 INFO mapred.JobClient:     Total committed heap usage (bytes)=317194240
14/07/10 02:19:03 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=10654883840
14/07/10 02:19:03 INFO mapred.JobClient:     Map output records=10
14/07/10 02:19:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1230
14/07/10 02:19:04 INFO common.HadoopUtil: Deleting hdfs://localhost:54310/user/hduser/nsl-forest
14/07/10 02:19:04 INFO mapreduce.BuildForest: Build Time: 0h 1m 20s 892
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest num Nodes: 4085
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest mean num Nodes: 408
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest mean max Depth: 12
14/07/10 02:19:04 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq
 
It prints the above statistics, but not the oob error estimate mentioned in the example at https://mahout.apache.org/users/classification/partial-implementation.html.
How can I get the oob error?
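
If the oob estimate really is gone from BuildForest, one workaround would be to measure the error on a held-out test file with TestForest, roughly as the linked partial-implementation page does. This is only a sketch: testdata/KDDTest+.arff is a test split I would have to prepare myself, and I am assuming the TestForest flags below are unchanged in 0.9:

hadoop jar mahout-examples-0.9-job.jar \
  org.apache.mahout.classifier.df.mapreduce.TestForest \
  -i testdata/KDDTest+.arff -ds testdata/KDDTrain+.info \
  -m nsl-forest -a -mr -o predictions

With -a (analyze) this should print a confusion matrix and an accuracy figure; that is not the same as the out-of-bag estimate, but it would at least give a comparable error number.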


