mahout-user mailing list archives

From Marty Kube <marty.kube.apa...@gmail.com>
Subject Re: Partial Implementation of Random Forests
Date Thu, 28 Feb 2013 22:24:21 GMT
Hi Sara,
On the surface your change looks okay to me, but it's hard to say, really.
It looks like the code expected to read more data.  Perhaps you could add
some logging around the statements that failed and try to get a sense of
how much and which data had been read successfully just before the failure.
Did you change anything else?  Maybe you could post the diffs.
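A minimal sketch of that kind of logging, using plain java.io streams rather than the actual Hadoop/Mahout classes (the field names mirror your Leaf, but the harness is hypothetical): it writes only the old single-field format, then reads it back with the new two-field readFields, so the second readDouble hits end-of-stream just like in your stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class LeafReadLogging {

    static double label;
    static double leafWeight;

    // Instrumented readFields: log before and after each read so the
    // last message printed brackets the statement that failed.
    static void readFields(DataInput in) throws IOException {
        System.out.println("about to read label");
        label = in.readDouble();
        System.out.println("read label = " + label + "; about to read leafWeight");
        leafWeight = in.readDouble();
        System.out.println("read leafWeight = " + leafWeight);
    }

    public static void main(String[] args) throws IOException {
        // Serialize only the label, as the unmodified writeNode would have.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeDouble(1.0);
        out.close();

        try {
            readFields(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
        } catch (EOFException e) {
            // The stream held fewer bytes than readFields expected.
            System.out.println("EOFException while reading leafWeight");
        }
    }
}
```

If your logs show the read failing right at the new field, one thing worth checking is whether any previously serialized output (e.g. intermediate files from an earlier run, written in the old one-field format) is being read back with the new readFields.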
Marty


On 02/28/2013 04:06 PM, Sara Del Río García wrote:
> Hello all:
>
> I'm testing the partial implementation of Random Forests on Hadoop 
> 2.0.0-cdh4.1.1.
>
> I'm trying to modify the algorithm; all I do is add more information 
> to the leaves of the tree. Currently a leaf contains only the label, 
> and I want to add one more field:
>
> @Override
> public void readFields(DataInput in) throws IOException {
>     label = in.readDouble();
>     leafWeight = in.readDouble();
> }
>
> @Override
> protected void writeNode(DataOutput out) throws IOException {
>     out.writeDouble(label);
>     out.writeDouble(leafWeight);
> }
>
> And I get the following error:
>
> 13/02/27 06:53:27 INFO mapreduce.BuildForest: Partial Mapred 
> implementation
> 13/02/27 06:53:27 INFO mapreduce.BuildForest: Building the forest...
> 13/02/27 06:53:27 INFO mapreduce.BuildForest: Weights Estimation: IR
> 13/02/27 06:53:37 WARN mapred.JobClient: Use GenericOptionsParser for 
> parsing the arguments. Applications should implement Tool for the same.
> 13/02/27 06:53:39 INFO input.FileInputFormat: Total input paths to 
> process : 1
> 13/02/27 06:53:39 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes 
> where applicable
> 13/02/27 06:53:39 WARN snappy.LoadSnappy: Snappy native library not 
> loaded
> 13/02/27 06:53:39 INFO mapred.JobClient: Running job: 
> job_201302270205_0013
> 13/02/27 06:53:40 INFO mapred.JobClient: map 0% reduce 0%
> 13/02/27 06:54:18 INFO mapred.JobClient: map 20% reduce 0%
> 13/02/27 06:54:42 INFO mapred.JobClient: map 40% reduce 0%
> 13/02/27 06:55:03 INFO mapred.JobClient: map 60% reduce 0%
> 13/02/27 06:55:26 INFO mapred.JobClient: map 70% reduce 0%
> 13/02/27 06:55:27 INFO mapred.JobClient: map 80% reduce 0%
> 13/02/27 06:55:49 INFO mapred.JobClient: map 100% reduce 0%
> 13/02/27 06:56:04 INFO mapred.JobClient: Job complete: 
> job_201302270205_0013
> 13/02/27 06:56:04 INFO mapred.JobClient: Counters: 24
> 13/02/27 06:56:04 INFO mapred.JobClient: File System Counters
> 13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes read=0
> 13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes 
> written=1828230
> 13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of read 
> operations=0
> 13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of large read 
> operations=0
> 13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of write 
> operations=0
> 13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes 
> read=1381649
> 13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes 
> written=1680
> 13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of read 
> operations=30
> 13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of large read 
> operations=0
> 13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of write 
> operations=10
> 13/02/27 06:56:04 INFO mapred.JobClient: Job Counters
> 13/02/27 06:56:04 INFO mapred.JobClient: Launched map tasks=10
> 13/02/27 06:56:04 INFO mapred.JobClient: Data-local map tasks=10
> 13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps 
> in occupied slots (ms)=254707
> 13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all 
> reduces in occupied slots (ms)=0
> 13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps 
> waiting after reserving slots (ms)=0
> 13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all 
> reduces waiting after reserving slots (ms)=0
> 13/02/27 06:56:04 INFO mapred.JobClient: Map-Reduce Framework
> 13/02/27 06:56:04 INFO mapred.JobClient: Map input records=20
> 13/02/27 06:56:04 INFO mapred.JobClient: Map output records=10
> 13/02/27 06:56:04 INFO mapred.JobClient: Input split bytes=1540
> 13/02/27 06:56:04 INFO mapred.JobClient: Spilled Records=0
> 13/02/27 06:56:04 INFO mapred.JobClient: CPU time spent (ms)=12070
> 13/02/27 06:56:04 INFO mapred.JobClient: Physical memory (bytes) 
> snapshot=949579776
> 13/02/27 06:56:04 INFO mapred.JobClient: Virtual memory (bytes) 
> snapshot=8412340224
> 13/02/27 06:56:04 INFO mapred.JobClient: Total committed heap usage 
> (bytes)=478412800
> Exception in thread "main" java.lang.IllegalStateException: 
> java.io.EOFException
> at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:104)
> at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:129)
> at 
> org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:96)
> at 
> org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:312)
> at 
> org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:246)
> at 
> org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:200)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at 
> org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:270)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at java.io.DataInputStream.readLong(DataInputStream.java:399)
> at java.io.DataInputStream.readDouble(DataInputStream.java:451)
> at org.apache.mahout.classifier.df.node.Leaf.readFields(Leaf.java:136)
> at org.apache.mahout.classifier.df.node.Node.read(Node.java:85)
> at 
> org.apache.mahout.classifier.df.mapreduce.MapredOutput.readFields(MapredOutput.java:64)
> at 
> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2114)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2242)
> at 
> org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95)
> ... 10 more
>
> What's the problem?
>
> Is it possible to write more information in the leaves of the tree?
>
> Thank you very much.
>
>
> Best regards,
>
> Sara
>
>

