mahout-user mailing list archives

From: Andrew Schein <andrew.sch...@efrontier.com>
Subject: Re: java.io.IOException while running itemsimilarity
Date: Fri, 24 Jun 2011 19:15:51 GMT
Hi Sean -

How can you tell that file size is the issue? I don't see any
memory-related exception in the stack trace.

Thanks,

Andy

Sean Owen wrote:
> This is a Hadoop issue, not a Mahout issue.
>
> In general it means Hadoop is choking on files that are too large. Use more
> mappers and/or reducers.
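>
> For example, something along these lines should raise the reducer count
> when using the command-line driver (property and option names here are
> from the Hadoop 0.20 era and may differ across versions; the paths are
> placeholders):
>
>   bin/mahout itemsimilarity \
>     -Dmapred.reduce.tasks=32 \
>     --input /path/to/preferences \
>     --output /path/to/similarities \
>     ...
>
> The same setting can also go in mapred-site.xml, or be made
> programmatically via JobConf.setNumReduceTasks().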
>
> On Thu, Jun 23, 2011 at 6:35 PM, Andrew Schein <andrew.schein@efrontier.com> wrote:
>
>   
>> Hi all -
>>
>> I am getting the following exception while running an itemsimilarity job:
>>
>> java.io.IOException: Task: attempt_201106201353_0017_r_000000_0 - The reduce copier failed
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:388)
>>       at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>       at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> Caused by: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
>>       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>>       at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>>       at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
>>       at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>>       at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>>       at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>       at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2669)
>> Caused by: java.io.EOFException
>>       at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>       at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
>>       at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
>>       at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
>>       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
>>       ... 7 more
>>
>>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2673)
>>
>> The exception only occurs for large data sets (>= 9 GB), which makes it
>> difficult to diagnose.
>>
>> I am using mahout-distribution-0.4 (0.5 gave me other issues) with
>> hadoop-0.20.203.0.
>>
>> Has anyone else encountered this problem?
>>
>> Thanks,
>>
>> Andrew
>>

