mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: java.io.IOException while running itemsimilarity
Date Thu, 23 Jun 2011 18:22:55 GMT
This is a Hadoop issue, not a Mahout issue.

In general it means Hadoop is choking on intermediate files that are too large
to merge during the shuffle phase. Use more mappers and/or reducers so that
each task handles less data.
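For example, the reducer count can be raised by passing a generic Hadoop option
to the job. This is a sketch only: the input/output paths and the reducer count
are illustrative, and the exact flags accepted by the 0.4-era itemsimilarity
driver should be checked with `bin/mahout itemsimilarity --help`.

```shell
# Hypothetical invocation: bump the number of reduce tasks so each
# reducer's merge inputs stay small (paths and counts are placeholders).
bin/mahout itemsimilarity \
  -Dmapred.reduce.tasks=32 \
  --input  /user/andrew/prefs \
  --output /user/andrew/itemsim \
  --similarityClassname SIMILARITY_COOCCURRENCE
```

On Hadoop 0.20.x the generic `-D` options must come before the job-specific
`--` options for them to take effect.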

On Thu, Jun 23, 2011 at 6:35 PM, Andrew Schein
<andrew.schein@efrontier.com>wrote:

> Hi all -
>
> I am getting the following exception while running an itemsimilarity job:
>
> java.io.IOException: Task: attempt_201106201353_0017_r_000000_0 - The
> reduce copier failed
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:388)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>       at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.io.IOException: java.lang.RuntimeException:
> java.io.EOFException
>       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>       at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>       at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
>       at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>       at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>       at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>       at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2669)
> Caused by: java.io.EOFException
>       at java.io.DataInputStream.readByte(DataInputStream.java:250)
>       at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
>       at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
>       at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
>       at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
>       ... 7 more
>
>       at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2673)
>
> The exception only occurs for large data sets (>= 9 GB), which makes it
> difficult to diagnose.
>
> I am using mahout-distribution-0.4 (0.5 gave me other issues) with
> hadoop-0.20.203.0.
>
> Has anyone else encountered this problem?
>
> Thanks,
>
> Andrew
>
