spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: count()-ing gz files gives java.io.IOException: incorrect header check
Date Wed, 21 May 2014 01:53:07 GMT
Any tips on how to troubleshoot this?


On Thu, May 15, 2014 at 4:15 PM, Nick Chammas <nicholas.chammas@gmail.com> wrote:

> I’m trying to do a simple count() on a large number of GZipped files in
> S3. My job is failing with the following message:
>
> 14/05/15 19:12:37 WARN scheduler.TaskSetManager: Loss was due to java.io.IOException
> java.io.IOException: incorrect header check
>     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
>     at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
>     at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>     at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>     at java.io.InputStream.read(InputStream.java:101)
>
> <snipped>
>
> I traced this down to HADOOP-5281 <https://issues.apache.org/jira/browse/HADOOP-5281>,
> but I’m not sure 1) whether it’s the same issue, or 2) how to go about
> resolving it.
>
> I gather I need to update some Hadoop jar? Any tips on where to look/what
> to do?
>
> I’m running Spark on an EC2 cluster created by spark-ec2 with no special
> options used.
>
> Nick
>
> ------------------------------
> View this message in context: count()-ing gz files gives
> java.io.IOException: incorrect header check <http://apache-spark-user-list.1001560.n3.nabble.com/count-ing-gz-files-gives-java-io-IOException-incorrect-header-check-tp5768.html>
> Sent from the Apache Spark User List mailing list archive <http://apache-spark-user-list.1001560.n3.nabble.com/> at
> Nabble.com.
>
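
One possible first troubleshooting step, offered as a sketch rather than a confirmed diagnosis: zlib reports "incorrect header check" when the start of a stream is not the header the decompressor expects, so it may be worth checking whether every file with a .gz extension really is gzip data. The Python below assumes a suspect file has been copied locally and read into memory as bytes. The helper names looks_like_gzip and can_decompress are hypothetical, not part of Spark or Hadoop, and the check says nothing about whether HADOOP-5281 is the cause.

```python
import gzip
import zlib

# Every gzip stream begins with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def can_decompress(data: bytes) -> bool:
    """Attempt a full gzip decompression; return False on header or stream errors."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header, EOFError a truncated stream,
        # zlib.error a corrupted deflate payload.
        return False


if __name__ == "__main__":
    sample = gzip.compress(b"one line of log data\n")
    print(looks_like_gzip(sample), can_decompress(sample))
    print(looks_like_gzip(b"plain text"), can_decompress(b"plain text"))
```

A file that fails looks_like_gzip is not gzip at all (for example, plain text or a differently compressed stream saved under a .gz name). A file that passes it but fails can_decompress points toward truncation or corruption instead.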
