spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: count()-ing gz files gives java.io.IOException: incorrect header check
Date Wed, 21 May 2014 02:25:43 GMT
Yes, it does work with fewer GZipped files. I am reading the files in using
sc.textFile() and a pattern string.

For example:

a = sc.textFile('s3n://bucket/2014-??-??/*.gz')
a.count()
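
Since a glob like this can match many objects, one way to narrow down a suspected bad file is to check each one on its own outside Spark. The following is only a rough sketch: `check_gzip` is a hypothetical helper name, and it assumes the raw bytes of each object have already been fetched by some other means (how they are pulled from S3 is left out). It tests the two-byte gzip magic number and then tries to decompress the whole stream.

```python
import gzip
import io
import zlib

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def check_gzip(data: bytes) -> str:
    """Classify raw file contents as 'ok', 'bad header', or 'corrupt'.

    Hypothetical helper for illustration; not part of Spark or Hadoop.
    """
    if data[:2] != GZIP_MAGIC:
        # Not a gzip file at all (e.g. plain text saved with a .gz name).
        return "bad header"
    try:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            # Read in 1 MiB chunks so the whole stream gets validated
            # without holding all decompressed data in memory.
            while f.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        # Truncated stream, bad CRC, or damaged deflate data.
        return "corrupt"
    return "ok"
```

Running this over the files matched by the pattern, one at a time, should point to any file that is not valid gzip, which is the kind of input that could plausibly trigger the "incorrect header check" error.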

Nick

On Tue, May 20, 2014 at 10:09 PM, Madhu <madhu@madhu.com> wrote:

> I have read gzip files from S3 successfully.
>
> It sounds like a file is corrupt or not a valid gzip file.
>
> Does it work with fewer gzip files?
> How are you reading the files?
>
>
>
>
> -----
> Madhu
> https://www.linkedin.com/in/msiddalingaiah
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/count-ing-gz-files-gives-java-io-IOException-incorrect-header-check-tp5768p6149.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
