spark-user mailing list archives

From Chris Teoh <chris.t...@gmail.com>
Subject Re: No auto decompress in Spark Java textFile function?
Date Wed, 09 Sep 2015 13:44:20 GMT
Thanks. What I noticed is that decompression works when the file is in HDFS,
but not when it is a local file in a development environment.

Does anyone else have the same problem?
On Wed, 9 Sep 2015 at 4:40 pm Akhil Das <akhil@sigmoidanalytics.com> wrote:

> textFile used to work with .gz files; I haven't tested it on bz2 files. If
> it isn't decompressing by default, then use sc.wholeTextFiles and
> decompress each record (each record being one file) with the
> corresponding codec.
>
> Thanks
> Best Regards
>
> On Tue, Sep 8, 2015 at 6:49 PM, Chris Teoh <chris.teoh@gmail.com> wrote:
>
>> Hi Folks,
>>
>> I tried using Spark v1.2 on bz2 files in Java but the behaviour is
>> different to the same textFile API call in Python and Scala.
>>
>> That being said, how do I read .tar.bz2 files with Spark's Java
>> API?
>>
>> Thanks in advance
>> Chris
>>
>
>
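
For illustration, the "decompress each file yourself" idea from the quoted reply can be sketched as follows. This is only a rough sketch under assumptions, not something posted in the thread: it is written in Python because its standard library handles both bzip2 and tar, the helper name lines_from_tar_bz2 is hypothetical, and the archive members are assumed to be UTF-8 text. It takes the raw bytes of one .tar.bz2 archive and yields its lines.

```python
import io
import tarfile


def lines_from_tar_bz2(archive_bytes):
    """Yield (member_name, line) pairs for each regular file in a .tar.bz2 archive.

    Hypothetical helper: assumes every member is UTF-8 encoded text.
    """
    # "r:bz2" tells tarfile to undo the bzip2 layer before reading tar entries.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks and other non-file entries
            extracted = archive.extractfile(member)
            for raw_line in extracted:
                yield member.name, raw_line.decode("utf-8").rstrip("\r\n")
```

In a Spark job, a function like this would probably be applied to whole-file binary records rather than to text records, for example something along the lines of sc.binaryFiles(path).flatMap(lambda kv: lines_from_tar_bz2(kv[1])) in PySpark, assuming a Spark version that provides binaryFiles. A Java equivalent would likely need a third-party bzip2/tar library such as Apache Commons Compress, since the JDK does not ship a bzip2 codec.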
