spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Working on LZOP Files
Date Fri, 26 Sep 2014 03:25:02 GMT
Hi Harsha,

I use LZOP files extensively on my Spark cluster -- see my writeup on how
to do this in this mailing list post:
http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCAOoZ679ehwvT1g8=qHd2n11Z4EXOBJkP+q=Aj0qE_=sHHYLBaA@mail.gmail.com%3E

Maybe we should document how to use LZO with Spark better, because it can be
tricky to get the lzo jars, native libraries, and hadoopFile() calls all
set up correctly.
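
For readers who want a starting point, here is a minimal PySpark sketch of the
read call, not a definitive recipe. It assumes the third-party hadoop-lzo jar
and its native libraries are already installed and visible to Spark, and that
the input format class is named com.hadoop.mapreduce.LzoTextInputFormat as in
that project. The helper name read_lzo_lines and the example path are
hypothetical.

```python
# Class names below are assumptions based on the hadoop-lzo project and
# Hadoop's standard Writable types; adjust them to match your installation.
LZO_INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.Text"


def read_lzo_lines(sc, path):
    """Return an RDD of text lines read from LZO-compressed files at `path`.

    `sc` is an existing SparkContext; `path` may be a file, directory, or glob.
    """
    # newAPIHadoopFile yields (key, value) pairs; for a text input format the
    # key is the byte offset of the line and the value is the line itself.
    pairs = sc.newAPIHadoopFile(path, LZO_INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
    # Drop the offsets and keep only the line contents.
    return pairs.map(lambda kv: kv[1])
```

With a running SparkContext this would be called as
read_lzo_lines(sc, "hdfs:///path/to/files/*.lzo"), where the path is only a
placeholder. The call fails with a class-not-found or native-library error if
the jar or the native libraries are missing, which is the setup step that
tends to be tricky.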

Andrew

On Thu, Sep 25, 2014 at 9:44 AM, Harsha HN <99harsha.h.n99@gmail.com> wrote:

> Hi,
>
> Is anybody using Spark to process LZOP files?
>
> We have a huge volume of LZOP files in HDFS to process with Spark. The
> MapReduce framework automatically detects the file format and sends the
> decompressed data to the Mappers.
> Is there any such support in Spark?
> As of now, I am manually downloading and decompressing the files before
> processing them.
>
> Thanks,
> Harsha
>
