spark-user mailing list archives

From Gurvinder Singh <>
Subject reading compressed LZO files
Date Thu, 03 Jul 2014 16:24:27 GMT
Hi all,

I am trying to read LZO-compressed files. Spark seems to recognize that the
input file is compressed, and it obtains a decompressor:

14/07/03 18:11:01 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
14/07/03 18:11:01 INFO lzo.LzoCodec: Successfully loaded & initialized
native-lzo library [hadoop-lzo rev
14/07/03 18:11:01 INFO Configuration.deprecation: hadoop.native.lib is
deprecated. Instead, use io.native.lib.available
14/07/03 18:11:01 INFO compress.CodecPool: Got brand-new decompressor

But there are two issues:

1. The job just hangs at this point without doing anything; I waited 15
minutes for a small file.
2. I used hadoop-lzo to create an index so that Spark could split the
input across multiple map tasks, but Spark still creates only one mapper.

I am using Python, reading with sc.textFile(). The Spark version is built
from current git master.
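In case it helps to reproduce or compare: below is a minimal sketch of how indexed LZO files can be read as splittable input through the LZO-aware Hadoop input format from the hadoop-lzo project, instead of plain sc.textFile(). The paths here are hypothetical, and whether sc.newAPIHadoopFile is available in PySpark depends on the Spark build, so treat this as an assumption rather than a confirmed fix.

```python
# Sketch: reading indexed .lzo files as splittable input in PySpark.
# Assumes the hadoop-lzo jar and its native libraries are on Spark's
# classpath, and that this PySpark build exposes sc.newAPIHadoopFile.
from pyspark import SparkContext

sc = SparkContext(appName="lzo-read-sketch")

# Step 1 (done outside Spark): build the .index files, e.g.
#   hadoop jar /path/to/hadoop-lzo.jar \
#       com.hadoop.compression.lzo.DistributedLzoIndexer /data/logs

# Step 2: read through the LZO-aware input format so the index is
# honored and each indexed chunk becomes its own partition (mapper).
rdd = sc.newAPIHadoopFile(
    "/data/logs/*.lzo",                          # hypothetical path
    "com.hadoop.mapreduce.LzoTextInputFormat",   # from hadoop-lzo
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text",
)
lines = rdd.map(lambda kv: kv[1])  # drop the byte-offset key

print(rdd.getNumPartitions())  # more than 1 if the index is being used
```

With plain sc.textFile(), Spark falls back to the default TextInputFormat, which treats an .lzo file as unsplittable; that would be consistent with the single mapper observed above.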

