spark-user mailing list archives

From innowireless TaeYun Kim <>
Subject How to read a snappy-compressed text file?
Date Thu, 12 Jun 2014 10:20:29 GMT


This may be a newbie question: how do I read a snappy-compressed text file?


The OS is Windows 7.

So far, I've done the following:


1. Built Hadoop 2.4.0 with the snappy option.

The 'hadoop checknative' command displays the following line:

snappy: true D:\hadoop-2.4.0\bin\snappy.dll

So, I assume Hadoop can do snappy compression.

BTW, snappy.dll was copied from the snappy64.dll file in snappy-windows-


2. Added the following configurations to both core-site.xml and






3. Added the following environment variable.


Since I use IntelliJ, the above line was added to the Environment
variables section of the Run Configuration.


4. Compressed the input text file with snzip.exe which was included in


5. Wrote the code.

sc.textFile(compressed_file_name);  // no other argument.


Now when I run my spark application, the results are as follows:


1. The string 'snappy' does not appear anywhere in the DEBUG log.

The most relevant logs are as follows:

14/06/12 18:57:55 DEBUG NativeCodeLoader: Trying to load the custom-built
native-hadoop library...

14/06/12 18:57:55 DEBUG NativeCodeLoader: Loaded the native-hadoop library

2. The application fails. The log is as follows:

14/06/12 18:57:57 WARN: int from string failed for: [(some binary


So apparently sc.textFile() does not recognize the file format and reads the
file as-is, so the map function receives garbage.
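
One thing that may be worth checking here: sc.textFile() reads through
Hadoop's TextInputFormat, which chooses a decompression codec from the file
name suffix and reads the file as plain text when no suffix matches. The
sketch below only mimics that lookup in plain Java and does not call Hadoop or
Spark. The class name CodecBySuffix, the method codecFor, the sample file
names, and the three-entry table are hypothetical and for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: mimics suffix-based codec selection.
// It does not call Hadoop or Spark.
public class CodecBySuffix {

    // Default file suffixes of a few Hadoop codecs (illustrative subset).
    private static final Map<String, String> SUFFIX_TO_CODEC = new LinkedHashMap<>();
    static {
        SUFFIX_TO_CODEC.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        SUFFIX_TO_CODEC.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        SUFFIX_TO_CODEC.put(".snappy", "org.apache.hadoop.io.compress.SnappyCodec");
    }

    // Returns the codec class name for the file, or null if no suffix matches.
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : SUFFIX_TO_CODEC.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample names are assumptions, not the actual input file.
        for (String name : new String[] {"input.txt", "input.txt.snappy"}) {
            String codec = codecFor(name);
            System.out.println(name + " -> "
                    + (codec == null ? "no codec, read as plain text" : codec));
        }
    }
}
```

If the compressed input file's name does not end in a registered suffix, this
kind of lookup finds no codec, which would be consistent with the binary
garbage in the log. Whether the file written by snzip.exe is in the container
format that the codec expects is a separate question that this sketch does not
address.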


How can I fix this?



