spark-user mailing list archives

From Rich Haase <rha...@pandora.com>
Subject Native library error when trying to use Spark with Snappy files
Date Thu, 11 Dec 2014 21:53:02 GMT
I am running a Hadoop cluster with Spark on YARN. The cluster is running the CDH5.2 distribution.
When I try to run Spark jobs against Snappy-compressed files, I receive the following error:

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
        org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
        org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
        org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
        org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:110)
        org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:198)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:189)
        org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:98)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

I have tried setting JAVA_LIBRARY_PATH, LD_LIBRARY_PATH, spark.executor.extraLibraryPath,
spark.executor.extraClassPath and more, with absolutely no luck.
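
For reference, here is a minimal sketch of how library-path settings like these are usually passed on the spark-submit command line. It is an illustration under stated assumptions, not a confirmed fix for the error above: the native directory is the one reported by hadoop checknative in this message, the yarn-client master is assumed, and build_spark_submit and my_job.py are hypothetical names.

```python
import shlex

# Assumption: directory holding libhadoop.so and libsnappy.so, taken from
# the `hadoop checknative` output quoted in this message.
NATIVE_DIR = "/usr/lib/hadoop/lib/native"


def build_spark_submit(app, native_dir=NATIVE_DIR, master="yarn-client"):
    """Return a spark-submit argv exposing native_dir to driver and executors.

    Hypothetical helper for illustration; it only assembles arguments.
    """
    return [
        "spark-submit",
        "--master", master,
        # Library path for the driver JVM.
        "--driver-library-path", native_dir,
        # Library path for the executor JVMs.
        "--conf", "spark.executor.extraLibraryPath=" + native_dir,
        app,
    ]


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(" ".join(shlex.quote(arg) for arg in build_spark_submit("my_job.py")))
```

Building the command as a list keeps each option and its value as separate arguments, so the result can be handed to subprocess without shell quoting problems.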

Additionally, I have confirmed that I can run MapReduce jobs against Snappy files without
any problem, and hadoop checknative looks good:

$ hadoop checknative -a
14/12/11 13:51:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/11 13:51:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0

Can anyone give me any suggestions as to why this is not working or, better yet, how
I can fix this problem?

Thanks!!!

Rich Haase | Sr. Software Engineer | Pandora
m 303.887.1146 | rhaase@pandora.com
