I first saw this using Spark SQL, but the result is the same with plain Spark.

14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
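
As far as I can tell, this means libhadoop.so itself is never loaded into the JVM (buildSupportsSnappy is a native method inside it), so none of the library-path settings below seem to take effect.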

Full session and stack trace below. Note that the sequenceFile call itself succeeds; Spark is lazy, so the error only surfaces when collect() forces the read.

I tried many different things, without luck:
    * Extracted libsnappyjava.so from the Spark assembly and put it on the library path (see the quick check after this list):
           * added -Djava.library.path=... to SPARK_MASTER_OPTS and SPARK_WORKER_OPTS
           * added the library path to SPARK_LIBRARY_PATH
           * added the Hadoop native library path to SPARK_LIBRARY_PATH
    * Rebuilt Spark against different Snappy versions (previous and next), as suggested by various search results
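
For reference, here is a quick check that can be pasted into spark-shell to see whether the native Hadoop library is loaded at all (my own diagnostic sketch, nothing official):

// Does this JVM see the Hadoop native library (libhadoop.so)?
// false would explain the UnsatisfiedLinkError from the native
// buildSupportsSnappy() call.
import org.apache.hadoop.util.NativeCodeLoader
println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded())

// The library path this JVM is actually using.
println("java.library.path = " + System.getProperty("java.library.path"))

In local mode the task runs in the same JVM as the shell, so this should reflect what the failing task sees; on a standalone worker the same check would have to run inside a task to be meaningful.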


Env:
   CentOS 6.4
   Hadoop 2.3 (CDH 5.1)
   Running in standalone/local mode


Any help would be appreciated.

Thank you 

Stephane 


scala> import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.io.BytesWritable

scala> import org.apache.hadoop.io.Text
import org.apache.hadoop.io.Text

scala> import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.io.NullWritable

scala> var seq = sc.sequenceFile[NullWritable,Text]("/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0").map(_._2.toString())
14/11/07 19:46:19 INFO MemoryStore: ensureFreeSpace(157973) called with curMem=0, maxMem=278302556
14/11/07 19:46:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 154.3 KB, free 265.3 MB)
seq: org.apache.spark.rdd.RDD[String] = MappedRDD[2] at map at <console>:15

scala> seq.collect().foreach(println)
14/11/07 19:46:35 INFO FileInputFormat: Total input paths to process : 1
14/11/07 19:46:35 INFO SparkContext: Starting job: collect at <console>:18
14/11/07 19:46:35 INFO DAGScheduler: Got job 0 (collect at <console>:18) with 2 output partitions (allowLocal=false)
14/11/07 19:46:35 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:18)
14/11/07 19:46:35 INFO DAGScheduler: Parents of final stage: List()
14/11/07 19:46:35 INFO DAGScheduler: Missing parents: List()
14/11/07 19:46:35 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at <console>:15), which has no missing parents
14/11/07 19:46:35 INFO MemoryStore: ensureFreeSpace(2928) called with curMem=157973, maxMem=278302556
14/11/07 19:46:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.9 KB, free 265.3 MB)
14/11/07 19:46:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[2] at map at <console>:15)
14/11/07 19:46:36 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/11/07 19:46:36 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1243 bytes)
14/11/07 19:46:36 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/11/07 19:46:36 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0:6504064+6504065
14/11/07 19:46:36 INFO HadoopRDD: Input split: file:/home/lfs/warehouse/base.db/mytable/event_date=2014-06-01/000000_0:0+6504064
14/11/07 19:46:36 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/11/07 19:46:36 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/11/07 19:46:36 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/11/07 19:46:36 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/11/07 19:46:36 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/11/07 19:46:36 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:190)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1915)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:197)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:188)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:97)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
14/11/07 19:46:36 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        ... (same stack trace as above)
14/11/07 19:46:36 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        ... (same stack trace as above)
14/11/07 19:46:36 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        ... (same stack trace as above)
14/11/07 19:46:36 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
        ... (same stack trace as above)
14/11/07 19:46:36 ERROR TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
14/11/07 19:46:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
14/11/07 19:46:36 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor localhost: java.lang.UnsatisfiedLinkError (org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z) [duplicate 1]
14/11/07 19:46:36 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool