spark-user mailing list archives

From "Arthur.hk.chan@gmail.com" <arthur.hk.c...@gmail.com>
Subject Re: Spark Hive Snappy Error
Date Wed, 22 Oct 2014 23:15:49 GMT
Hi,

I tried some more tests and can reproduce the issue:

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3d4561f0

scala> hiveContext.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
2014-10-23 07:07:35,414 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
2014-10-23 07:07:35,415 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:07:35,436 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-10-23 07:07:35,437 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=Driver.run>
2014-10-23 07:07:35,437 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=TimeToSubmit>
2014-10-23 07:07:35,437 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=compile>
2014-10-23 07:07:35,437 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=parse>
2014-10-23 07:07:35,437 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
2014-10-23 07:07:35,437 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:07:35,438 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1414019255437 end=1414019255438 duration=1>
2014-10-23 07:07:35,438 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
2014-10-23 07:07:35,438 INFO  [main] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8305)) - Starting Semantic Analysis
2014-10-23 07:07:35,438 INFO  [main] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeCreateTable(8868)) - Creating table src position=27
2014-10-23 07:07:35,438 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:35,439 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:35,448 INFO  [main] ql.Driver (Driver.java:compile(450)) - Semantic Analysis Completed
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=semanticAnalyze start=1414019255438 end=1414019255449 duration=11>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (Driver.java:getSchema(264)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=compile start=1414019255437 end=1414019255449 duration=12>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=Driver.execute>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (Driver.java:execute(1117)) - Starting command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=TimeToSubmit start=1414019255437 end=1414019255449 duration=12>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=runTasks>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=runTasks start=1414019255449 end=1414019255449 duration=0>
2014-10-23 07:07:35,449 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=Driver.execute start=1414019255449 end=1414019255449 duration=0>
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (SessionState.java:printInfo(410)) - OK
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=releaseLocks>
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=releaseLocks start=1414019255450 end=1414019255450 duration=0>
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=Driver.run start=1414019255437 end=1414019255450 duration=13>
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=releaseLocks>
2014-10-23 07:07:35,450 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=releaseLocks start=1414019255450 end=1414019255450 duration=0>
res14: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[28] at RDD at SchemaRDD.scala:103
== Query Plan ==
<Native command: executed by Hive>

scala> hiveContext.hql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
2014-10-23 07:07:43,080 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
2014-10-23 07:07:43,081 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:07:43,104 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-10-23 07:07:43,104 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=Driver.run>
2014-10-23 07:07:43,104 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=TimeToSubmit>
2014-10-23 07:07:43,104 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=compile>
2014-10-23 07:07:43,104 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=parse>
2014-10-23 07:07:43,104 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
2014-10-23 07:07:43,105 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:07:43,105 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1414019263104 end=1414019263105 duration=1>
2014-10-23 07:07:43,105 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
2014-10-23 07:07:43,105 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,105 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,204 INFO  [main] ql.Driver (Driver.java:compile(450)) - Semantic Analysis Completed
2014-10-23 07:07:43,204 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=semanticAnalyze start=1414019263105 end=1414019263204 duration=99>
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (Driver.java:getSchema(264)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=compile start=1414019263104 end=1414019263205 duration=101>
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=Driver.execute>
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (Driver.java:execute(1117)) - Starting command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=TimeToSubmit start=1414019263104 end=1414019263205 duration=101>
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=runTasks>
2014-10-23 07:07:43,205 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=task.COPY.Stage-0>
2014-10-23 07:07:43,206 INFO  [main] exec.Task (SessionState.java:printInfo(410)) - Copying data from file:/edh/hadoop/spark-1.1.0_patched/examples/src/main/resources/kv1.txt to hdfs://edhcluster/tmp/hive-edhuser/hive_2014-10-23_07-07-43_104_4474250088547750999-1/-ext-10000
2014-10-23 07:07:43,236 INFO  [main] exec.Task (SessionState.java:printInfo(410)) - Copying file: file:/edh/hadoop/spark-1.1.0_patched/examples/src/main/resources/kv1.txt
2014-10-23 07:07:43,347 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=task.COPY.Stage-0 start=1414019263205 end=1414019263347 duration=142>
2014-10-23 07:07:43,348 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=task.MOVE.Stage-1>
2014-10-23 07:07:43,349 INFO  [main] exec.Task (SessionState.java:printInfo(410)) - Loading data to table default.src from hdfs://edhcluster/tmp/hive-edhuser/hive_2014-10-23_07-07-43_104_4474250088547750999-1/-ext-10000
2014-10-23 07:07:43,350 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,351 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,382 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,382 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,459 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: alter_table: db=default tbl=src newtbl=src
2014-10-23 07:07:43,459 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=alter_table: db=default tbl=src newtbl=src	
2014-10-23 07:07:43,460 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,461 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,482 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=task.MOVE.Stage-1 start=1414019263348 end=1414019263482 duration=134>
2014-10-23 07:07:43,482 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=task.STATS.Stage-2>
2014-10-23 07:07:43,482 INFO  [main] exec.StatsTask (StatsTask.java:execute(224)) - Executing stats task
2014-10-23 07:07:43,482 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,483 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,490 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,490 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,500 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: alter_table: db=default tbl=src newtbl=src
2014-10-23 07:07:43,500 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=alter_table: db=default tbl=src newtbl=src	
2014-10-23 07:07:43,500 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:07:43,501 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:07:43,522 INFO  [main] exec.Task (SessionState.java:printInfo(410)) - Table default.src stats: [num_partitions: 0, num_files: 3, num_rows: 0, total_size: 17436, raw_data_size: 0]
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=task.STATS.Stage-2 start=1414019263482 end=1414019263522 duration=40>
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=runTasks start=1414019263205 end=1414019263522 duration=317>
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=Driver.execute start=1414019263205 end=1414019263522 duration=317>
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (SessionState.java:printInfo(410)) - OK
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=releaseLocks>
2014-10-23 07:07:43,522 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=releaseLocks start=1414019263522 end=1414019263522 duration=0>
2014-10-23 07:07:43,523 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=Driver.run start=1414019263104 end=1414019263523 duration=419>
2014-10-23 07:07:43,523 INFO  [main] ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=releaseLocks>
2014-10-23 07:07:43,523 INFO  [main] ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=releaseLocks start=1414019263523 end=1414019263523 duration=0>
res15: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[31] at RDD at SchemaRDD.scala:103
== Query Plan ==
<Native command: executed by Hive>

scala> hiveContext.hql("select count(1) from src")
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
2014-10-23 07:08:00,879 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(1) from src
2014-10-23 07:08:00,880 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:08:00,882 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:08:00,883 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:08:00,919 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-10-23 07:08:00,943 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(389262) called with curMem=458999, maxMem=280248975
2014-10-23 07:08:00,943 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_10 stored as values in memory (estimated size 380.1 KB, free 266.5 MB)
res16: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[33] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Aggregate false, [], [SUM(PartialCount#15L) AS c_0#10L]
 Exchange SinglePartition
  Aggregate true, [], [COUNT(1) AS PartialCount#15L]
   HiveTableScan [], (MetastoreRelation default, src, None), None

scala> hiveContext.hql("select count(1) from src").collect().foreach(println)
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
2014-10-23 07:08:14,885 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(1) from src
2014-10-23 07:08:14,885 INFO  [main] parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-10-23 07:08:14,886 INFO  [main] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 0: get_table : db=default tbl=src
2014-10-23 07:08:14,886 INFO  [main] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=edhuser	ip=unknown-ip-addr	cmd=get_table : db=default tbl=src	
2014-10-23 07:08:14,915 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(394669) called with curMem=848261, maxMem=280248975
2014-10-23 07:08:14,916 INFO  [main] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_11 stored as values in memory (estimated size 385.4 KB, free 266.1 MB)
2014-10-23 07:08:14,987 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Starting job: collect at SparkPlan.scala:85
2014-10-23 07:08:15,031 INFO  [sparkDriver-akka.actor.default-dispatcher-2] mapred.FileInputFormat (FileInputFormat.java:listStatus(247)) - Total input paths to process : 3
2014-10-23 07:08:15,033 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Registering RDD 39 (mapPartitions at Exchange.scala:86)
2014-10-23 07:08:15,033 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Got job 6 (collect at SparkPlan.scala:85) with 1 output partitions (allowLocal=false)
2014-10-23 07:08:15,033 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Final stage: Stage 6(collect at SparkPlan.scala:85)
2014-10-23 07:08:15,033 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Parents of final stage: List(Stage 7)
2014-10-23 07:08:15,035 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Missing parents: List(Stage 7)
2014-10-23 07:08:15,037 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting Stage 7 (MapPartitionsRDD[39] at mapPartitions at Exchange.scala:86), which has no missing parents
2014-10-23 07:08:15,046 INFO  [sparkDriver-akka.actor.default-dispatcher-2] storage.MemoryStore (Logging.scala:logInfo(59)) - ensureFreeSpace(11528) called with curMem=1242930, maxMem=280248975
2014-10-23 07:08:15,046 INFO  [sparkDriver-akka.actor.default-dispatcher-2] storage.MemoryStore (Logging.scala:logInfo(59)) - Block broadcast_12 stored as values in memory (estimated size 11.3 KB, free 266.1 MB)
2014-10-23 07:08:15,048 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Submitting 3 missing tasks from Stage 7 (MapPartitionsRDD[39] at mapPartitions at Exchange.scala:86)
2014-10-23 07:08:15,048 INFO  [sparkDriver-akka.actor.default-dispatcher-2] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Adding task set 7.0 with 3 tasks
2014-10-23 07:08:15,049 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Starting task 0.0 in stage 7.0 (TID 8, localhost, ANY, 1202 bytes)
2014-10-23 07:08:15,050 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Starting task 1.0 in stage 7.0 (TID 9, localhost, ANY, 1209 bytes)
2014-10-23 07:08:15,050 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Starting task 2.0 in stage 7.0 (TID 10, localhost, ANY, 1209 bytes)
2014-10-23 07:08:15,050 INFO  [Executor task launch worker-3] executor.Executor (Logging.scala:logInfo(59)) - Running task 0.0 in stage 7.0 (TID 8)
2014-10-23 07:08:15,050 INFO  [Executor task launch worker-4] executor.Executor (Logging.scala:logInfo(59)) - Running task 1.0 in stage 7.0 (TID 9)
2014-10-23 07:08:15,051 INFO  [Executor task launch worker-5] executor.Executor (Logging.scala:logInfo(59)) - Running task 2.0 in stage 7.0 (TID 10)
2014-10-23 07:08:15,062 INFO  [Executor task launch worker-3] rdd.HadoopRDD (Logging.scala:logInfo(59)) - Input split: hdfs://edhcluster/user/edhuser/hive/warehouse/src/kv1.txt:0+5812
2014-10-23 07:08:15,062 INFO  [Executor task launch worker-4] rdd.HadoopRDD (Logging.scala:logInfo(59)) - Input split: hdfs://edhcluster/user/edhuser/hive/warehouse/src/kv1_copy_1.txt:0+5812
2014-10-23 07:08:15,062 INFO  [Executor task launch worker-5] rdd.HadoopRDD (Logging.scala:logInfo(59)) - Input split: hdfs://edhcluster/user/edhuser/hive/warehouse/src/kv1_copy_2.txt:0+5812
2014-10-23 07:08:15,081 ERROR [Executor task launch worker-5] executor.Executor (Logging.scala:logError(96)) - Exception in task 2.0 in stage 7.0 (TID 10)
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
	at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
	at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
	at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
	at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
	at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,081 ERROR [Executor task launch worker-3] executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 7.0 (TID 8)
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
	at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
	at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
	at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
	at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
	at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,085 ERROR [Executor task launch worker-5] executor.ExecutorUncaughtExceptionHandler (Logging.scala:logError(96)) - Uncaught exception in thread Thread[Executor task launch worker-5,5,main]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
	at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
	at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
	at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
	at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
	at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,085 ERROR [Executor task launch worker-3] executor.ExecutorUncaughtExceptionHandler (Logging.scala:logError(96)) - Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
	at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
	at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
	at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
	at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
	at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,089 WARN  [Result resolver thread-1] scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 0.0 in stage 7.0 (TID 8, localhost): java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
        org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
        org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
        org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
        org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,090 ERROR [Result resolver thread-1] scheduler.TaskSetManager (Logging.scala:logError(75)) - Task 0 in stage 7.0 failed 1 times; aborting job
2014-10-23 07:08:15,091 INFO  [Result resolver thread-3] scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Lost task 2.0 in stage 7.0 (TID 10) on executor localhost: java.lang.UnsatisfiedLinkError (org.xerial.snappy.SnappyNative.maxCompressedLength(I)I) [duplicate 1]
2014-10-23 07:08:15,092 ERROR [Executor task launch worker-4] storage.DiskBlockObjectWriter (Logging.scala:logError(96)) - Uncaught exception while reverting partial writes to file /tmp/spark-local-20141023070118-a435/0d/shuffle_0_1_0
java.io.FileNotFoundException: /tmp/spark-local-20141023070118-a435/0d/shuffle_0_1_0 (No such file or directory)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
	at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:178)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$revertWrites$1.apply(HashShuffleWriter.scala:118)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$revertWrites$1.apply(HashShuffleWriter.scala:117)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.revertWrites(HashShuffleWriter.scala:117)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.stop(HashShuffleWriter.scala:89)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,092 ERROR [Executor task launch worker-4] executor.Executor (Logging.scala:logError(96)) - Exception in task 1.0 in stage 7.0 (TID 9)
java.io.FileNotFoundException: /tmp/spark-local-20141023070118-a435/0d/shuffle_0_1_0 (No such file or directory)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
	at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
	at org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,094 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Cancelling stage 7
2014-10-23 07:08:15,095 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet 7.0, whose tasks have all completed, from pool 
2014-10-23 07:08:15,095 INFO  [sparkDriver-akka.actor.default-dispatcher-5] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Stage 7 was cancelled
2014-10-23 07:08:15,096 WARN  [Result resolver thread-2] scheduler.TaskSetManager (Logging.scala:logWarning(71)) - Lost task 1.0 in stage 7.0 (TID 9, localhost): java.io.FileNotFoundException: /tmp/spark-local-20141023070118-a435/0d/shuffle_0_1_0 (No such file or directory)
        java.io.FileOutputStream.open(Native Method)
        java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:123)
        org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
2014-10-23 07:08:15,096 INFO  [Result resolver thread-2] scheduler.TaskSchedulerImpl (Logging.scala:logInfo(59)) - Removed TaskSet 7.0, whose tasks have all completed, from pool 
2014-10-23 07:08:15,096 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run collect at SparkPlan.scala:85
error: 
     while compiling: <console>
        during phase: jvm
     library version: version 2.10.4
    compiler version: version 2.10.4
  reconstructed args: 

  last tree to typer: term value 
              symbol: variable value in object $eval (flags: <mutable> <triedcooking> private[this])
   symbol definition: private[this] var value: Throwable
                 tpe: <notype>
       symbol owners: variable value -> object $eval -> package $line49
      context owners: object $eval -> package $line49

== Enclosing template or block ==

Template( // val <local $eval>: <notype> in object $eval, tree.tpe=type
  "java.lang.Object" // parents
  ValDef(
    private
    "_"
    <tpt>
    <empty>
  )
  // 5 statements
  ValDef( // private[this] var value: Throwable in object $eval
    private <mutable> <local> <defaultinit> <triedcooking>
    "value "
    <tpt> // tree.tpe=Throwable
    <empty>
  )
  DefDef( // def value(): Throwable in object $eval
    <method> <accessor> <defaultinit> <triedcooking>
    "value"
    []
    List(Nil)
    <tpt> // tree.tpe=Throwable
    $eval.this."value " // private[this] var value: Throwable in object $eval, tree.tpe=Throwable
  )
  DefDef( // def value_=(x$1: Throwable): Unit in object $eval
    <method> <accessor> <defaultinit> <triedcooking>
    "value_$eq"
    []
    // 1 parameter list
    ValDef( // x$1: Throwable
      <param> <synthetic> <triedcooking>
      "x$1"
      <tpt> // tree.tpe=Throwable
      <empty>
    )
    <tpt> // tree.tpe=Unit
    Assign( // tree.tpe=Unit
      $eval.this."value " // private[this] var value: Throwable in object $eval, tree.tpe=Throwable
      "x$1" // x$1: Throwable, tree.tpe=Throwable
    )
  )
  DefDef( // def set(x: Object): Unit in object $eval
    <method>
    "set"
    []
    // 1 parameter list
    ValDef( // x: Object
      <param> <triedcooking>
      "x"
      <tpt> // tree.tpe=Object
      <empty>
    )
    <tpt> // tree.tpe=Unit
    Apply( // def value_=(x$1: Throwable): Unit in object $eval, tree.tpe=Unit
      $eval.this."value_$eq" // def value_=(x$1: Throwable): Unit in object $eval, tree.tpe=(x$1: Throwable)Unit
      Apply( // final def $asInstanceOf[T0 >: ? <: ?](): T0 in class Object, tree.tpe=Throwable
        TypeApply( // final def $asInstanceOf[T0 >: ? <: ?](): T0 in class Object, tree.tpe=()Throwable
          "x"."$asInstanceOf" // final def $asInstanceOf[T0 >: ? <: ?](): T0 in class Object, tree.tpe=[T0 >: ? <: ?]()T0
          <tpt> // tree.tpe=Throwable
        )
        Nil
      )
    )
  )
  DefDef( // def <init>(): type in object $eval
    <method>
    "<init>"
    []
    List(Nil)
    <tpt> // tree.tpe=type
    Block( // tree.tpe=Unit
      Apply( // def <init>(): Object in class Object, tree.tpe=Object
        $eval.super."<init>" // def <init>(): Object in class Object, tree.tpe=()Object
        Nil
      )
      ()
    )
  )
)

== Expanded type of tree ==

<notype>

uncaught exception during compilation: java.lang.AssertionError
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 8, localhost): java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
        org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
        org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
        org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1029)
        org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        org.apache.spark.storage.BlockManager$$anonfun$8.apply(BlockManager.scala:608)
        org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:126)
        org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:192)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:67)
        org.apache.spark.shuffle.hash.HashShuffleWriter$$anonfun$write$1.apply(HashShuffleWriter.scala:65)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
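
To help isolate this, the two lines below call snappy-java directly in the same spark-shell session. This is only a minimal diagnostic sketch: getCodeSource reports which jar the Snappy class was actually loaded from, and Snappy.compress forces the native library to load outside Spark's shuffle path.

scala> classOf[org.xerial.snappy.Snappy].getProtectionDomain.getCodeSource.getLocation
scala> org.xerial.snappy.Snappy.compress("test".getBytes("UTF-8"))

If the compress call throws the same UnsatisfiedLinkError, the failure is in snappy-java's native library loading itself (for example, an older snappy-java pulled in via the Hadoop/Hive classpath shadowing 1.0.4.1) rather than anything specific to Spark.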


Regards
Arthur

On 22 Oct, 2014, at 9:45 pm, Arthur.hk.chan@gmail.com <arthur.hk.chan@gmail.com> wrote:

> Hi,
> 
> FYI, I use snappy-java-1.0.4.1.jar
> 
> Regards
> Arthur
> 
> 
> On 22 Oct, 2014, at 8:59 pm, Shao, Saisai <saisai.shao@intel.com> wrote:
> 
>> Thanks a lot. I will try to reproduce this in my local settings and dig into the details. Thanks for your information.
>>  
>>  
>> BR
>> Jerry
>>  
>> From: Arthur.hk.chan@gmail.com [mailto:arthur.hk.chan@gmail.com] 
>> Sent: Wednesday, October 22, 2014 8:35 PM
>> To: Shao, Saisai
>> Cc: Arthur.hk.chan@gmail.com; user
>> Subject: Re: Spark Hive Snappy Error
>>  
>> Hi,
>>  
>> Yes, I can always reproduce the issue:
>>  
>> Regarding my workload, Spark configuration, JDK version and OS version:
>>  
>> I ran SparkPi 1000.
>>  
>> java -version
>> java version "1.7.0_67"
>> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>>  
>> cat /etc/centos-release
>> CentOS release 6.5 (Final)
>>  
>> My Spark’s hive-site.xml contains the following:
>>  <property>
>>   <name>hive.exec.compress.output</name>
>>   <value>true</value>
>>  </property>
>>  
>>  <property>
>>   <name>mapred.output.compression.codec</name>
>>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>>  </property>
>>  
>>  <property>
>>   <name>mapred.output.compression.type</name>
>>   <value>BLOCK</value>
>>  </property>
>>  
>> e.g.
>> MASTER=spark://m1:7077,m2:7077 ./bin/run-example SparkPi 1000
>> 2014-10-22 20:23:17,033 ERROR [sparkDriver-akka.actor.default-dispatcher-18] actor.ActorSystemImpl (Slf4jLogger.scala:apply$mcV$sp(66)) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
>> java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
>>          at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
>>          at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
>>          at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
>>          at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
>>          at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
>>          at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
>>          at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
>>          at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
>>          at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>>          at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>          at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
>>          at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:829)
>>          at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:769)
>>          at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:753)
>>          at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1360)
>>          at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>          at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>          at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>          at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>          at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>          at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>          at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>          at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2014-10-22 20:23:17,036 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Failed to run reduce at SparkPi.scala:35
>> Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down
>>          at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:694)
>>          at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:693)
>>          at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>>          at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:693)
>>          at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.postStop(DAGScheduler.scala:1399)
>>          at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
>>          at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
>>          at akka.actor.ActorCell.terminate(ActorCell.scala:338)
>>          at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
>>          at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
>>          at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
>>          at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)
>>          at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>          at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>          at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>          at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>          at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>          at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2014-10-22 20:23:17,038 INFO  [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon.
>> 2014-10-22 20:23:17,039 INFO  [sparkDriver-akka.actor.default-dispatcher-14] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports.
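>>  
>> (Side note: the failing call here is in Spark's own shuffle/broadcast compression, which is governed by spark.io.compression.codec rather than by the Hive output-compression settings above. As an isolation step, not a fix, one can temporarily switch Spark to the LZF codec, e.g. in conf/spark-defaults.conf:
>>  
>> spark.io.compression.codec lzf
>>  
>> If the jobs then succeed, the problem is confined to snappy-java's native library loading.)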
>>  
>>  
>> Regards
>> Arthur
>>  
>> On 17 Oct, 2014, at 9:33 am, Shao, Saisai <saisai.shao@intel.com> wrote:
>> 
>> 
>> Hi Arthur,
>>  
>> I think this is a known issue in Spark; you can check https://issues.apache.org/jira/browse/SPARK-3958. I’m curious about it: can you always reproduce this issue? Is it related to some specific data sets? Would you mind giving me some information about your workload, Spark configuration, JDK version and OS version?
>>  
>> Thanks
>> Jerry
>>  
>> From: Arthur.hk.chan@gmail.com [mailto:arthur.hk.chan@gmail.com] 
>> Sent: Friday, October 17, 2014 7:13 AM
>> To: user
>> Cc: Arthur.hk.chan@gmail.com
>> Subject: Spark Hive Snappy Error
>>  
>> Hi,
>>  
>> When trying Spark with a Hive table, I got the “java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I” error:
>>  
>>  
>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> sqlContext.sql("select count(1) from q8_national_market_share")
>> sqlContext.sql("select count(1) from q8_national_market_share").collect().foreach(println)
>> java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
>>          at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
>>          at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
>>          at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79)
>>          at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
>>          at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:207)
>>          at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:83)
>>          at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
>>          at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
>>          at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>>          at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>>          at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
>>          at org.apache.spark.sql.hive.HadoopTableReader.<init>(TableReader.scala:68)
>>          at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:68)
>>          at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>          at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
>>          at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
>>          at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>          at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>>          at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:146)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>>          at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>          at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>>          at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
>>          at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
>>          at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
>>          at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
>>          at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:438)
>>          at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
>>          at $iwC$$iwC$$iwC.<init>(<console>:20)
>>          at $iwC$$iwC.<init>(<console>:22)
>>          at $iwC.<init>(<console>:24)
>>          at <init>(<console>:26)
>>          at .<init>(<console>:30)
>>          at .<clinit>(<console>)
>>          at .<init>(<console>:7)
>>          at .<clinit>(<console>)
>>          at $print(<console>)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>          at java.lang.reflect.Method.invoke(Method.java:606)
>>          at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
>>          at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
>>          at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
>>          at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
>>          at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
>>          at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
>>          at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
>>          at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
>>          at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
>>          at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
>>          at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
>>          at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
>>          at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
>>          at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
>>          at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>          at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
>>          at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
>>          at org.apache.spark.repl.Main$.main(Main.scala:31)
>>          at org.apache.spark.repl.Main.main(Main.scala)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>          at java.lang.reflect.Method.invoke(Method.java:606)
>>          at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>>          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>  
>>  
>>  
>> I have already set the following in $SPARK_HOME/conf/hive-site.xml:
>>  <property>
>>   <name>hive.exec.compress.output</name>
>>   <value>true</value>
>>  </property>
>>  
>>  <property>
>>   <name>mapred.output.compression.codec</name>
>>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>>  </property>
>>  
>>  <property>
>>   <name>mapred.output.compression.type</name>
>>   <value>BLOCK</value>
>>  </property>
>>  
>>  
>> My questions:
>> Q1) Does it mean that I need to copy the Snappy native libraries into Spark or Hive?
>> Q2) Or do I need to recompile Spark (Maven) with extra parameters like "-Drequire.snappy=true -Pnative"?
>> Or how else can I fix this?
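>>  
>> (One possibility I am still checking, so treat this as a guess rather than a confirmed cause: snappy-java unpacks its native library into java.io.tmpdir at load time and throws exactly this UnsatisfiedLinkError when that directory is unusable, for example when /tmp is mounted noexec. The extraction directory can be redirected with a JVM property, e.g.
>>  
>> -Dorg.xerial.snappy.tempdir=/path/to/writable/dir
>>  
>> passed via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions in spark-defaults.conf.)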
>>  
>>  
>> Regards
>> Arthur
> 

