spark-user mailing list archives

From "Afshartous, Nick" <nafshart...@turbine.com>
Subject Using Spark SQL mapping over an RDD
Date Thu, 08 Oct 2015 17:10:29 GMT

Hi,

I'm using Spark 1.5 on the latest EMR 4.1.

I have an RDD of Strings:

   scala> deviceIds
      res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at <console>:28

When I map over the RDD and try to run a SQL query inside the closure, the result is
a NullPointerException

  scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()

with the stack trace below.  If I run the query as a top-level expression, the count is returned
(sketched below).  There was additional code within the anonymous function, which I've removed
to isolate the problem.
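
For reference, this is a minimal sketch of the top-level call that does succeed (assuming,
as above, that the ad_info table is already registered with sqlContext):

   scala> sqlContext.sql("select * from ad_info").count()   // returns the count when run on the driver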

Thanks for any insights or advice on how to debug this.
--
      Nick


scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with 200 output
partitions
15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at <console>:40)
15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 36)
15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[37] at map
at <console>:40), which has no missing parents
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with curMem=531894, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated
size 17.5 KB, free 534.5 MB)
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with curMem=549798, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated
size 7.0 KB, free 534.5 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 10.247.0.117:33555
(size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:861
15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 37 (MapPartitionsRDD[37]
at map at <console>:40)
15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 650, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:46227
(size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:32938
(size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID 651, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal):
java.lang.NullPointerException
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID 652, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 1]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID 653, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 2]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID 654, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 3]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID 655, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 4]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID 656, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 5]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID 657, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 6]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 7]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 8]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal,
PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on executor ip-10-247-0-117.ec2.internal:
java.lang.NullPointerException (null) [duplicate 9]
15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times; aborting job
15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40) failed in
0.128 s
15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took 0.145419
s
15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal):
TaskKilled (killed intentionally)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal):
TaskKilled (killed intentionally)
15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed,
from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed
4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal):
java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
        at $iwC$$iwC$$iwC.<init>(<console>:77)
        at $iwC$$iwC.<init>(<console>:79)
        at $iwC.<init>(<console>:81)
        at <init>(<console>:83)
        at .<init>(<console>:87)
        at .<clinit>(<console>)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 10.247.0.117:33555
in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:46227
in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:32938
in memory (size: 7.0 KB, free: 535.0 MB)



scala>

