spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kaveen Raajan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5594) SparkException: Failed to get broadcast (TorrentBroadcast)
Date Thu, 13 Aug 2015 06:31:46 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694780#comment-14694780
] 

Kaveen Raajan edited comment on SPARK-5594 at 8/13/15 6:31 AM:
---------------------------------------------------------------

Hi
I'm also facing such error, when two thread of job are running simultaneously each in its
own SparkContext.

we testing with spark-ThristServer, by submitting two similar request from different thread
simultaneously 
we clean the SparkContext every 2mins.
spark.cleaner.ttl 2000

Below query are working for first few try, but after some time it throws broadcast error.
We can't find the exact replication procedure for this error.
{panel:title=Query tried}
SELECT column1  SUM(column2) FROM TableName  JOIN ( SELECT column1,  COUNT(1),   SUM(column2)
 FROM TableName GROUP BY column1  ORDER BY column3 DESC LIMIT 5) t0 ON (column1 = t0.column1)
GROUP BY column1
{panel}

Error Detail:
{code}
15/08/12 12:13:58 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 300.0 failed
4 times, most recent failure: Lost task 1.3 in stage 300.0 (TID15134, sparkHeadNode.root.lan):
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of
broadcast_2
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1257)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
{code}


was (Author: kaveenbigdata):
Hi
I'm also facing such error, when two thread of job are running simultaneously each in its
own SparkContext.

I testing with spark-ThristServer, by submitting two similar request from different thread
simultaneously 
I clean the SparkContext every 2mins.
spark.cleaner.ttl 2000
{panel:title=Query tried}
SELECT column1  SUM(column2) FROM TableName  JOIN ( SELECT column1,  COUNT(1),   SUM(column2)
 FROM TableName GROUP BY column1  ORDER BY column3 DESC LIMIT 5) t0 ON (column1 = t0.column1)
GROUP BY column1
{panel}

Error Detail:
{code}
15/08/12 12:13:58 ERROR thriftserver.SparkExecuteStatementOperation: Error executing query:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 300.0 failed
4 times, most recent failure: Lost task 1.3 in stage 300.0 (TID15134, sparkHeadNode.root.lan):
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_2_piece0 of
broadcast_2
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1257)
        at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
        at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
{code}

> SparkException: Failed to get broadcast (TorrentBroadcast)
> ----------------------------------------------------------
>
>                 Key: SPARK-5594
>                 URL: https://issues.apache.org/jira/browse/SPARK-5594
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: John Sandiford
>            Priority: Critical
>
> I am uncertain whether this is a bug, however I am getting the error below when running
on a cluster (works locally), and have no idea what is causing it, or where to look for more
information.
> Any help is appreciated.  Others appear to experience the same issue, but I have not
found any solutions online.
> Please note that this only happens with certain code and is repeatable, all my other
spark jobs work fine.
> {noformat}
> ERROR TaskSetManager: Task 3 in stage 6.0 failed 4 times; aborting job
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage
failure: Task 3 in stage 6.0 failed 4 times, most recent failure: Lost task 3.3 in stage 6.0
(TID 24, <removed>): java.io.IOException: org.apache.spark.SparkException: Failed to
get broadcast_6_piece0 of broadcast_6
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011)
>         at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
>         at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
>         at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
>         at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_6_piece0 of broadcast_6
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
>         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1008)
>         ... 11 more
> {noformat}
> Driver stacktrace:
> {noformat}
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message