spark-issues mailing list archives

From "Alex Baretta (JIRA)" <>
Subject [jira] [Created] (SPARK-5060) Spark driver main thread hanging after SQL insert in Parquet file
Date Fri, 02 Jan 2015 21:37:34 GMT
Alex Baretta created SPARK-5060:

             Summary: Spark driver main thread hanging after SQL insert in Parquet file
                 Key: SPARK-5060
             Project: Spark
          Issue Type: Bug
            Reporter: Alex Baretta

Here's what the console shows:

15/01/01 01:12:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 58.0, whose tasks have all completed, from pool
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Stage 58 (runJob at ParquetTableOperations.scala:326) finished in 5493.549 s
15/01/01 01:12:29 INFO scheduler.DAGScheduler: Job 41 finished: runJob at ParquetTableOperations.scala:326, took 5493.747061 s

It is now 01:40:03, so the driver has been hanging for the last 28 minutes. The web UI, on the other hand, shows that all tasks completed successfully, and the output directory has been populated--although the _SUCCESS file is missing.

It is worth noting that my code runs this job in a thread of its own. The actual code looks like the following snippet, modulo some simplifications.

  def save_to_parquet(allowExisting : Boolean = false) = {
    val threads = tables.map { table =>  // collection name simplified here
      val thread = new Thread {
        override def run {
          table.insertInto(targetTable)  // the call that never returns
        }
      }
      thread.start()
      thread
    }
    threads.foreach(_.join())
  }
As far as I can see, the insertInto call never returns.
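For what it's worth, the thread-and-join pattern above can be sketched without Spark at all; this is a minimal, self-contained illustration (the `Thread.sleep` body is a placeholder standing in for the insertInto call), showing how a bounded `join(timeoutMs)` lets the driver detect a worker that never finishes instead of blocking indefinitely:

```scala
object JoinWithTimeout {
  // Wait up to timeoutMs for thread t to finish; true iff it finished in time.
  def joinedWithin(t: Thread, timeoutMs: Long): Boolean = {
    t.join(timeoutMs)   // returns after at most timeoutMs, even if t is stuck
    !t.isAlive
  }

  def main(args: Array[String]): Unit = {
    val worker = new Thread {
      override def run(): Unit = Thread.sleep(100)  // placeholder for the real work
    }
    worker.start()
    if (joinedWithin(worker, 5000)) println("worker finished")
    else println("worker still running -- likely hung")
  }
}
```

With an unbounded `join()`, as in the snippet above, a worker whose insertInto call hangs keeps the driver's main thread waiting forever, which matches the observed behavior.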

The version of Spark I'm using is built from master, off of this commit:

commit 815de54002f9c1cfedc398e95896fa207b4a5305
Author: YanTangZhai <>
Date:   Mon Dec 29 11:30:54 2014 -0800

    [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem

This message was sent by Atlassian JIRA

