spark-issues mailing list archives

From "Jerome Scheuring (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
Date Tue, 11 Oct 2016 20:00:22 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566349#comment-15566349 ]

Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:59 PM:
--------------------------------------------------------------------

_Note that I am entirely new to the process of submitting issues on this system: if this needs
to be a new issue, I would appreciate someone letting me know._

A bug very similar to this one is 100% reproducible across multiple machines running both
Windows 8.1 and Windows 10, with code compiled against Scala 2.11 and run under Spark 2.0.1.

It occurs

* in Scala, but not in Python (R was not tried)
* only when reading CSV files (not, for example, when reading Parquet files)
* only when running locally, not when submitted to a cluster

_Update:_  The bug also does not occur when the same program is run under the Spark 2.0.1
installation inside "Bash on Ubuntu on Windows" (the Linux subsystem) on the same Windows 10
machine where the bug _does_ occur when the program is executed from Windows directly.

The following program reproduces the bug (if {{poemData}} is defined per the commented-out
section rather than read from a CSV file, the bug does not occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

    val poemSchema = StructType(
      Seq(
        StructField("label", IntegerType),
        StructField("line", StringType)
      )
    )

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

//    val poemData = sparkSession.createDataFrame(Seq(
//      (0, "There's many a strong farmer"),
//      (0, "Who's heart would break in two"),
//      (1, "If he could see the townland"),
//      (1, "That we are riding to;")
//    )).toDF("label", "line")

    val poemData = sparkSession.read
      .option("quote", value="")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")

  }
}
{code}

The program assumes that {{args(0)}} holds the path to a file of comma-separated
integer/string pairs, such as:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}
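For what it's worth, the shutdown-time cleanup failure can sometimes be mitigated by retrying the delete after a short pause, once lingering handles have been released. The sketch below is a plain-Java illustration of that idea only; the class name {{RetryingDelete}} and the retry parameters are hypothetical and not part of Spark:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RetryingDelete {
    // Delete a directory tree, retrying a few times with a short pause.
    // On Windows, File.delete() fails while another process (or an
    // unclosed stream in the same JVM) still holds a handle on a file.
    public static boolean deleteRecursively(Path root, int attempts)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try (Stream<Path> walk = Files.walk(root)) {
                // Deepest entries first, so directories are empty when reached.
                walk.sorted(Comparator.reverseOrder())
                    .forEach(p -> p.toFile().delete());
            } catch (IOException ignored) {
                // root may already be gone, or the walk raced a delete
            }
            if (!Files.exists(root)) return true;
            Thread.sleep(100);  // give the OS a chance to release handles
        }
        return !Files.exists(root);
    }
}
```

Note that on Windows the retry only helps if the handle is actually released between attempts; a file still held open by the JVM itself will keep failing on every pass.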



> Spark failed to delete temp directory 
> --------------------------------------
>
>                 Key: SPARK-12216
>                 URL: https://issues.apache.org/jira/browse/SPARK-12216
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>         Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>            Reporter: stefan
>            Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir:
C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
>         at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
>         at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
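The exception in the trace above comes from a recursive delete that throws as soon as a single {{File.delete()}} returns false. A minimal plain-Java sketch of that general pattern (hypothetical class name, not Spark's actual source) looks like:

```java
import java.io.File;
import java.io.IOException;

public class DeleteDemo {
    // Mirrors the pattern behind "Failed to delete: <path>": recurse into
    // children first, then delete the entry itself, throwing if the OS
    // refuses (on Windows, typically because a handle is still open).
    public static void deleteRecursively(File file) throws IOException {
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        if (!file.delete() && file.exists()) {
            throw new IOException("Failed to delete: " + file.getAbsolutePath());
        }
    }
}
```

Under this pattern, one locked file anywhere in the temp directory is enough to abort the whole cleanup with the error shown above.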



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
