spark-user mailing list archives

From Marius Soutier <mps....@gmail.com>
Subject Re: scala.collection.mutable.ArrayOps$ofRef$.length$extension since Spark 1.1.0
Date Sun, 26 Oct 2014 10:57:53 GMT
I tried that already, same exception. I also tried using an accumulator to collect all filenames.
The filename is not the problem.

Even this crashes with the same exception:

sc.parallelize(files.value)
  .map { fileName =>
    try {
      println(s"Scanning $fileName")
      sc.textFile(fileName).take(1)
      s"Successfully scanned $fileName"
    } catch {
      case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
    }
  }
  .saveAsTextFile(output)

The output file contains "Failed to process…" for each file.
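A likely reason every variant fails the same way: sc.textFile is called inside the closure, but a SparkContext only exists on the driver and cannot be used from a task. A minimal driver-side sketch that avoids this, assuming files.value is a plain local Seq[String] of paths as in the snippet above:

    // Probe each file from the driver, where sc is valid; each take(1)
    // runs as a small job, and only the result strings are written out.
    val results = files.value.map { fileName =>
      try {
        sc.textFile(fileName).take(1)
        s"Successfully scanned $fileName"
      } catch {
        case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
      }
    }
    sc.parallelize(results).saveAsTextFile(output)

This trades parallelism for correctness: the scan loop runs sequentially on the driver, but each take(1) is still executed by the cluster.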


On 26.10.2014, at 00:10, Buttler, David <buttler1@llnl.gov> wrote:

> This sounds like expected behavior to me. The foreach call should be distributed on
> the workers. Perhaps you want to use map instead, and then collect the failed file
> names locally, or save the whole thing out to a file.
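One way to make that map-and-collect approach work without touching sc inside the task is to open each file directly through Hadoop's FileSystem API instead of sc.textFile. A sketch, assuming fileNames is a local Seq[String] of paths on a Hadoop-compatible filesystem and the files are gzip-compressed:

    import java.net.URI
    import java.util.zip.GZIPInputStream
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Runs entirely inside the task: no SparkContext needed on the worker.
    val failed = sc.parallelize(fileNames).flatMap { fileName =>
      try {
        val fs = FileSystem.get(new URI(fileName), new Configuration())
        val in = fs.open(new Path(fileName))
        // A corrupt gzip header or block throws here.
        try new GZIPInputStream(in).read() finally in.close()
        None
      } catch {
        case t: Throwable => Some(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
      }
    }.collect()

Reading only the first block keeps the probe cheap; a file with a broken gzip header already fails in the GZIPInputStream constructor.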
> ________________________________________
> From: Marius Soutier [mps.dev@gmail.com]
> Sent: Friday, October 24, 2014 6:35 AM
> To: user@spark.apache.org
> Subject: scala.collection.mutable.ArrayOps$ofRef$.length$extension since Spark 1.1.0
> 
> Hi,
> 
> I’m running a job whose simple task is to find files that cannot be read (sometimes
> our gz files are corrupted).
> 
> With 1.0.x, this worked perfectly. Since 1.1.0, however, I’m getting an exception: scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
> 
>    sc.wholeTextFiles(input)
>      .foreach { case (fileName, _) =>
>        try {
>          println(s"Scanning $fileName")
>          sc.textFile(fileName).take(1)
>          println(s"Successfully scanned $fileName")
>        } catch {
>          case t: Throwable => println(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
>        }
>      }
> 
> 
> Also since 1.1.0, the printlns are no longer visible on the console, only in the
> Spark UI worker output.
> 
> 
> Thanks for any help
> - Marius

