spark-user mailing list archives

From Iulian Dragoș <iulian.dra...@typesafe.com>
Subject Re: NPE in Parquet
Date Tue, 20 Jan 2015 16:43:49 GMT
It’s an array.length call where the array is null. Looking through the code,
the type converter assumes that FileSystem.globStatus never returns null, but
that is not the case. Digging through the Hadoop codebase, inside
Globber.glob, here’s what I found:

    /*
     * When the input pattern "looks" like just a simple filename, and we
     * can't find it, we return null rather than an empty array.
     * This is a special case which the shell relies on.
     *
     * To be more precise: if there were no results, AND there were no
     * groupings (aka brackets), and no wildcards in the input (aka stars),
     * we return null.
     */
    if ((!sawWildcard) && results.isEmpty() &&
        (flattenedPatterns.size() <= 1)) {
      return null;
    }

So if you pass a concrete filename without wildcards and it matches nothing,
you get null back instead of an empty array. Seems like a bug in
ParquetTypesConverter, which iterates over the result without a null check.
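A defensive fix on the Spark side could wrap the call in Option(...) and fall
back to an empty array. Here’s a minimal, self-contained sketch of the idea —
note that globStatus below is a stand-in that mimics Hadoop’s Globber
behavior, not the real FileSystem API, and safeGlobStatus is a hypothetical
helper, not anything in the Spark codebase:

```scala
object GlobNullGuard {
  // Stand-in mimicking Hadoop's Globber.glob: when the pattern has no
  // wildcards and matches nothing, it returns null, not an empty array.
  def globStatus(pattern: String, existing: Set[String]): Array[String] = {
    val hasWildcard = pattern.exists(c => "*?[{".indexOf(c.toInt) >= 0)
    val matches = existing.filter(_ == pattern).toArray
    if (!hasWildcard && matches.isEmpty) null else matches
  }

  // Hypothetical defensive wrapper: fold the null case into an empty
  // array so callers can safely take .length or flatMap over the result.
  def safeGlobStatus(pattern: String, existing: Set[String]): Array[String] =
    Option(globStatus(pattern, existing)).getOrElse(Array.empty[String])

  def main(args: Array[String]): Unit = {
    val files = Set("part-00000.parquet")
    // A concrete, non-existent filename triggers the null return (the NPE path).
    assert(globStatus("missing.parquet", files) == null)
    // The wrapper makes the same lookup safe to iterate.
    assert(safeGlobStatus("missing.parquet", files).isEmpty)
    assert(safeGlobStatus("part-00000.parquet", files).length == 1)
    println("ok")
  }
}
```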

iulian

On Tue, Jan 20, 2015 at 5:29 PM, Alessandro Baretta <alexbaretta@gmail.com>
wrote:

> All,
>
> I strongly suspect this might be caused by a glitch in the communication
> with Google Cloud Storage where my job is writing to, as this NPE exception
> shows up fairly randomly. Any ideas?
>
> Exception in thread "Thread-126" java.lang.NullPointerException
>         at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
>         at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>         at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>         at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:108)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$.readMetaData(ParquetTypes.scala:447)
>         at org.apache.spark.sql.parquet.ParquetTypesConverter$.readSchemaFromFile(ParquetTypes.scala:485)
>         at org.apache.spark.sql.parquet.ParquetRelation.<init>(ParquetRelation.scala:65)
>         at org.apache.spark.sql.SQLContext.parquetFile(SQLContext.scala:190)
>         at Truven$Stats$anonfun$save_to_parquet$3$anonfun$21$anon$7.run(Truven.scala:957)
>
>
> Alex
>



--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
