spark-user mailing list archives

From A Shaikh <shaikh.af...@gmail.com>
Subject Re: Handling null in dataset
Date Wed, 11 Jan 2017 18:51:34 GMT
I tried the DataFrame option below. I am not sure what it is for, but it
doesn't seem to work.


   - nullValue: specifies a string that indicates a null value, nulls in
   the DataFrame will be written as this string.
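
For context, a minimal sketch of where this option appears to apply. The wording quoted above matches the documentation of Spark's built-in CSV source, and this sketch assumes that is the intended use; whether the Avro source honours the option is not confirmed in this thread. The object name, column names, sample rows and temporary path are all hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object NullValueCsvSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nullValue-csv-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the second row has a null age.
    val df = Seq[(String, Option[Int])](("alice", Some(30)), ("bob", None))
      .toDF("name", "age")

    // Hypothetical output location in a fresh temporary directory.
    val out = Files.createTempDirectory("nullvalue").resolve("users_csv").toString

    // CSV writer: nulls are written as the string given in nullValue.
    df.write.option("nullValue", "x").csv(out)

    // CSV reader: the same string is interpreted as null when reading back.
    val back = spark.read.option("nullValue", "x").schema(df.schema).csv(out)
    back.show()

    spark.stop()
  }
}
```

Avro represents null natively in its schema, which may be why setting the option on an Avro read has no visible effect.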


On 11 January 2017 at 17:11, A Shaikh <shaikh.afzal@gmail.com> wrote:

>
>
> How does Spark handle null values?
>
>     case class AvroSource(
>       name: String,
>       age: Integer,
>       sal: Long,
>       col_float: Float,
>       col_double: Double,
>       col_bytes: String,
>       col_bool: Boolean
>     )
>
>
>     val userDS = spark.read.format("com.databricks.spark.avro")
>       .option("nullValue", "x")
>       .load("./users.avro") //.as[AvroSource]
>     userDS.printSchema()
>     userDS.show()
>     userDS.createOrReplaceTempView("user")
>     spark.sql("select * from user where xdouble is not null ").show()
>
>
>
>
>
> Adding the following lines to the code raises an error that seems to
> contradict the schema, which says nullable = true. How should nulls be
> handled here?
>
>     val filteredDS = userDS.filter(_.age > 30)
>     filteredDS.show(10)
>
> java.lang.RuntimeException: Null value appeared in non-nullable field:
> - field (class: "scala.Double", name: "col_double")
> - root class: "com.model.AvroSource"
> If the schema is inferred from a Scala tuple/case class, or a Java bean,
> please try to use scala.Option[_] or other nullable types (e.g.
> java.lang.Integer instead of int/scala.Int).
>
>
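
The error message quoted above suggests scala.Option[_] for fields that may be null. Below is a minimal sketch of that suggestion, not a confirmed fix for this dataset. The case class AvroSourceOpt, the helper isOlderThan and the in-memory rows (standing in for users.avro) are hypothetical, and only three of the original columns are shown.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical variant of AvroSource: every column that may hold null is an Option.
case class AvroSourceOpt(name: Option[String], age: Option[Int], col_double: Option[Double])

object NullSafeFilter {
  // A missing age (None) never satisfies the predicate, so no null is dereferenced.
  def isOlderThan(u: AvroSourceOpt, minAge: Int): Boolean = u.age.exists(_ > minAge)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-safe-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the Avro file; the second row has null age and col_double.
    val userDS = Seq(
      AvroSourceOpt(Some("a"), Some(35), Some(1.5)),
      AvroSourceOpt(Some("b"), None, None),
      AvroSourceOpt(Some("c"), Some(20), Some(2.5))
    ).toDS()

    val filteredDS = userDS.filter(u => isOlderThan(u, 30))
    filteredDS.show(10)

    spark.stop()
  }
}
```

With primitive types such as scala.Double in the case class, the typed view has no way to represent a null, even when the DataFrame schema reports nullable = true; Option gives the decoder a value (None) to map nulls to.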
