spark-user mailing list archives

From Luis Mateos <luismat...@gmail.com>
Subject Re: How to set nullable field when create DataFrame using case class
Date Thu, 04 Aug 2016 23:28:17 GMT
Hi Jacek,

I had not used Encoders before. This definitely works! Thank you!

Luis


On 4 August 2016 at 18:23, Jacek Laskowski <jacek@japila.pl> wrote:

> On Thu, Aug 4, 2016 at 11:56 PM, luismattor <luismattor@gmail.com> wrote:
>
> > import java.sql.Timestamp
> > case class MyProduct(t: Timestamp, a: Float)
> > val rdd = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
> > rdd.printSchema()
> >
> > The output is:
> > root
> >  |-- t: timestamp (nullable = true)
> >  |-- a: float (nullable = false)
> >
> > How can I set the timestamp column to be NOT nullable?
>
> Gotcha! :)
>
> scala> import java.sql.Timestamp
> import java.sql.Timestamp
>
> scala> case class MyProduct(t: java.sql.Timestamp, a: Float)
> defined class MyProduct
>
> scala> import org.apache.spark.sql._
> import org.apache.spark.sql._
>
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
>
> scala> import org.apache.spark.sql.catalyst.encoders._
> import org.apache.spark.sql.catalyst.encoders._
>
> scala> implicit def myEncoder: Encoder[MyProduct] =
>      |   ExpressionEncoder[MyProduct].copy(schema = new StructType()
>      |     .add("t", "timestamp", false)
>      |     .add("a", "float", false))
> myEncoder: org.apache.spark.sql.Encoder[MyProduct]
>
> scala> spark.createDataset(Seq(MyProduct(new Timestamp(0), 10))).printSchema
> root
>  |-- t: timestamp (nullable = false)
>  |-- a: float (nullable = false)
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
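
For readers of the archive: the encoder override quoted above relies on the internal org.apache.spark.sql.catalyst package. A commonly used alternative is to re-apply an explicit schema to the rows of an existing DataFrame. The following is only a minimal sketch, assuming a Spark 2.x spark-shell session where `spark` is already defined; the names `df`, `strictSchema` and `strict` are made up for illustration and do not come from the thread.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.types._
import spark.implicits._ // assumes the spark-shell's predefined `spark` session

case class MyProduct(t: Timestamp, a: Float)

// Original DataFrame: `t` is inferred as nullable because Timestamp is a reference type.
val df = Seq(MyProduct(new Timestamp(0), 10f)).toDF()

// Hypothetical explicit schema that declares both columns as non-nullable.
val strictSchema = StructType(Seq(
  StructField("t", TimestampType, nullable = false),
  StructField("a", FloatType, nullable = false)))

// Rebuild the DataFrame from the same rows using the explicit schema.
val strict = spark.createDataFrame(df.rdd, strictSchema)
strict.printSchema()
```

The nullable flag is a promise about the data rather than a constraint that cleans it, so this is only appropriate when the column really contains no nulls.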
