Thanks, Ryan

On Tue, Feb 5, 2019 at 10:28 PM Ryan Blue <rblue@netflix.com> wrote:
Shubham,

DataSourceV2 passes Spark's internal representation to your source and expects Spark's internal representation back from the source. That's why you consume and produce InternalRow: "internal" indicates that Spark doesn't need to convert the values.

Spark's internal representation for a date is the number of days since the Unix epoch (1970-01-01 = day 0).
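For example, to turn that ordinal back into a date on the write path, you can use java.time.LocalDate. This is a minimal sketch, not your actual writer; it assumes the date is the first column of the row:

import java.io.IOException;
import java.time.LocalDate;
import org.apache.spark.sql.catalyst.InternalRow;

@Override
public void write(InternalRow record) throws IOException {
  // DateType values arrive as an int holding days since 1970-01-01
  int daysSinceEpoch = record.getInt(0);
  // Convert the ordinal to a date for human-readable output
  LocalDate date = LocalDate.ofEpochDay(daysSinceEpoch);
  System.out.println("MyDataWriter.write: " + date);
}

Spark's own conversion helper is org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate, but keep in mind that it is an internal API.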

rb

On Tue, Feb 5, 2019 at 4:46 AM Shubham Chaurasia <shubh.chaurasia@gmail.com> wrote:
Hi All,

I am using a custom DataSourceV2 implementation (Spark version 2.3.2).

Here is how I am trying to pass in a date type from the Spark shell:

scala> val df = sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype", col("datetype").cast("date"))
scala> df.write.format("com.shubham.MyDataSource").save

Below is the minimal write() method of my DataWriter implementation:
@Override
public void write(InternalRow record) throws IOException {
  ByteArrayOutputStream format = streamingRecordFormatter.format(record);
  System.out.println("MyDataWriter.write: " + record.get(0, DataTypes.DateType));
}
It prints an integer as output: 
MyDataWriter.write: 17039

Is this a bug, or am I doing something wrong?

Thanks,
Shubham


--
Ryan Blue
Software Engineer
Netflix