DataSourceV2 passes Spark's internal representation to your source and expects Spark's internal representation back from the source. That's why you consume and produce InternalRow: "internal" indicates that Spark doesn't need to convert the values.

Spark's internal representation for a date is the number of days since the Unix epoch date, with 1970-01-01 = 0, so an integer is the expected output here, not a bug.
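
If you need a calendar date in your writer, you can convert the day count yourself. Below is a minimal sketch that uses only java.time; the class and method names (EpochDays, toLocalDate, toEpochDays) are hypothetical and not part of Spark. Inside write() you would presumably read the raw value with something like record.getInt(0) and pass it in, assuming column 0 is a non-null DateType column.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Spark: converts between a day count
// relative to 1970-01-01 (the form Spark uses internally for DateType)
// and java.time.LocalDate.
final class EpochDays {
    private EpochDays() {}

    // Day count since 1970-01-01 -> calendar date.
    static LocalDate toLocalDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Calendar date -> day count since 1970-01-01, e.g. for a reader that
    // has to hand dates back to Spark in the internal form.
    // Throws ArithmeticException if the value does not fit in an int.
    static int toEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }
}
```

For example, EpochDays.toLocalDate(0) gives 1970-01-01, and EpochDays.toEpochDays(LocalDate.of(2019, 2, 5)) gives 17932.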


Hi All,

I am using a custom DataSourceV2 implementation (Spark version 2.3.2).

Here is how I am trying to pass in a date type from the Spark shell:

scala> val df = sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype", col("datetype").cast("date"))
scala> df.write.format("com.shubham.MyDataSource").save

Below is the minimal write() method of my DataWriter implementation:
public void write(InternalRow record) throws IOException {
    ByteArrayOutputStream format = streamingRecordFormatter.format(record);
    // Print the value of column 0, read as DateType
    System.out.println("MyDataWriter.write: " + record.get(0, DataTypes.DateType));
}
It prints an integer as output: 
MyDataWriter.write: 17039

Is this a bug, or am I doing something wrong?


