spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: DataSourceV2 producing wrong date value in Custom Data Writer
Date Tue, 05 Feb 2019 16:58:21 GMT
Shubham,

DataSourceV2 passes Spark's internal representation to your source and
expects Spark's internal representation back from the source. That's why
you consume and produce InternalRow: "internal" indicates that Spark
doesn't need to convert the values.

Spark's internal representation for a date is an int holding the number of
days since the Unix epoch date, 1970-01-01 = 0.
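
A minimal sketch of converting that day count to and from a calendar date, using only java.time. The class and method names here (EpochDayDates, toLocalDate, toEpochDays) are hypothetical helpers for illustration and are not part of Spark.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Spark: converts between a date stored as
// days since 1970-01-01 and java.time.LocalDate.
final class EpochDayDates {

    // Days since the Unix epoch -> calendar date.
    static LocalDate toLocalDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // Calendar date -> days since the Unix epoch. Throws ArithmeticException
    // if the day count does not fit in an int.
    static int toEpochDays(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        // Day 0 is the epoch date itself.
        System.out.println(toLocalDate(0)); // 1970-01-01

        // Round trip: a date converted to days and back is unchanged.
        LocalDate d = LocalDate.of(2019, 2, 5);
        System.out.println(toLocalDate(toEpochDays(d)).equals(d)); // true
    }
}
```

In a writer, the int for a DateType column could be read with record.getInt(0) and passed to a helper like toLocalDate above; a reader would go the other way and put the int, not a date object, into the row it returns.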

rb

On Tue, Feb 5, 2019 at 4:46 AM Shubham Chaurasia <shubh.chaurasia@gmail.com>
wrote:

> Hi All,
>
> I am using a custom DataSourceV2 implementation (*Spark version 2.3.2*).
>
> Here is how I am passing in a *date type* from the Spark shell.
>
> scala> val df =
>   sc.parallelize(Seq("2019-02-05")).toDF("datetype").withColumn("datetype",
>   col("datetype").cast("date"))
> scala> df.write.format("com.shubham.MyDataSource").save
>
>
> Below is the minimal write() method of my DataWriter implementation.
>
> @Override
> public void write(InternalRow record) throws IOException {
>   ByteArrayOutputStream format = streamingRecordFormatter.format(record);
>   System.out.println("MyDataWriter.write: " + record.get(0, DataTypes.DateType));
>
> }
>
> It prints an integer as output:
>
> MyDataWriter.write: 17039
>
>
> Is this a bug, or am I doing something wrong?
>
> Thanks,
> Shubham
>


-- 
Ryan Blue
Software Engineer
Netflix
