spark-dev mailing list archives

From Shubham Chaurasia <shubh.chaura...@gmail.com>
Subject Re: DataSourceV2 producing wrong date value in Custom Data Writer
Date Wed, 06 Feb 2019 08:21:11 GMT
Thanks Ryan

On Tue, Feb 5, 2019 at 10:28 PM Ryan Blue <rblue@netflix.com> wrote:

> Shubham,
>
> DataSourceV2 passes Spark's internal representation to your source and
> expects Spark's internal representation back from the source. That's why
> you consume and produce InternalRow: "internal" indicates that Spark
> doesn't need to convert the values.
>
> Spark's internal representation for a date is the number of days since the
> Unix epoch date, 1970-01-01 = day 0.
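To make that encoding concrete: java.time uses the same days-since-epoch convention ("epoch day"), so the value can be round-tripped without Spark at all (a minimal sketch, independent of the thread's code):

```java
import java.time.LocalDate;

public class EpochDayDemo {
    public static void main(String[] args) {
        // Spark's internal DateType is an int: days since 1970-01-01.
        // java.time.LocalDate uses the same "epoch day" convention.
        System.out.println(LocalDate.ofEpochDay(0));       // 1970-01-01

        // Encoding a date into the internal form:
        long days = LocalDate.of(2019, 2, 5).toEpochDay(); // 17932

        // Decoding back:
        System.out.println(LocalDate.ofEpochDay(days));    // 2019-02-05
    }
}
```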
>
> rb
>
> On Tue, Feb 5, 2019 at 4:46 AM Shubham Chaurasia <
> shubh.chaurasia@gmail.com> wrote:
>
>> Hi All,
>>
>> I am using a custom DataSourceV2 implementation (*Spark version 2.3.2*).
>>
>> Here is how I am trying to pass in a *date type* from the spark shell.
>>
>> scala> val df = sc.parallelize(Seq("2019-02-05")).toDF("datetype")
>>                   .withColumn("datetype", col("datetype").cast("date"))
>> scala> df.write.format("com.shubham.MyDataSource").save
>>
>>
>> Below is the minimal write() method of my DataWriter implementation.
>>
>> @Override
>> public void write(InternalRow record) throws IOException {
>>   ByteArrayOutputStream format = streamingRecordFormatter.format(record);
>>   System.out.println("MyDataWriter.write: " + record.get(0, DataTypes.DateType));
>> }
>>
>> It prints an integer as output:
>>
>> MyDataWriter.write: 17039
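Given the answer above, an integer here is the expected internal form. A minimal sketch of decoding it back to a `java.sql.Date` with plain java.time (the helper name `toJavaDate` is illustrative; Spark ships an equivalent in `org.apache.spark.sql.catalyst.util.DateTimeUtils`, but that is internal API):

```java
import java.sql.Date;
import java.time.LocalDate;

public class DecodeSparkDate {
    // The int an InternalRow holds for a DateType column is
    // days since 1970-01-01; decode it via java.time.
    static Date toJavaDate(int daysSinceEpoch) {
        return Date.valueOf(LocalDate.ofEpochDay(daysSinceEpoch));
    }

    public static void main(String[] args) {
        System.out.println(toJavaDate(17932)); // 2019-02-05
    }
}
```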
>>
>>
>> Is this a bug, or am I doing something wrong?
>>
>> Thanks,
>> Shubham
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
