drill-user mailing list archives

From "Lee, David" <David....@blackrock.com>
Subject RE: Parquet Date Format Problem
Date Tue, 01 Nov 2016 19:00:09 GMT
Found a workaround: subtracting 4881176 days when generating the Parquet files produces
the correct dates, which I verified in Spark:

+----------+
|     as_of|
+----------+
|2016-09-30|
|2015-08-05|
+----------+

0: jdbc:drill:zk=local> create table dfs.tmp.`/test` as select DATE_ADD(cast(as_of AS date),
-4881176) as as_of from table(dfs.`/tmp/test.txt`(type => 'text', fieldDelimiter =>
',', extractHeader => true));

java -jar parquet-tools-1.6.1-SNAPSHOT.jar head -n3 /tmp/test
as_of = 17074

as_of = 16652
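
A quick sanity check of the numbers above, as a minimal Python sketch rather than anything
Drill or Spark provides. The helper names are made up for illustration, and the remark that
the offset equals 2 x 2440588 (the Julian Day Number of 1970-01-01) is an observation about
the arithmetic, not something stated in this thread.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)

# Offset used in the workaround above. Assumption for illustration: it is
# treated here as a fixed shift; numerically it equals 2 * 2440588.
SHIFT_DAYS = 4881176


def to_epoch_days(d: date) -> int:
    """Days since 1970-01-01, i.e. the value a Parquet DATE should hold."""
    return (d - UNIX_EPOCH).days


def from_epoch_days(days: int) -> date:
    """Inverse of to_epoch_days."""
    return UNIX_EPOCH + timedelta(days=days)


def unshift(raw: int) -> int:
    """Remove the fixed shift from a raw int32 value read from the file."""
    return raw - SHIFT_DAYS


if __name__ == "__main__":
    print(to_epoch_days(date(2016, 9, 30)))   # value parquet-tools shows after the workaround
    print(unshift(4898250))                   # raw value before the workaround, corrected
    print(from_epoch_days(unshift(4898250)))  # back to a calendar date
```

With the values from this thread, both integers come out as 17074 and the last line
prints 2016-09-30.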


David Lee
Vice President | BlackRock 
Phone: +1.415.670.2744 | Mobile: +1.415.706.6874


-----Original Message-----
From: rahul challapalli [mailto:challapallirahul@gmail.com] 
Sent: Tuesday, November 01, 2016 11:28 AM
To: user <user@drill.apache.org>
Subject: Re: Parquet Date Format Problem

The fix will ship with the Drill 1.9 release; to get it sooner, you would have to build
Drill from source yourself.

On Tue, Nov 1, 2016 at 11:24 AM, Lee, David <David.Lee@blackrock.com> wrote:

> Never mind. Found the problem:
>
> https://issues.apache.org/jira/browse/DRILL-4203
>
>
> From: Lee, David
> Sent: Tuesday, November 01, 2016 11:21 AM
> To: 'user@drill.apache.org' <user@drill.apache.org>
> Subject: Parquet Date Format Problem
>
> I created a Parquet file using Drill, but the date values in the file
> don’t appear to be valid for the INT32-backed logical DATE type, so
> when I read the file in Spark the dates look corrupted.
>
> Here’s my test case:
>
>
> A.     Create a test.txt file in /tmp:
>
> as_of
> 2016-09-30
>
>
> B.     Convert it to parquet using Drill:
>
> 0: jdbc:drill:zk=local> create table dfs.tmp.`/test` as select 
> cast(as_of AS date) as as_of from table(dfs.`/tmp/test.txt`(type => 
> 'text', fieldDelimiter => ',', extractHeader => true));
>
>
> C.    Read the new file using Drill, which looks fine:
>
>
> 0: jdbc:drill:zk=local> select * from dfs.`/tmp/test`;
> +-------------+
> |    as_of    |
> +-------------+
> | 2016-09-30  |
> +-------------+
>
>
> D.    However, running parquet-tools on it gives a completely different
> result:
>
> java -jar parquet-tools-1.6.1-SNAPSHOT.jar head -n3 /tmp/test
> as_of = 4898250
>
> java -jar parquet-tools-1.6.1-SNAPSHOT.jar schema /tmp/test/0_0_0.parquet
> message root {
>   required int32 as_of (DATE);
> }
>
> Per the Parquet docs, a DATE counts days from Jan 1st 1970, and 4898250
> days after that date falls around the year 15,380:
>
> https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md
> DATE
> DATE is used for a logical date type, without a time of day. It
> must annotate an int32 that stores the number of days from the Unix
> epoch, 1 January 1970.
>