drill-user mailing list archives

From Sudhakar Thota <sth...@maprtech.com>
Subject Re: more on parsing timestamps
Date Thu, 02 Apr 2015 20:52:02 GMT
Here it is:

https://maprdrill.atlassian.net/browse/MD-204?filter=-2

Thanks
Sudhakar Thota


On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:

> Cool, thx for testing. 
> 
> Best to file a JIRA.
> 
> —Andries
> 
> On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <sthota@maprtech.com> wrote:
> 
>> Vince/Andries,
>> 
>> Perhaps this could be a bug. I get the same results. 
>> 
>> But the plans are very different: in the successful case (Case -1) the UnionExchange is set up
>> immediately after the Scan operation, whereas in the unsuccessful case (Case -2) the UnionExchange
>> happens after Scan -> Project.
>> 
>> Case -1.Successful case.
>> 
>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
>> +------------+------------+
>> |    text    |    json    |
>> +------------+------------+
>> | 00-00    Screen
>> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>> 00-02        SelectionVectorRemover
>> 00-03          Limit(fetch=[13015351])
>> 00-04            UnionExchange
>> 01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>> | {
>> "head" : {
>>   "version" : 1,
>>   "generator" : {
>>     "type" : "ExplainHandler",
>>     "info" : ""
>>   },
>>   "type" : "APACHE_DRILL_PHYSICAL",
>>   "options" : [ ],
>>   "queue" : 0,
>>   "resultMode" : "EXEC"
>> },
>> 
>> Case -2. Unsuccessful case:
>> 
>> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
>> +------------+------------+
>> |    text    |    json    |
>> +------------+------------+
>> | 00-00    Screen
>> 00-01      UnionExchange
>> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
>> 01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
>> | {
>> "head" : {
>>   "version" : 1,
>>   "generator" : {
>>     "type" : "ExplainHandler",
>>     "info" : ""
>>   },
>>   "type" : "APACHE_DRILL_PHYSICAL",
>>   "options" : [ ],
>>   "queue" : 0,
>>   "resultMode" : "EXEC"
>> },
>> 
>> Thanks
>> Sudhakar Thota
>> 
>> 
>> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
>> 
>>> Ok, will do. Thanks.
>>> 
>>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>>> aengelbrecht@maprtech.com> wrote:
>>> 
>>>> Compare the query plans and you probably want to look at the log file to
>>>> see what fails and post here.
>>>> 
>>>> 
>>>> 
>>>> —Andries
>>>> 
>>>> 
>>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com>
>>>> wrote:
>>>> 
>>>>> Is this a bug?
>>>>> 
>>>>> Created a parquet table (using CTAS) with one column containing text
>>>>> timestamps.
>>>>> 
>>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>>> +------------+
>>>>> |     t      |
>>>>> +------------+
>>>>> | 2015-01-27T13:43:53.000Z |
>>>>> +------------+
>>>>> 1 row selected (0.119 seconds)
>>>>> 
>>>>> The queries below, identical apart from the limit clause, behave
>>>>> differently. The one with the limit clause works; the one without
>>>>> doesn't.
>>>>> The limit is larger than the total number of rows, so in both cases we
>>>>> should be processing all rows.
>>>>> 
>>>>> No limit clause. It fails:
>>>>> 
>>>>> ```
>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>>> 192.168.99.1:31010 ]
>>>>> ```
>>>>> 
>>>>> Limit clause in the subselect (larger than the number of rows in the
>>>>> table) succeeds.
>>>>> 
>>>>> ```
>>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>>>> 100000000) as t;
>>>>> ...
>>>>> | 2015-02-17 07:18:00.0 |
>>>>> +------------+
>>>>> 13,015,350 rows selected (105.257 seconds)
>>>>> ```
>>>>> 
>>>>> Data can be downloaded here:
>>>>> 
>>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>> 
>>>> 
>> 
> 
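
The plan comparison in this thread can be sketched in a few lines of Python. This is only an illustration, not something posted to the list: the function names `operators_by_fragment` and `unescape_sql_literal` are hypothetical, and the plan text is abridged from the two EXPLAIN outputs above (operator arguments replaced with `...`). It groups operators by the major fragment id at the start of each plan line, which shows that the Project holding TO_TIMESTAMP sits in fragment 00 in Case -1 and in fragment 01 in Case -2. It also shows what the format string looks like once the doubled single quotes of the SQL literal are collapsed. It does not establish the root cause of the failure.

```python
import re

# Operator lines in the text plan start with "<major>-<minor>", e.g. "01-01".
# An optional quote prefix (">>") and table border ("|") are tolerated.
OP_LINE = re.compile(r"^\s*(?:>+\s*)?(?:\|\s*)?(\d{2})-(\d{2})\s+(\w+)")


def operators_by_fragment(plan_text):
    """Map major fragment id -> list of operator names, in plan order."""
    fragments = {}
    for line in plan_text.splitlines():
        match = OP_LINE.match(line)
        if match:
            major, _minor, name = match.groups()
            fragments.setdefault(major, []).append(name)
    return fragments


def unescape_sql_literal(literal):
    """Strip the outer quotes of a SQL string literal and collapse '' to '."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted SQL literal")
    return literal[1:-1].replace("''", "'")


# Abridged from the thread: operator arguments are elided as "...".
CASE_1 = """\
00-00    Screen
00-01      Project(EXPR$0=[TO_TIMESTAMP(...)])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[13015351])
00-04            UnionExchange
01-01              Scan(...)
"""

CASE_2 = """\
00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[TO_TIMESTAMP(...)])
01-02          Scan(...)
"""

if __name__ == "__main__":
    for label, plan in (("Case -1", CASE_1), ("Case -2", CASE_2)):
        for major, ops in operators_by_fragment(plan).items():
            print(label, "fragment", major, ops)
    print(unescape_sql_literal("'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"))
```

In Case -2 the Project appears under fragment 01, which matches the "Failure while trying to start remote fragment" wording of the error. Whether the quoting of the format string is what breaks there is a question for the JIRA.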

