Cool, thx for testing.
Best to file a JIRA.
—Andries
On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <sthota@maprtech.com> wrote:
> Vince/Andries,
>
> Perhaps this could be a bug. I get the same results.
>
> But the plans are very different: in the successful case (Case 1) the
> UnionExchange is set up immediately after the scan operation, whereas in
> the failing case (Case 2) the UnionExchange happens after scan->project.
>
> Case 1. Successful case:
>
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')
> FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 00-02 SelectionVectorRemover
> 00-03 Limit(fetch=[13015351])
> 00-04 UnionExchange
> 01-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> | {
> "head" : {
> "version" : 1,
> "generator" : {
> "type" : "ExplainHandler",
> "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
> },
>
> Case 2. Unsuccessful case:
>
> 0: jdbc:drill:> explain plan for select to_timestamp(t.t, 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')
> FROM (select * from dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
> +------------+------------+
> | text | json |
> +------------+------------+
> | 00-00 Screen
> 00-01 UnionExchange
> 01-01 Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> 01-02 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet],
> ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]],
> selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> | {
> "head" : {
> "version" : 1,
> "generator" : {
> "type" : "ExplainHandler",
> "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
> },
>
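The difference between the two plans above can be checked mechanically. This is a minimal sketch in Python, assuming the text column of each EXPLAIN output has been pasted in as a string (operator arguments are abbreviated here). The helper names `operators` and `fragment_of` are made up for illustration and are not part of Drill. It reports which major fragment (the first two digits of each operator id) holds the Project that evaluates TO_TIMESTAMP.

```python
import re

# Operator lines copied from the two plans in this thread (arguments abbreviated).
CASE_1 = """\
00-00 Screen
00-01 Project(EXPR$0=[TO_TIMESTAMP(...)])
00-02 SelectionVectorRemover
00-03 Limit(fetch=[13015351])
00-04 UnionExchange
01-01 Scan(groupscan=[ParquetGroupScan [...]])
"""

CASE_2 = """\
00-00 Screen
00-01 UnionExchange
01-01 Project(EXPR$0=[TO_TIMESTAMP(...)])
01-02 Scan(groupscan=[ParquetGroupScan [...]])
"""

# "<major fragment>-<operator id> <OperatorName>", optionally behind
# mail quoting ("> ") or the sqlline table border ("| ").
PLAN_LINE = re.compile(r"^[>|\s]*(\d{2})-(\d{2})\s+([A-Za-z]+)")


def operators(plan_text):
    """Return (major_fragment, operator_name) pairs in plan order."""
    ops = []
    for line in plan_text.splitlines():
        match = PLAN_LINE.match(line)
        if match:
            ops.append((match.group(1), match.group(3)))
    return ops


def fragment_of(plan_text, operator_name):
    """Return the major fragment id of the first operator with that name."""
    for fragment, name in operators(plan_text):
        if name == operator_name:
            return fragment
    return None


for label, plan in (("Case 1 (limit)", CASE_1), ("Case 2 (no limit)", CASE_2)):
    print(label, "-> Project runs in fragment", fragment_of(plan, "Project"))
```

In Case 2 the Project sits in fragment 01, below the UnionExchange, which is consistent with the error being reported while starting a remote fragment. That link is an inference from the plans, not something confirmed in this thread.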
> Thanks
> Sudhakar Thota
>
>
> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vince.gonzalez@gmail.com> wrote:
>
>> Ok, will do. Thanks.
>>
>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
>> aengelbrecht@maprtech.com> wrote:
>>
>>> Compare the query plans. You probably also want to look at the log file
>>> to see what fails, and post it here.
>>>
>>>
>>>
>>> —Andries
>>>
>>>
>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com>
>>> wrote:
>>>
>>>> Is this a bug?
>>>>
>>>> Created a parquet table (using CTAS) with one column containing text
>>>> timestamps.
>>>>
>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
>>>> +------------+
>>>> | t |
>>>> +------------+
>>>> | 2015-01-27T13:43:53.000Z |
>>>> +------------+
>>>> 1 row selected (0.119 seconds)
>>>>
>>>> The queries below, identical apart from the limit clause, behave
>>>> differently. The one with the limit clause works; the one without
>>>> doesn't. The limit is larger than the total number of rows, so in both
>>>> cases we should be processing all rows.
>>>>
>>>> No limit clause. It fails:
>>>>
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
>>>> Query failed: RemoteRpcException: Failure while trying to start remote
>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
>>>> 192.168.99.1:31010 ]
>>>> ```
>>>>
>>>> With a limit clause in the subselect (larger than the number of rows in
>>>> the table), it succeeds:
>>>>
>>>> ```
>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test limit
>>>> 100000000) as t;
>>>> ...
>>>> | 2015-02-17 07:18:00.0 |
>>>> +------------+
>>>> 13,015,350 rows selected (105.257 seconds)
>>>> ```
>>>>
>>>> Data can be downloaded here:
>>>>
>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
>>>
>>>
>