drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: more on parsing timestamps
Date Thu, 02 Apr 2015 23:09:28 GMT
Could you please file a public JIRA? That link points to an internal issue.

On Thu, Apr 2, 2015 at 1:52 PM, Sudhakar Thota <sthota@maprtech.com> wrote:

> Here it is:
>
> https://maprdrill.atlassian.net/browse/MD-204?filter=-2
>
> Thanks
> Sudhakar Thota
>
>
> On Apr 2, 2015, at 1:36 PM, Andries Engelbrecht <aengelbrecht@maprtech.com>
> wrote:
>
> > Cool, thx for testing.
> >
> > Best to file a JIRA.
> >
> > —Andries
> >
> > On Apr 2, 2015, at 1:27 PM, Sudhakar Thota <sthota@maprtech.com> wrote:
> >
> >> Vince/Andries,
> >>
> >> Perhaps this could be a bug. I get the same results.
> >>
> >> But the plans are very different: in the successful case (Case 1) the
> >> UnionExchange is set up immediately after the scan operation, whereas in
> >> the failing case (Case 2) the UnionExchange happens after scan->project.
> >>
> >> Case 1. Successful case:
> >>
> >> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> >> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> >> dfs.sthota_prq.`/tstamp_test/*.parquet` limit 13015351) t;
> >> +------------+------------+
> >> |    text    |    json    |
> >> +------------+------------+
> >> | 00-00    Screen
> >> 00-01      Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> >> 00-02        SelectionVectorRemover
> >> 00-03          Limit(fetch=[13015351])
> >> 00-04            UnionExchange
> >> 01-01              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> >> | {
> >> "head" : {
> >>   "version" : 1,
> >>   "generator" : {
> >>     "type" : "ExplainHandler",
> >>     "info" : ""
> >>   },
> >>   "type" : "APACHE_DRILL_PHYSICAL",
> >>   "options" : [ ],
> >>   "queue" : 0,
> >>   "resultMode" : "EXEC"
> >> },
> >>
> >> Case 2. Unsuccessful case:
> >>
> >> 0: jdbc:drill:> explain plan for select to_timestamp(t.t,
> >> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select * from
> >> dfs.sthota_prq.`/tstamp_test/*.parquet` ) t;
> >> +------------+------------+
> >> |    text    |    json    |
> >> +------------+------------+
> >> | 00-00    Screen
> >> 00-01      UnionExchange
> >> 01-01        Project(EXPR$0=[TO_TIMESTAMP(ITEM($0, 't'), 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''')])
> >> 01-02          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_2_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_1_0.parquet], ReadEntryWithPath [path=maprfs:/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test/1_0_0.parquet]], selectionRoot=/mapr/demo.mapr.com/user/sthota/parquet/tstamp_test, numFiles=3, columns=[`*`]]])
> >> | {
> >> "head" : {
> >>   "version" : 1,
> >>   "generator" : {
> >>     "type" : "ExplainHandler",
> >>     "info" : ""
> >>   },
> >>   "type" : "APACHE_DRILL_PHYSICAL",
> >>   "options" : [ ],
> >>   "queue" : 0,
> >>   "resultMode" : "EXEC"
> >> },
> >>
> >> Thanks
> >> Sudhakar Thota
> >>
> >>
> >> On Apr 2, 2015, at 12:01 PM, Vince Gonzalez <vince.gonzalez@gmail.com>
> wrote:
> >>
> >>> Ok, will do. Thanks.
> >>>
> >>> On Thu, Apr 2, 2015 at 2:49 PM, Andries Engelbrecht <
> >>> aengelbrecht@maprtech.com> wrote:
> >>>
> >>>> Compare the query plans. You probably also want to look at the log file
> >>>> to see what fails, and post it here.
> >>>>
> >>>>
> >>>>
> >>>> —Andries
> >>>>
> >>>>
> >>>> On Apr 1, 2015, at 12:54 PM, Vince Gonzalez <vince.gonzalez@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Is this a bug?
> >>>>>
> >>>>> I created a Parquet table (using CTAS) with one column containing text
> >>>>> timestamps.
> >>>>>
> >>>>> 0: jdbc:drill:zk=localhost:2181> select * from tstamp_test limit 1;
> >>>>> +------------+
> >>>>> |     t      |
> >>>>> +------------+
> >>>>> | 2015-01-27T13:43:53.000Z |
> >>>>> +------------+
> >>>>> 1 row selected (0.119 seconds)
> >>>>>
> >>>>> The queries below, identical apart from the limit clause, behave
> >>>>> differently. The one with the limit clause works; the one without
> >>>>> doesn't. The limit is larger than the total number of rows, so in both
> >>>>> cases we should be processing all rows.
> >>>>>
> >>>>> No limit clause. It fails:
> >>>>>
> >>>>> ```
> >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> >>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test) as t;
> >>>>> Query failed: RemoteRpcException: Failure while trying to start remote
> >>>>> fragment, Expression has syntax error! line 1:30:mismatched input 'T'
> >>>>> expecting CParen [ 7d30d753-0822-4820-afd0-b7e7fe5e639c on
> >>>>> 192.168.99.1:31010 ]
> >>>>> ```
> >>>>>
> >>>>> With a limit clause in the subselect (larger than the number of rows in
> >>>>> the table), it succeeds:
> >>>>>
> >>>>> ```
> >>>>> 0: jdbc:drill:zk=localhost:2181> select to_timestamp(t.t,
> >>>>> 'YYYY-MM-dd''T''HH:mm:ss.SSS''Z''') FROM (select t from tstamp_test
> >>>>> limit 100000000) as t;
> >>>>> ...
> >>>>> | 2015-02-17 07:18:00.0 |
> >>>>> +------------+
> >>>>> 13,015,350 rows selected (105.257 seconds)
> >>>>> ```
> >>>>>
> >>>>> Data can be downloaded here:
> >>>>>
> >>>>> https://s3.amazonaws.com/vgonzalez/data/tstamp_test.tar.gz
> >>>>
> >>>>
> >>
> >
>
>
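
For anyone reading the quoted queries: the doubled single quotes in the format argument are ordinary SQL escaping, so the pattern that to_timestamp is meant to receive has single-quoted literal T and Z. Below is a minimal Python sketch of that unescaping, plus a rough check that the sample value from the thread matches the intended layout. The helper name `unescape_sql_literal` is hypothetical, and the `strptime` format is only an approximate stand-in for the pattern, not Drill's own parser.

```python
from datetime import datetime


def unescape_sql_literal(literal: str) -> str:
    """Strip the outer quotes of a SQL string literal and collapse '' to '.

    Hypothetical helper for illustration; it assumes a well-formed literal.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a single-quoted SQL literal")
    return literal[1:-1].replace("''", "'")


# The format argument exactly as typed in the queries in this thread.
sql_literal = "'YYYY-MM-dd''T''HH:mm:ss.SSS''Z'''"
pattern = unescape_sql_literal(sql_literal)
print(pattern)  # YYYY-MM-dd'T'HH:mm:ss.SSS'Z'

# Approximate strptime equivalent of that pattern (an assumption made for
# this sketch), applied to the sample row shown in the thread.
sample = "2015-01-27T13:43:53.000Z"
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S.%fZ")
print(parsed)  # 2015-01-27 13:43:53
```

The same pattern appears in both plans quoted above, so the pattern itself is probably not what differs between the working and failing queries.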


-- 
 Steven Phillips
 Software Engineer

 mapr.com
