drill-user mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: Bug or Feature?
Date Thu, 04 Feb 2016 17:53:32 GMT
Yeah, not ideal. We should get a JIRA up and fix this.

Since I've seen the code, it isn't surprising either. An easier way to
understand this behavior is to run the query select dir0 from t limit 1 (where
t has one subdirectory versus two). In the single-directory case, you'll see
that dir0 is null. (This is why the count returns zero records.)

I believe that the dirX code currently relies on the shared base. This
means that it will work even in the case of using globbing (a fairly
complicated case in how it interacts with dirX). However, it also means that
in this situation it will not behave the way you would expect. You could
see similarly unexpected behavior if you had one first-level directory and
two subdirectories within that first level. I agree that it is an issue and
we should probably handle this as a special case.
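
To illustrate the shared-base idea, here is a minimal Python sketch. It is
not Drill's actual code. It assumes that "shared base" means the deepest
directory common to every selected file and that dirX is counted from there.
The helper names dir_columns and dir0 are hypothetical.

```python
import posixpath


def dir_columns(files):
    """Return {file: [dir0, dir1, ...]} measured from the shared base.

    Assumption: the shared base is the deepest directory common to
    every file in the selection. This models the behavior described
    above and is not taken from Drill's source.
    """
    parents = [posixpath.dirname(f) for f in files]
    base = posixpath.commonpath(parents)
    columns = {}
    for f, parent in zip(files, parents):
        rel = posixpath.relpath(parent, base)
        # relpath returns "." when the file sits directly in the base
        columns[f] = [] if rel == "." else rel.split("/")
    return columns


def dir0(files, f):
    """First directory level below the shared base, or None if absent."""
    parts = dir_columns(files)[f]
    return parts[0] if parts else None


two = ["data/2016-01-01/a.parquet", "data/2016-01-02/b.parquet"]
one = ["data/2016-01-01/a.parquet"]
nested = ["data/2016/01/a.parquet", "data/2016/02/b.parquet"]

print(dir0(two, two[0]))        # 2016-01-01 (base is "data")
print(dir0(one, one[0]))        # None (base collapses to "data/2016-01-01")
print(dir0(nested, nested[0]))  # 01 (base is "data/2016", not "data")
```

Under that assumption, a single subdirectory becomes the base itself, so
dir0 is null and the filter matches nothing. The nested case shows the other
surprise: dir0 starts one level lower than expected.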

Can you file a JIRA with a couple of examples that behave differently than you
expected?



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Feb 4, 2016 at 8:21 AM, John Omernik <john@omernik.com> wrote:

> Prior to posting a JIRA, I thought I'd toss this here:
>
> If I have a directory, data, with subdirectories that contain parquet files:
>
>
> data/2016-01-01
> data/2016-01-02
>
> (Seem familiar? This came up in my other testing)
>
>
> If I have MORE than one subdirectory,
>
> then
>
> select count(1) from `data/` where dir0='2016-01-01'
>
>  Works fine.
>
> However, if I have EXACTLY one subdirectory, then
>
> select count(1) from `data/` where dir0 = '2016-01-01'
>
> Takes 15 seconds (instead of returning almost instantly) and reports 0
> records for count.
> Note, this directory DOES exist, so that is not the issue.
>
> If I add a second directory, then the exact query returns almost instantly,
> and reports the correct number of records.
>
> In addition, when there is only one directory, select count(1) from `data/`
> returns instantly with the correct count.
>
> To me, it appears that if there is ONE and only ONE subdirectory, then dir0=
> doesn't work as I think people would expect it to. I can't think of a real
> reason to have it behave this way, and to me it violates the principle of
> "least surprise", but I am not up on the internals of Drill, so I thought
> I'd post here first.
>
> John
>
