drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Bug or Feature?
Date Tue, 09 Feb 2016 14:06:01 GMT
I have filed a JIRA on this issue.  I do see this as important, since users
wouldn't expect different behavior in this case.

Thanks!

https://issues.apache.org/jira/browse/DRILL-4379

On Thu, Feb 4, 2016 at 11:53 AM, Jacques Nadeau <jacques@dremio.com> wrote:

> Yeah, not ideal. We should get a JIRA up and fix this.
>
> Since I've seen the code, it isn't surprising either. An easier way to
> understand this behavior is to run the query select dir0 from t limit 1
> (where t is one directory versus two). In the single case, you'll see that
> dir0 is null. (This is why the count returns zero records.)
>
> I believe that the dirX code currently relies on the shared base. This
> means that it will work even in the case of using globbing (a fairly
> complicated case in how it interacts with dirX). However, it means that it
> will fail in this situation to behave the way you would expect. You could
> see a similarly unexpected behavior if you had one first-level directory and
> two subdirectories within that first level. I agree that it is an issue and
> we should probably handle this as a special case.
>
> Can you file a jira with a couple examples that behave differently than you
> expected?
>
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Feb 4, 2016 at 8:21 AM, John Omernik <john@omernik.com> wrote:
>
> > Prior to posting a JIRA, I thought I'd toss this here:
> >
> > If I have a directory, data, with subdirectories that have parquet files in them:
> >
> >
> > data/2016-01-01
> > data/2016-01-02
> >
> > (Seem familiar? This came up in my other testing)
> >
> >
> > If I have MORE than one subdirectory,
> >
> > then
> >
> > select count(1) from `data/` where dir0='2016-01-01'
> >
> >  Works fine.
> >
> > However, if I have EXACTLY one subdirectory, then
> >
> > select count(1) from `data/` where dir0 = '2016-01-01'
> >
> > Takes 15 seconds (instead of returning almost instantly) and reports 0
> > records for count.
> > Note, this directory DOES exist, so that is not the issue.
> >
> > If I add a second directory, then the exact query returns almost instantly,
> > and reports the correct number of records.
> >
> > In addition, when there is only one directory, select count(1) from `data/`
> > returns instantly and with the correct count.
> >
> > To me, it appears that if there is ONE and only ONE subdirectory, then dir0=
> > doesn't work as I think people would expect it to. I can't think of a real
> > reason to have it behave this way, and to me it violates the principle of
> > "least surprise", but I am not up on the internals of Drill, so I thought I'd
> > post here first.
> >
> > John
> >
>

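The shared-base explanation quoted above can be illustrated with a small sketch. This is not Drill's code: it is a toy Python model that assumes dirX is computed relative to the deepest directory common to all selected files, which is how the reply describes the behavior ("I believe"). The function names and the parquet file names are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath
from typing import List, Optional


def shared_base(files: List[str]) -> str:
    """Deepest directory common to every file's parent directory (assumed model)."""
    parents = [posixpath.dirname(f) for f in files]
    return posixpath.commonpath(parents)


def dir0(file_path: str, base: str) -> Optional[str]:
    """First directory level below the shared base, or None if there is none."""
    parent = PurePosixPath(posixpath.dirname(file_path))
    parts = parent.relative_to(base).parts
    return parts[0] if parts else None


def dir0_values(files: List[str]) -> List[Optional[str]]:
    """dir0 for each file, computed against the base shared by all of them."""
    base = shared_base(files)
    return [dir0(f, base) for f in files]


# Two subdirectories: the shared base is "data", so dir0 is the date directory.
two = ["data/2016-01-01/part-0.parquet", "data/2016-01-02/part-0.parquet"]
print(dir0_values(two))

# Exactly one subdirectory: the shared base collapses to "data/2016-01-01",
# nothing is left below it, and dir0 comes out empty (null in the thread).
one = ["data/2016-01-01/part-0.parquet"]
print(dir0_values(one))

# One first-level directory with two subdirectories: the shared base is
# "data/a", so dir0 reports the second level rather than "a".
nested = ["data/a/x/part-0.parquet", "data/a/y/part-0.parquet"]
print(dir0_values(nested))
```

Under this model, a filter such as dir0 = '2016-01-01' matches nothing in the single-subdirectory case, which is consistent with the zero count reported in the original message.
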