drill-dev mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: Broader feedback on DRILL-3810
Date Thu, 29 Oct 2015 09:23:39 GMT

I posted the review comment below on the Avro implementation, but I think
it applies to all schema-based file formats, so I want to raise these
questions on this thread.

1) This approach only works if the input data satisfies the conditions
below. Are we going to impose these conditions on all schema-based
FormatPlugins?
    i. If the input directory is a leaf directory, all the files in it
must have the same schema.
    ii. If the input directory contains subdirectories, all the files in
those subdirectories must have the same schema.

2) What if a directory contains files with different schemas? The approach
will then break. How do we handle this scenario?
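
To make the second scenario concrete, here is a minimal sketch of how mixed schemas under one directory could be detected before a table is built. It is not Drill code: the class and method names are hypothetical, and it assumes the schema of each file has already been read into a string (for Avro, that would come from the file header).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Drill or of the DRILL-3810 patch.
class SchemaConsistencyCheck {

    // Groups file paths by schema text. More than one key means the
    // directory tree mixes schemas.
    static Map<String, List<String>> groupBySchema(Map<String, String> schemaByFile) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : schemaByFile.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    // True when every file shares one schema (an empty input counts as consistent).
    static boolean hasSingleSchema(Map<String, String> schemaByFile) {
        return groupBySchema(schemaByFile).size() <= 1;
    }

    public static void main(String[] args) {
        // Assumed input: file path -> schema string, already extracted.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("/data/2015/a.avro", "schema-v1");
        files.put("/data/2015/b.avro", "schema-v1");
        files.put("/data/2016/c.avro", "schema-v2");

        if (!hasSingleSchema(files)) {
            // Report which files disagree instead of failing later at read time.
            groupBySchema(files).forEach((schema, paths) ->
                System.out.println(schema + " -> " + paths));
        }
    }
}
```

A check like this only detects the problem. Whether to reject the directory, merge the schemas, or fall back to schemaless reading is the open question.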

Thanks & Regards,
B Anil Kumar.

On Thu, Oct 29, 2015 at 1:23 AM, Jacques Nadeau <jacques@dremio.com> wrote:

> Hey Guys,
> DRILL-3810 is a patch adding schema to a format plugin. In order to do
> this, Kamesh has suggested a change to the FormatPlugin that basically has
> a secondary call called getDrillTable(Object selection) that is called
> after the FormatMatcher. However, it seems weird that there is a
> multi-stage interaction here between the engine and a format plugin. One
> idea I had is that the FormatMatcher should return the Table object
> directly (and thus have the ability to return a schema'd pattern). Kamesh's
> most recent patch presents this approach. I wanted to get some more
> feedback from others on this issue before we finalize a particular
> direction since this should ultimately be a stable external API.
> What do others think? (For reference, my expectation is that CSV and
> Parquet should ultimately also implement this interface.)
> https://issues.apache.org/jira/browse/DRILL-3810
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
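
For readers comparing the two API shapes discussed in the quoted message, the sketch below contrasts them. All types here are simplified stand-ins invented for illustration. Only the method name getDrillTable(Object selection) comes from the thread, and none of this reflects the real Drill FormatMatcher or FormatPlugin signatures.

```java
import java.util.Optional;

// Hypothetical sketch, not the Drill API.
class FormatApiSketch {

    // Stand-in for a table that may carry a schema.
    static final class SketchTable {
        final String location;
        final Optional<String> schema;

        SketchTable(String location, Optional<String> schema) {
            this.location = location;
            this.schema = schema;
        }
    }

    // Two-stage shape: the matcher only says yes or no, and the engine then
    // makes a second call to the plugin to obtain the table.
    interface TwoStageMatcher {
        boolean matches(String location);
    }

    interface TwoStagePlugin {
        SketchTable getDrillTable(Object selection);
    }

    // Single-stage shape: the matcher returns the table directly, or empty
    // when the location is not in this format.
    interface SingleStageMatcher {
        Optional<SketchTable> match(String location);
    }

    // Example single-stage matcher for files ending in ".avro". The schema
    // string is a placeholder for whatever the plugin would read.
    static SingleStageMatcher avroLike() {
        return location -> {
            if (location.endsWith(".avro")) {
                return Optional.of(new SketchTable(location, Optional.of("placeholder-schema")));
            }
            return Optional.empty();
        };
    }

    public static void main(String[] args) {
        SingleStageMatcher matcher = avroLike();
        System.out.println(matcher.match("/data/events.avro").isPresent());
        System.out.println(matcher.match("/data/events.csv").isPresent());
    }
}
```

In the single-stage shape the engine makes one call and the plugin decides whether the returned table carries a schema, which avoids the multi-stage interaction between engine and plugin.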
