drill-dev mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Accessing Avro on Parquet Using Drill
Date Fri, 05 Sep 2014 17:33:48 GMT
This should be possible just by reading the file off the filesystem. The
Parquet format is standardized on disk, so regardless of what produced the
file, Drill can read the data. The footer of a Parquet file has extra space
for metadata specific to the producer of the file. I have not looked into
Avro specifically, but it is possible that it includes metadata about how
to re-populate the data into the in-memory Avro representation. We can get
all of the same data out of the file, but Avro might interpret it
differently.
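
To make the footer point concrete, here is a minimal Python sketch (not
Drill code) that pulls the raw footer out of a Parquet file using only the
on-disk layout: the file begins and ends with the 4-byte magic "PAR1", and
the 4 bytes before the trailing magic hold the footer length as a
little-endian integer. The function names are made up for illustration. The
check for "avro.schema" is an assumption about what an Avro writer leaves in
the key/value metadata, and it is only a byte search, not a real Thrift
decode of the footer.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at both ends of a plaintext-footer Parquet file


def read_parquet_footer(path):
    """Return the raw, still Thrift-encoded footer bytes of a Parquet file."""
    with open(path, "rb") as f:
        f.seek(0, 2)              # jump to the end to learn the file size
        size = f.tell()
        if size < 12:             # leading magic + footer length + trailing magic
            raise ValueError("file too small to be Parquet")
        f.seek(0)
        head = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)          # 4-byte footer length, then trailing magic
        if head != MAGIC or tail[4:] != MAGIC:
            raise ValueError("missing PAR1 magic bytes")
        (footer_len,) = struct.unpack("<I", tail[:4])
        if footer_len > size - 12:
            raise ValueError("footer length exceeds file size")
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)


def mentions_avro_schema(footer):
    """Heuristic (assumption): does the footer contain an Avro schema key?"""
    return b"avro.schema" in footer
```

If mentions_avro_schema(read_parquet_footer(path)) comes back True, the
writer probably stored its Avro schema as producer metadata. The column data
itself is laid out the same way either way, which is why any Parquet reader
can get at it.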

I would try reading the file straight from the distributed filesystem. If
the data comes back the way you expect, you can query it immediately. If
not, we can work with you to write UDFs that make it behave like the
in-memory Avro structure.

-Jason


On Fri, Sep 5, 2014 at 1:47 AM, mufy <mufeed.usman@gmail.com> wrote:

> Here is what I learned - we allow Drill to access Avro using Hive as a
> middle-man, i.e., by creating a Hive table for the Avro files. For now,
> there is no direct access to Avro files using Drill, only Parquet, JSON
> and Text (as of version 0.4).
>
> With that in the picture can the following be achieved?
>
> => Does Drill have the capability to "drill" Parquet if an Avro structure
> is stored in Parquet?
>
> ---
> Mufeed Usman
> My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
> Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
> <http://mufeed.livejournal.com>
>
