drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Apache Drill rest api plugin
Date Mon, 23 Mar 2020 18:40:52 GMT
Hi Navin,

Can you share a bit more what you are trying to do? ECS is Elastic Container Service, correct?
So, the Parquet files are ephemeral: they exist only while the container runs? Do the files
have a permanent form, such as in S3?

Parquet is a complex format. Drill exploits the Parquet structure to optimize query performance.
This means that Drill must seek to the header, footer and row groups of each file. In particular,
Parquet cannot be read in a streaming fashion the way we can read CSV or JSON.
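
To illustrate why seeking is needed, here is a minimal Python sketch (not Drill's actual reader) that locates the footer metadata. It relies on the Parquet file layout, in which a file ends with a 4-byte little-endian footer length followed by the magic bytes "PAR1". The function name and the sample sizes are made up for illustration.

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"  # magic bytes at both the start and the end of a Parquet file


def footer_location(tail: bytes, file_size: int) -> Tuple[int, int]:
    """Given the last 8 bytes of a Parquet file and the total file size,
    return (offset, length) of the footer metadata block.

    Hypothetical helper for illustration only.
    """
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not end with the Parquet magic bytes")
    # The 4 bytes before the trailing magic hold the footer length, little-endian.
    (footer_len,) = struct.unpack("<I", tail[:4])
    offset = file_size - 8 - footer_len
    if offset < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the file allows")
    return offset, footer_len


# Example with made-up numbers: a 1000-byte file whose footer is 100 bytes long.
sample_tail = struct.pack("<I", 100) + PARQUET_MAGIC
print(footer_location(sample_tail, 1000))
```

The reader has to know the file size and read the end of the file before it can find any row group, which is why a forward-only stream is not enough.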

The best REST API for Parquet would be a clone of the Amazon S3 API. Alternatively, expose
the files using something like NFS so that the file on ECS appears like a local file to Drill.
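
As a rough sketch of what "S3-like" means in practice, the REST API would need to serve byte ranges so a client can fetch just the footer or a single row group. The Python below only builds the requests using standard HTTP Range headers; it does not send them. The URL and function names are hypothetical, and whether your API honors Range headers is an assumption you would need to verify.

```python
from urllib.request import Request

# Hypothetical endpoint; substitute whatever your REST API actually exposes.
FILE_URL = "http://example.invalid/data/file.parquet"


def tail_request(url: str, n: int = 8) -> Request:
    """Build a request for the last n bytes of the resource (HTTP suffix range)."""
    return Request(url, headers={"Range": f"bytes=-{n}"})


def range_request(url: str, offset: int, length: int) -> Request:
    """Build a request for `length` bytes starting at `offset` (inclusive byte range)."""
    return Request(url, headers={"Range": f"bytes={offset}-{offset + length - 1}"})


# First fetch the file tail, then the metadata block it points to
# (the offset and length here are made-up numbers).
print(tail_request(FILE_URL).get_header("Range"))
print(range_request(FILE_URL, 892, 100).get_header("Range"))
```

An API that can only return the whole file forces a full download before any query can start, which defeats most of the benefit of Parquet.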

You can even implement the HDFS client API on top of your REST API (assuming your REST API
supports the required functions), and use Drill's DFS plugin with your client.

Yet another alternative is to store Parquet in S3, so Drill can use the S3 API directly. Or,
to stream the content to Drill from a container, use JSON or CSV.

Lots of options that depend on what you're trying to do.


- Paul


    On Monday, March 23, 2020, 6:03:48 AM PDT, Charles Givre <cgivre@gmail.com> wrote:
Hi Navin,
Thanks for your interest in Drill. To answer your question, there is currently a pull request
for a REST storage plugin [1]. As implemented, it only accepts JSON responses, but
it would not be difficult to get the reader to accept Parquet files. Please take a look
and send any feedback.
-- C

[1]: https://github.com/apache/drill/pull/1892

> On Mar 23, 2020, at 8:14 AM, Navin Bhawsar <navin.bhawsar@gmail.com> wrote:
> Hi
> We are currently running an experiment that uses Apache Drill to query Parquet
> files. These Parquet files will be copied to ECS and exposed via a REST API.
> Can you please advise whether there is a storage plugin for querying a REST API?
> We are currently using Apache Drill 1.17 in distributed mode.
> Please let me know if you need more details.
> Thanks and Regards,
> Navin