drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Looking for advice on integrating with a custom data source
Date Sat, 11 Jan 2020 23:20:18 GMT
Hi Andy,

There are likely multiple approaches; here are two. Either way, some code has to decide what
can be pushed to your data source and what must remain in Drill. At present, there is no
declarative way to say, "OK to push such-and-such expressions, but keep this-and-that in Drill."

Instead, the current approach is for your plugin to tie into Drill's Calcite-based query planner.
You define Calcite rules that fire to perform the push operations you want to support. The
code in this area is somewhat obscure, but there are examples in the Kafka plugin and several
other plugins.
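To make the idea concrete, here is a toy model of such a rule in plain Java. It is a sketch under stated assumptions, not Drill or Calcite code: the `Node`, `Scan`, `Filter`, and `Cond` types and the `supportedOps` capability set are all hypothetical. The rule matches a Filter sitting directly on a Scan of your source, moves the AND-ed conditions the source can evaluate into the Scan, and leaves the rest in a Filter that Drill runs. In Calcite itself, the corresponding piece would be a `RelOptRule` subclass whose `onMatch(RelOptRuleCall)` builds the new plan and passes it to `call.transformTo(...)`; the Kafka plugin is the better guide to those details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model of a filter push-down planner rule. NOT Calcite's API; every
 * name here is hypothetical. Pattern: Filter(Scan). Supported conjuncts move
 * into the Scan; unsupported ones stay in a residual Filter run by Drill.
 */
public class FilterPushDownRule {

    interface Node {}

    /** One AND-ed comparison, e.g. age > 30. */
    record Cond(String column, String op, String value) {
        @Override
        public String toString() {
            return column + " " + op + " " + value;
        }
    }

    /** Scan of the external source plus the conditions it evaluates itself. */
    record Scan(String table, List<Cond> pushed) implements Node {}

    /** Filter evaluated by Drill; its condition is the AND of all conjuncts. */
    record Filter(List<Cond> conjuncts, Node input) implements Node {}

    // Assumed description of what the data source can evaluate.
    private final Set<String> supportedOps;

    FilterPushDownRule(Set<String> supportedOps) {
        this.supportedOps = supportedOps;
    }

    /** Returns the rewritten plan, or the same node if the rule does not apply. */
    Node apply(Node node) {
        if (node instanceof Filter f && f.input() instanceof Scan s) {
            List<Cond> pushed = new ArrayList<>(s.pushed());
            List<Cond> residual = new ArrayList<>();
            for (Cond c : f.conjuncts()) {
                // Splitting is valid only because conjuncts are AND-ed:
                // filter by a at the source, then by b in Drill == a AND b.
                if (supportedOps.contains(c.op())) {
                    pushed.add(c);
                } else {
                    residual.add(c);
                }
            }
            if (pushed.size() == s.pushed().size()) {
                return node; // nothing new to push; leave the plan unchanged
            }
            Node newScan = new Scan(s.table(), List.copyOf(pushed));
            return residual.isEmpty() ? newScan : new Filter(List.copyOf(residual), newScan);
        }
        return node; // pattern Filter(Scan) not matched
    }

    public static void main(String[] args) {
        FilterPushDownRule rule = new FilterPushDownRule(Set.of("=", "<", ">"));
        Node plan = new Filter(
                List.of(new Cond("region", "=", "'EU'"),
                        new Cond("name", "LIKE", "'A%'")),
                new Scan("orders", List.of()));
        System.out.println(rule.apply(plan));
    }
}
```

Running `main` should leave only the `LIKE` condition in Drill's Filter, with `region = 'EU'` moved into the Scan. Returning the input unchanged when nothing new can be pushed keeps the rule from rewriting its own output forever.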

Also, at present, storage "plugins" are not really plugins at compile time: they pretty much
need to be built within the Drill source tree. This is especially true to run unit tests.
(We'd like to improve this area of the project; suggestions welcome.) Generally, folks put
their plugin in the "contrib" directory within Drill. Yes, you must maintain your own branch.
However, as long as you do not modify Drill code (you shouldn't need to), it is not too hard
to simply occasionally rebase your branch on top of a new Drill release.

At runtime, however, plugins are true plugins: you can take the plugin jar you create using
the above process and drop it into an "official" release directory. We talk a bit about this
process in the book Learning Apache Drill from O'Reilly.

We recently tried to clean up the plugin structure a bit in PR 1914 (DRILL-7458) [1].
The PR takes just a few baby steps, and suggestions are welcome. Its key new feature is
a standardized way to handle filter push-down that avoids the large amount of
copy-and-paste previously required.

The PR is the result of a recent project to create a storage plugin that included filter push-down.
Notes on that process are in [2].

You mentioned that your data source is similar to JDBC. So, another approach is to modify
the existing JDBC storage plugin to add config options that control what gets pushed down
(assuming the decision is simple enough to express as a few options). In this case, you
could offer your changes as a PR, which the Drill project would maintain as part of the
source base, saving you from maintaining your own fork.
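If the JDBC route is taken, the options might look something like the following. This is only a sketch: `JdbcPushdownOptions`, its flags, and `allows()` are hypothetical names, not existing Drill config options. The idea is that each push-down rule in the modified plugin would check `allows(...)` before firing, so the plugin config decides what reaches the data source.

```java
/**
 * Hypothetical push-down switches a modified JDBC storage plugin config
 * could expose. None of these are existing Drill options; in a real plugin
 * they would be fields of the plugin's config class, set from its JSON.
 */
public record JdbcPushdownOptions(
        boolean pushFilters,
        boolean pushAggregates,
        boolean pushJoins,
        boolean pushSortAndLimit) {

    /** Operator kinds the planner might try to hand to the data source. */
    public enum Operation { PROJECT, FILTER, AGGREGATE, JOIN, SORT, LIMIT }

    /** Push everything, approximating the pass-through behavior described in the question. */
    public static final JdbcPushdownOptions PUSH_ALL =
            new JdbcPushdownOptions(true, true, true, true);

    /** Whether the (assumed) push-down rule for this operator may fire. */
    public boolean allows(Operation op) {
        return switch (op) {
            case PROJECT -> true; // column selection assumed always safe to push
            case FILTER -> pushFilters;
            case AGGREGATE -> pushAggregates;
            case JOIN -> pushJoins;
            case SORT, LIMIT -> pushSortAndLimit;
        };
    }

    public static void main(String[] args) {
        // The case in the question: push selection and predicates, keep the rest in Drill.
        JdbcPushdownOptions selectionOnly = new JdbcPushdownOptions(true, false, false, false);
        for (Operation op : Operation.values()) {
            System.out.println(op + " -> " + (selectionOnly.allows(op) ? "data source" : "Drill"));
        }
    }
}
```

Per-operator booleans are deliberately coarse. If the real decision depends on specific expressions rather than operator kinds, it probably no longer fits in "a few options," and a custom plugin with its own rules is the better fit.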

Thanks,
- Paul


[1] https://github.com/apache/drill/pull/1914
[2] https://github.com/paul-rogers/drill/wiki/Create-a-Storage-Plugin



On Saturday, January 11, 2020, 2:58:08 PM PST, Andy Grove <andygrove73@gmail.com> wrote:

Hi,
 Hi,

I'd like to use Apache Drill with a custom data source that supports a
subset of SQL.

My goal is to have Drill push selection and predicates down to my data
source but the rest of the query processing should take place in Drill.

I started out by writing a JDBC driver for the data source and registering
that with Drill using the JDBC storage plugin, but it seems to just pass the
whole query through to my data source, so that approach isn't going to work
unless I'm missing something?

Is there any way to configure the JDBC storage plugin to only push certain
parts of the query to the data source?

If this isn't a good approach, do I need to write a custom storage plugin?
Can these be added on the classpath or would that require me maintaining a
fork of the project?

I appreciate any pointers anyone can give me.

Thanks,

Andy.