drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: REST data source?
Date Tue, 31 Mar 2020 21:01:30 GMT
Hi Rafael,

You mention that your JSON response is nested. As it turns out, I just used something similar
to Charles's HTTP plugin for a recent project. We had to deal with a bit of message overhead
to get to the data:

{status: "ok", data: [your data here ]}

A PR was just submitted for a change to the "new" JSON parser to handle this case. However,
the "message parser" does require code to parse its way down through the JSON.

The next step is to upgrade Charles's PR with the new JSON reader and support for the message
parser. (The new JSON reader also allows you to specify a schema to handle messy JSON, if
we could figure out where to store the schema.)

Can you perhaps share the JSON response structure you need? I'm trying to figure out if it
is better to work out some kind of text description of the parse path, or just let you specify
the name of a class that implements the message parser. Which would work better for you?
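
To make the "text description of the parse path" option concrete, here is a minimal Python sketch of walking down to the data through the message overhead. It is illustrative only: extract_payload and the list-of-keys path format are hypothetical and are not part of Drill, the new JSON reader, or the HTTP plugin.

```python
import json


def extract_payload(response_text, parse_path):
    """Parse a JSON response and walk down parse_path (a list of object keys).

    Returns whatever sits at the end of the path, e.g. the array of records.
    The path format is an assumption made for this sketch.
    """
    node = json.loads(response_text)
    for key in parse_path:
        # Each step must land on a JSON object that contains the next key.
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"parse path element {key!r} not found")
        node = node[key]
    return node


# An envelope shaped like the one above, with made-up records.
response = '{"status": "ok", "data": [{"a": 1}, {"a": 2}]}'
rows = extract_payload(response, ["data"])
```

A deeper envelope would simply use a longer path, such as ["result", "data"]; anything more irregular than that is where a class-based message parser would probably be the better fit.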

We are also trying to update an earlier ill-fated PR that adds filter push-down: the ability
to convert a SQL WHERE expression into an HTTP parameter. That is, WHERE foo = 'bar' becomes
&foo=bar in the URL. It is easy to implement the "naive" approach that handles only equality,
and does a direct mapping to HTTP query params. Would this be useful in your case? Do you
need to parameterize your HTTP request?
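
As a rough illustration of the "naive" approach, the Python sketch below maps equality conditions directly to query parameters and leaves everything else to be evaluated after the fetch. The function name, the (column, operator, value) tuple format, and the example URL are all assumptions made for this sketch; it is not how the PR or Drill's planner represents filters.

```python
from urllib.parse import urlencode


def push_down_equalities(base_url, filters):
    """Split filters into pushed-down query params and leftover conditions.

    filters is a list of (column, operator, value) tuples (hypothetical format).
    Returns (url_with_params, remaining_filters).
    """
    pushed = {}
    remaining = []
    for column, op, value in filters:
        # Only the first equality on a column is pushed down.
        if op == "=" and column not in pushed:
            pushed[column] = value
        else:
            remaining.append((column, op, value))
    if not pushed:
        return base_url, remaining
    # Append with "&" if the URL already carries a query string.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(pushed), remaining


url, leftover = push_down_equalities(
    "http://example.com/api?format=json",
    [("foo", "=", "bar"), ("n", ">", 5)],
)
```

Here url would carry &foo=bar, while the n > 5 condition stays in leftover and would still have to be applied to the returned rows.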

Any real-world insight would be helpful.

- Paul


    On Tuesday, March 31, 2020, 1:40:17 PM PDT, Jaimes, Rafael - 0993 - MITLL <rafael.jaimes@ll.mit.edu>
 Ok, I commented in that thread.
I think the proxy is the only missing piece. I tried connecting to a different service that
is inside the proxy and it worked as expected. This looks like it will work well for our application.

FYI, although it has basic auth, I am not using the authType field in the storage config.
Rather, our service authenticates from the header in this format: {"Authentication": "Basic

The response JSON is nested quite a bit but I think it can be fixed by modifying the SELECT
as you have done in your examples.


-----Original Message-----
From: Charles Givre <cgivre@gmail.com> 
Sent: Tuesday, March 31, 2020 3:27 PM
To: user@drill.apache.org
Subject: Re: REST data source?

At the moment the plugin does not support proxy servers.  However, this is pretty easy to
implement using the current libraries.  Could you please add a comment to the PR for the
plugin (https://github.com/apache/drill/pull/1892 <https://github.com/apache/drill/pull/1892>)
with some explanation of what you need?
-- C

> On Mar 31, 2020, at 3:21 PM, Jaimes, Rafael - 0993 - MITLL <Rafael.Jaimes@ll.mit.edu>
> Hi Paul,
> I tried that (even tried a vanilla build before on its own) and I run into the same dependency
problem. There is something in apache-21.pom that I cannot resolve. If it works for you, I
am certain it is a config issue on our end due to the way our proxies and mirrors are set up; we
have to go through these internal channels when building, and it sometimes causes issues.
> Charles,
> I am almost up and running with your pre-built instance. I have narrowed the problem
down to possibly being another proxy issue. The GET requests don't seem to be honoring my
system env variable proxy settings. Do you think there's any way to force Drill/plug-in to
use a proxy? I'm unable to get the examples you have posted working: getting Connection reset
error on HTTPS and Connect time out with HTTP.  The URLs work fine if I test them outside
of Drill.
> Thanks,
> Rafael
> -----Original Message-----
> From: Paul Rogers <par0328@yahoo.com.INVALID>
> Sent: Tuesday, March 31, 2020 2:36 PM
> To: user@drill.apache.org
> Subject: Re: REST data source?
> Hi Rafael,
> The easiest way to build the plugin will be to build all of Drill 1.18 Snapshot with
the plugin included.
> 1. Grab master from GitHub.
> 2. Merge in Charles's PR branch.
> 3. mvn clean install -DskipTests
> The above usually works for me. This process ensures that all the snapshot versions come
from your own build.
> Not sure how we started storing snapshot versions in a Maven repo. 
> This causes issues. If you rebuild part of Drill, and have not built 
> the other parts in more than a day, Maven helpfully downloads the 
> snapshots from the repo, causing all kinds of chaos. (We should fix 
> this.)
> Once you do the build, you'll have a full Drill distribution, just like you'd download.
You can use that distribution to run Drill with the plugin included.
> There are other ways that also work; the above may be the simplest.
> Thanks,
> - Paul
>    On Tuesday, March 31, 2020, 10:51:18 AM PDT, Jaimes, Rafael - 0993 - MITLL <rafael.jaimes@ll.mit.edu>
> Hi Charles,
> (1./2.)
> I have not been able to build Drill, from either a full clone of your tagged http-storage
branch or from the standard Drill 1.17 release. 
> I've narrowed it down to some dependency problems from the POM. In particular, I run
into issues here:
> Downloading: https://repo.maven.apache.org/maven2/org/apache/apache/21/apache-21.pom
> [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR]  The project
org.apache.drill:drill-root:1.18.0-SNAPSHOT (/home/ra29435/drill-official/drill/pom.xml) has
1 error [ERROR]    Non-resolvable parent POM: Could not transfer artifact org.apache:apache:pom:21
from/to conjars (http://conjars.org/repo): Connection to http://conjars.org refurelativePath'
points at no local POM @ line 24, column 11: Connection timed out (Connection timed out) ->
[Help 2] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the
-e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] For more information about the errors and possible solutions, please read the
following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
> [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException
> I think it has something to do with the fact that I normally resolve dependencies from
our local Maven repo mirrors. We have no problems getting stuff from Maven Central and common
places, but I am unfamiliar with conjars.org. I wonder if it is related to that?
> (3./4.)
> I tried putting the JAR into either jars/ or jars/3rdparty with the same error. I haven't
gone down the dependency tree, so I have not made any JARs of them; that could be a major thing
I'm missing.
> Yes, this is still in a testing environment. I'm going to use your pre-built images for
testing the REST endpoint; this is extremely helpful. If it works out, I'll go back to trying
to build it. I am also hoping that this will make its way into the next (1.18) release.
> Best,
> Rafael
> -----Original Message-----
> From: Charles Givre <cgivre@gmail.com>
> Sent: Tuesday, March 31, 2020 1:34 PM
> To: user <user@drill.apache.org>
> Subject: Re: REST data source?
> Hi Rafael,
> Glad you're getting some value from Drill.  Repackaging that directory as a truly pluggable
jar is tricky.  A few questions:
> 1.  Did you copy the contrib/storage-http into its own folder and then do a build from
> 2.  Did it build successfully?
> 3.  Did you copy the JARs into your Drill jars/3rdparty folder?
> 4.  You'll also have to get JARs of any dependencies as well and copy them to the jars/3rdparty. 
Have you done that?
> I actually have a pre-built version of Drill with the storage-http plugin available here:
https://github.com/cgivre/drill/releases <https://github.com/cgivre/drill/releases>. 
Please do not use that in any kind of production setup.  If you're just wanting to try this
out, it might be easier to d/l that and use that.
> -- C
>> On Mar 31, 2020, at 12:57 PM, Jaimes, Rafael - 0993 - MITLL <Rafael.Jaimes@ll.mit.edu>
>> Hi Charles,
>> I am trying to use the http-storage plugin from your branch. I put the storage plug-in
files in a jar and tried to keep the jar directory structure the same as other plug-ins. Upon
starting drill-embedded I'm getting the error below.  I am using your drill-module.conf
and bootstrap-storage-plugins.json from your branch. Is there another step I need to perform
to get Drill to recognize the plug-in? I am using 1.17 release.
>> Error: Failure in starting embedded Drillbit: 
>> java.lang.IllegalStateException: 
>> com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Could not 
>> resolve type id 'http' as a subtype of [simple type, class
>> org.apache.drill.common.logical.StoragePluginConfig]: known type ids 
>> = [InfoSchemaConfig, SystemTablePluginConfig, file, hbase, hive, 
>> jdbc, kafka, kudu, mock, mongo, named, openTSDB] (for POJO property
>> 'storage') at [Source: (String)"{
>> "storage":{
>>  "http" : {
>>    "type":"http",
>>    "connections": {},
>>    "enabled": false
>>  }
>> }
>> }
>> "; line: 4, column: 14] (through reference chain: 
>> org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.
>> util.LinkedHashMap["http"]) (state=,code=0)
>> Paul,
>> I don't know much about this REST service quite yet (it is internal).  We utilize
REST APIs where all responses are returned as JSON-formatted strings in many places; I don't
think it is very sophisticated. I am not sure how it will handle projection and filter issues.
My current pipeline involves using Python requests.get() and then unpacking the response string.
It does have an authentication layer, so I am mildly concerned that the HTTP-storage-plugin
will have a hiccup, although it looks like it can use "Basic". If I can get Drill to
query the endpoint I will report back if I find anything else that might be useful to you.
>> Thanks both for your great work with Drill!
>> -          Rafael