drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: REST data source?
Date Tue, 31 Mar 2020 20:39:05 GMT
Hi Rafael,

You may be running into something that I hit at a recent employer. The firm hosted its own
in-house artifactory that would pull only from "authorized" repos. Drill has a couple of dependencies
on MapR-hosted repos which this firm did not mirror, causing Drill to break. Rather than argue
with the Powers That Be to change the rules for my little POC, I found a work-around. If you
are having the same issue, this might work for you. My notes from that time are at [1]. Of
course, your issue could be different, so we might need a different solution. As I recall,
the error I got was a bit different than the one you got. Still, worth a try.

- Paul

[1] https://github.com/paul-rogers/drill/wiki/Build-Drill-in-a-Corporate-Environment


    On Tuesday, March 31, 2020, 12:21:42 PM PDT, Jaimes, Rafael - 0993 - MITLL <rafael.jaimes@ll.mit.edu>
 Hi Paul,

I tried that (I had even tried a vanilla build on its own before) and I run into the same
dependency problem. There is something in apache-21.pom that I cannot resolve. If it works for
you, I am certain it is a config issue on our end due to the way our proxies and mirrors are
set up. We have to go through these internal channels when building, and it sometimes causes issues.


I am almost up and running with your pre-built instance. I have narrowed the problem down
to possibly being another proxy issue. The GET requests don't seem to be honoring the proxy
settings in my system environment variables. Do you think there's any way to force Drill or the
plug-in to use a proxy? I'm unable to get the examples you have posted working: I get a
"Connection reset" error with HTTPS and a connection timeout with HTTP. The URLs work fine if
I test them outside of Drill.
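
One thing that may be worth checking here: the JVM does not read the http_proxy/https_proxy environment variables by itself; its standard networking stack uses system properties such as http.proxyHost and https.proxyHost instead. Below is a minimal sketch that turns a proxy URL into the matching -D flags. The function name and the example.com placeholder are hypothetical, and whether the plug-in's HTTP client actually honors these properties is an assumption to verify, not something confirmed in this thread.

```python
import os
from typing import List
from urllib.parse import urlsplit


def jvm_proxy_flags(proxy_url: str, no_proxy: str = "") -> List[str]:
    """Translate a proxy URL (as used in http_proxy/https_proxy) into JVM -D flags.

    Hypothetical helper for illustration; it only builds strings.
    """
    # Accept bare "host:port" values by assuming an http:// scheme.
    url = proxy_url if "://" in proxy_url else "http://" + proxy_url
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("cannot parse proxy URL: %r" % proxy_url)
    # The JVM itself defaults http.proxyPort to 80 when no port is given.
    port = parts.port or 80

    flags = []
    for scheme in ("http", "https"):
        flags.append("-D%s.proxyHost=%s" % (scheme, parts.hostname))
        flags.append("-D%s.proxyPort=%d" % (scheme, port))

    # no_proxy is comma-separated; the JVM wants '|' separators and
    # '*.domain' wildcards. http.nonProxyHosts covers HTTPS as well.
    hosts = [h.strip() for h in no_proxy.split(",") if h.strip()]
    if hosts:
        jvm_hosts = ["*" + h if h.startswith(".") else h for h in hosts]
        flags.append("-Dhttp.nonProxyHosts=" + "|".join(jvm_hosts))
    return flags


if __name__ == "__main__":
    # Falls back to a placeholder proxy if the variable is not set.
    proxy = os.environ.get("https_proxy") or "http://proxy.example.com:8080"
    print(" ".join(jvm_proxy_flags(proxy, os.environ.get("no_proxy", ""))))
```

If the flags turn out to help, one place to try them is the DRILL_JAVA_OPTS variable in conf/drill-env.sh. Quote the value there, since the '|' in http.nonProxyHosts is special to the shell.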


-----Original Message-----
From: Paul Rogers <par0328@yahoo.com.INVALID> 
Sent: Tuesday, March 31, 2020 2:36 PM
To: user@drill.apache.org
Subject: Re: REST data source?

Hi Rafael,

The easiest way to build the plugin will be to build all of Drill 1.18 Snapshot with the plugin:

1. Grab master from GitHub.

2. Merge in Charles's PR branch.

3. mvn clean install -DskipTests

The above usually works for me. This process ensures that all the snapshot versions come from
your own build.

Not sure how we started storing snapshot versions in a Maven repo. This causes issues. If
you rebuild part of Drill, and have not built the other parts in more than a day, Maven helpfully
downloads the snapshots from the repo, causing all kinds of chaos. (We should fix this.)

Once you do the build, you'll have a full Drill distribution, just like you'd download. You
can use that distribution to run Drill with the plugin included.

There are other ways that also work; the above may be the simplest.

- Paul


    On Tuesday, March 31, 2020, 10:51:18 AM PDT, Jaimes, Rafael - 0993 - MITLL <rafael.jaimes@ll.mit.edu>
 Hi Charles,

I have not been able to build Drill, from either a full clone of your tagged http-storage
branch or from the standard Drill 1.17 release. 
I've narrowed it down to some dependency problems from the POM. In particular, I run into
issues here:

Downloading: https://repo.maven.apache.org/maven2/org/apache/apache/21/apache-21.pom
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR]   The project org.apache.drill:drill-root:1.18.0-SNAPSHOT (/home/ra29435/drill-official/drill/pom.xml) has 1 error
[ERROR]     Non-resolvable parent POM: Could not transfer artifact org.apache:apache:pom:21 from/to conjars (http://conjars.org/repo): Connection to http://conjars.org refused and 'parent.relativePath' points at no local POM @ line 24, column 11: Connection timed out (Connection timed out) -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException

I think it has something to do with the fact that I normally resolve dependencies from our
local Maven repo mirrors. We have no problems getting stuff from Maven Central and common
places, but I am unfamiliar with conjars.org. I wonder if it is related to that?

I tried putting the JAR into either jars/ or jars/3rdparty with the same error. I haven't
gone down the dependency tree, so I have not made any JARs of the dependencies. That could be
a major thing I'm missing.

Yes this is still in a testing environment. I'm going to use your pre-built images for testing
the REST endpoint, this is extremely helpful. If it works out I'll go back to trying to build
it. Also, hoping that this will make its way into the next (1.18) release.


-----Original Message-----
From: Charles Givre <cgivre@gmail.com>
Sent: Tuesday, March 31, 2020 1:34 PM
To: user <user@drill.apache.org>
Subject: Re: REST data source?

Hi Rafael,
Glad you're getting some value from Drill.  Repackaging that directory as a truly pluggable
jar is tricky.  A few questions:
1.  Did you copy the contrib/storage-http into its own folder and then do a build from that?
2.  Did it build successfully?
3.  Did you copy the JARs into your Drill jars/3rdparty folder?
4.  You'll also have to get JARs of any dependencies as well and copy them to the jars/3rdparty. 
Have you done that?
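
The checklist above can be partly automated. Here is a minimal sketch that inspects a repackaged plug-in jar, on the understanding that Drill's classpath scan looks for a drill-module.conf inside each jar it considers. The function name and the exact set of checks are hypothetical. Passing them does not guarantee that Drill will register the plug-in.

```python
import sys
import zipfile
from typing import Dict


def inspect_plugin_jar(jar_path: str) -> Dict[str, bool]:
    """Report whether a jar contains the files a storage plug-in jar is expected to carry.

    Hypothetical diagnostic; it reads the jar's entry list only.
    """
    # A jar is a zip archive, so the standard zipfile module can list it.
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
    return {
        # Both config files are expected at the root of the jar, not in a subfolder.
        "has_module_conf": "drill-module.conf" in names,
        "has_bootstrap_json": "bootstrap-storage-plugins.json" in names,
        # Compiled classes must be present; a jar of .java sources will not load.
        "has_classes": any(name.endswith(".class") for name in names),
    }


if __name__ == "__main__":
    # Usage: python inspect_jar.py path/to/plugin.jar
    for path in sys.argv[1:]:
        for check, ok in inspect_plugin_jar(path).items():
            print("%s: %s = %s" % (path, check, ok))
```

A False for has_module_conf or has_classes would be consistent with the "Could not resolve type id 'http'" error, since the config class would then not be found during the scan.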

I actually have a pre-built version of Drill with the storage-http plugin available here:
https://github.com/cgivre/drill/releases.
Please do not use that in any kind of production setup.  If you're just wanting to try this
out, it might be easier to d/l that and use that.
-- C

> On Mar 31, 2020, at 12:57 PM, Jaimes, Rafael - 0993 - MITLL <Rafael.Jaimes@ll.mit.edu>
> Hi Charles,
> I am trying to use the http-storage plugin from your branch. I put the storage plug-in
> files in a jar and tried to keep the jar directory structure the same as other plug-ins. Upon
> starting drill-embedded I’m getting the error below. I am using your drill-module.conf
> and bootstrap-storage-plugins.json from your branch. Is there another step I need to perform
> to get Drill to recognize the plug-in? I am using 1.17 release.
> Error: Failure in starting embedded Drillbit: 
> java.lang.IllegalStateException: 
> com.fasterxml.jackson.databind.exc.InvalidTypeIdException: Could not 
> resolve type id 'http' as a subtype of [simple type, class 
> org.apache.drill.common.logical.StoragePluginConfig]: known type ids = 
> [InfoSchemaConfig, SystemTablePluginConfig, file, hbase, hive, jdbc, 
> kafka, kudu, mock, mongo, named, openTSDB] (for POJO property 
> 'storage') at [Source: (String)"{
>  "storage":{
>    "http" : {
>      "type":"http",
>      "connections": {},
>      "enabled": false
>    }
>  }
> }
> "; line: 4, column: 14] (through reference chain: 
> org.apache.drill.exec.planner.logical.StoragePlugins["storage"]->java.
> util.LinkedHashMap["http"]) (state=,code=0)
> Paul,
> I don’t know much about this REST service quite yet (it is internal). We use REST APIs
> in many places, and all responses are returned as JSON-formatted strings; I don’t think
> it is very sophisticated. I am not sure how it will handle projection and filter issues.
> My current pipeline involves using python requests.get() and then unpacking the response string.
> It does have an authentication layer, so I am mildly concerned that the HTTP-storage-plugin
> will have a hiccup, although it looks like it can use “Basic”. If I can get Drill to
> query the endpoint I will report back if I find anything else that might be useful to you.
> Thanks both for your great work with Drill!
> -          Rafael
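
For reference, the kind of pipeline described above (a GET with Basic authentication, then unpacking the JSON response string) can be sketched as follows. This is an illustration only: it uses the standard library's urllib in place of the requests package mentioned in the message so that it has no dependencies, and the function names, URL, and credentials are hypothetical.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare a GET request that asks for JSON and carries Basic credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    req.add_header("Accept", "application/json")
    return req


def fetch_json(url: str, user: str, password: str, timeout: float = 10.0):
    """Send the request and decode the JSON body into Python objects."""
    # urllib picks up http_proxy/https_proxy from the environment by default.
    with urllib.request.urlopen(build_request(url, user, password), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder endpoint and credentials; only the prepared header is
    # printed here. Call fetch_json(...) to actually send the request.
    req = build_request("https://rest.example.com/api/items", "user", "pass")
    print(req.get_header("Authorization"))
```

Comparing a request like this against the same endpoint queried through Drill may help separate authentication problems from proxy problems.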