drill-user mailing list archives

From Charles Givre <cgi...@gmail.com>
Subject Re: GitHub raw data as a Data source
Date Wed, 29 Jul 2020 22:42:53 GMT
Paul,
Actually, a PR added support for reading CSV from APIs, so no coding is necessary!
You do need to be running the latest build of Drill 1.18-SNAPSHOT.
Follow the instructions here to set up the HTTP plugin: [1]

The key parameter is `inputType`, which tells Drill whether to expect JSON or CSV from
the API. The default is JSON. The config below is an example configuration that does
exactly what you're describing.

    "met": {
      "url": "https://media.githubusercontent.com/media/metmuseum/openaccess/master/MetObjects.csv",
      "method": "GET",
      "headers": null,
      "authType": "none",
      "userName": null,
      "password": null,
      "postBody": null,
      "params": null,
      "dataPath": null,
      "requireTail": false,
      "inputType": "csv"
    }
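
The fragment above is a single named connection, not a complete plugin config. As a rough sketch only, the Python below wraps it in what I understand to be the plugin's top-level layout and prints JSON you could paste into the storage configuration. The top-level keys ("type", "connections", "enabled") and the helper name `build_plugin_config` are assumptions for illustration; check them against the README at [1] for your build.

```python
import json

# Connection settings copied from the "met" example above.
MET_CONNECTION = {
    "url": "https://media.githubusercontent.com/media/metmuseum/openaccess/master/MetObjects.csv",
    "method": "GET",
    "headers": None,
    "authType": "none",
    "userName": None,
    "password": None,
    "postBody": None,
    "params": None,
    "dataPath": None,
    "requireTail": False,
    "inputType": "csv",
}


def build_plugin_config(connections):
    """Wrap named connections in an HTTP storage plugin config.

    Assumption: the top-level keys used here follow the plugin README [1];
    they are not taken from this thread.
    """
    for name, conn in connections.items():
        # The message names JSON (the default) and CSV as the input types.
        if conn.get("inputType", "json") not in ("json", "csv"):
            raise ValueError(f"{name}: inputType must be 'json' or 'csv'")
    return {"type": "http", "connections": connections, "enabled": True}


PLUGIN_CONFIG = build_plugin_config({"met": MET_CONNECTION})

if __name__ == "__main__":
    # Python None/False serialize to JSON null/false, matching the fragment.
    print(json.dumps(PLUGIN_CONFIG, indent=2))
```

The small `inputType` check is only there to catch typos before the config reaches Drill; it is not something the plugin requires.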

Good luck!
-- C


[1]: https://github.com/apache/drill/tree/master/contrib/storage-http
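
If you are on an older build, the fallback Paul mentions in the quoted message below (download the file with a script, then query it with the existing CSV reader) might look roughly like this. It is a minimal sketch using only the Python standard library; the function name `download_csv` and the target path are hypothetical, and the URL is the one from Faraz's original message.

```python
import shutil
import urllib.request
from pathlib import Path

# URL from Faraz's original message (quoted below).
CUSTOMERS_URL = (
    "https://raw.githubusercontent.com/itsnotaboutthecell/Power-BI-Sessions/"
    "master/An%20Introduction%20to%20Tabular%20Editor/Source%20Files/Customers.csv"
)


def download_csv(url, dest):
    """Stream the resource at `url` into the file `dest` and return its path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large CSV is never held in memory all at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Hypothetical target: use a directory that one of your dfs workspaces can read.
    path = download_csv(CUSTOMERS_URL, "/tmp/drill-data/Customers.csv")
    print(f"Saved {path.stat().st_size} bytes to {path}")
```

Run it on a schedule if the source file changes; Drill then only ever sees the local copy.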


> On Jul 29, 2020, at 6:29 PM, Paul Rogers <par0328@gmail.com> wrote:
> 
> Hi Faraz,
> 
> The short answer is, "yes, but you have to write some code." Drill can
> process any tabular data, but needs a reader (a "storage plugin") to
> convert from the API's data format to Drill's value vector format. The good
> news is that, for most formats, readers already exist. Your file appears to
> be CSV: Drill provides a CSV reader. What Drill does not provide is a
> storage plugin to read CSV from a REST call. It should be easy to create
> one: just start with (or better, modify) the REST storage plugin. Instead
> of creating a JSON decoder for the data, create a CSV decoder.
> 
> If you choose to go this route, we can give you pointers for how to
> proceed. Alternatively, you can use a script to download the data to a
> local file, then use the existing CSV reader to query the data. Not
> elegant, but may be fine if you do the query infrequently.
> 
> - Paul
> 
> 
> On Wed, Jul 29, 2020 at 1:06 PM Faraz Ahmad <faraz.ahmad@outlook.com> wrote:
> 
>> Hi Team,
>> 
>> Is there any way we can query CSV file data from GitHub using
>> Apache Drill?
>> 
>> Currently, I am able to pull this GitHub data into Power BI by using a Web
>> data connection with the URL below:
>> 
>> https://raw.githubusercontent.com/itsnotaboutthecell/Power-BI-Sessions/master/An%20Introduction%20to%20Tabular%20Editor/Source%20Files/Customers.csv
>> 
>> My goal is to pull this data outside of Power BI, mash it up with other data,
>> and then simply create a view within Drill.
>> 
>> This view will then be connected to Power BI through a Drill ODBC connection.
>> 
>> Kindly let me know if this is possible. Thanks so much!
>> 
>> Regards,
>> 
>> Faraz Ahmad

