drill-user mailing list archives

From Paul Rogers <par0...@gmail.com>
Subject Re: GitHub raw data as a Data source
Date Wed, 29 Jul 2020 22:29:08 GMT
Hi Faraz,

The short answer is, "yes, but you have to write some code." Drill can
process any tabular data, but needs a reader (a "storage plugin") to
convert from the API's data format to Drill's value vector format. The good
news is that, for most formats, readers already exist. Your file appears to
be CSV: Drill provides a CSV reader. What Drill does not provide is a
storage plugin to read CSV from a REST call. It should be easy to create
one: just start with (or better, modify) the REST storage plugin. Instead
of creating a JSON decoder for the data, create a CSV decoder.

If you choose to go this route, we can give you pointers on how to
proceed. Alternatively, you can use a script to download the data to a
local file, then use the existing CSV reader to query it. That is not
elegant, but it may be fine if you run the query infrequently.
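
As a rough illustration of the download-to-local-file approach, here is a
minimal Python sketch that uses only the standard library. The function name
download_csv, the script name, and the destination path are hypothetical
choices for this example and are not part of Drill.

```python
"""Download a CSV over HTTP(S) to a local file that Drill can then query."""
import shutil
import sys
import urllib.request
from pathlib import Path


def download_csv(url: str, dest: str, timeout: float = 30.0) -> Path:
    """Fetch `url` and write the response body to `dest`; return the path.

    `download_csv` is a hypothetical helper written for this sketch.
    """
    target = Path(dest)
    # Create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary sibling first so a query running at the same
    # time is less likely to see a half-written file.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    # Rename the finished download over the destination path.
    tmp.replace(target)
    return target


if __name__ == "__main__":
    # Usage: python fetch_csv.py <url> <destination-file>
    if len(sys.argv) != 3:
        sys.exit("usage: fetch_csv.py <url> <destination-file>")
    print(download_csv(sys.argv[1], sys.argv[2]))
```

You would pass the raw.githubusercontent.com URL from the original message as
the first argument and a path inside a Drill workspace as the second, then
rerun the script (for example from cron) whenever the data needs refreshing.
One detail worth verifying against your own configuration: in Drill's default
dfs setup, a file saved with a .csvh extension is, as far as I recall, read
with its first row treated as column names, while a plain .csv file is
exposed as a single columns array.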

- Paul


On Wed, Jul 29, 2020 at 1:06 PM Faraz Ahmad <faraz.ahmad@outlook.com> wrote:

> Hi Team,
>
> Is there any way to query CSV file data from GitHub using Apache Drill?
>
> Currently, I can pull this GitHub data into Power BI by using a Web data
> connection with the URL below:
>
> https://raw.githubusercontent.com/itsnotaboutthecell/Power-BI-Sessions/master/An%20Introduction%20to%20Tabular%20Editor/Source%20Files/Customers.csv
>
> My goal is to pull this data outside of Power BI, mash it up with other
> data, and then simply create a view within Drill.
>
> This view will then be connected to Power BI through the Drill ODBC
> connection.
>
> Kindly let me know if this is possible. Thanks so much!
>
> Regards,
>
> Faraz Ahmad
