drill-user mailing list archives

From Charles Givre <cgi...@gmail.com>
Subject Use cases for DFDL + Drill
Date Sun, 03 Nov 2019 14:34:34 GMT


> Begin forwarded message:
> 
> From: Charles Givre <cgivre@gmail.com>
> Subject: Re: Use cases for DFDL
> Date: November 3, 2019 at 9:31:17 AM EST
> To: Julian Feinauer <j.feinauer@pragmaticminds.de>
> Cc: "users@daffodil.apache.org" <users@daffodil.apache.org>, "Costello, Roger L."
<costello@mitre.org>, users@drill.apache.org, dev@drill.apache.org
> 
> Hi Julian, 
> It seems like there is a beginning of convergence of the minds here.  I went to the Apache
Roadshow in DC and that was where I learned about DFDL and immediately thought this was a
really interesting possibility.
> 
> I'd love to see if we could foster some collaboration between the various projects on
this.  From the Drill side of things, it would make it SO much easier to get Drill to read
(and by extension query) various data types.  I'd be willing to contribute time from the Drill
side, but I definitely will need help understanding how DFDL works.   
> 
> --C
> 
> 
> 
>> On Nov 3, 2019, at 8:01 AM, Julian Feinauer <j.feinauer@pragmaticminds.de> wrote:
>> 
>> Hi Charles,
>>  
>> this is an interesting idea and in fact we also discussed the same matter for Calcite
at ApacheCon NA.
>> But, I agree that it would be really powerful together with a complete Runtime like
Drill.
>>  
>> Julian
>>  
>>  
>> From: Charles Givre <cgivre@gmail.com>
>> Reply-To: "users@daffodil.apache.org" <users@daffodil.apache.org>
>> Date: Wednesday, October 30, 2019 at 19:38
>> To: "Costello, Roger L." <costello@mitre.org>
>> Cc: "users@daffodil.apache.org" <users@daffodil.apache.org>
>> Subject: Re: Use cases for DFDL
>>  
>> +1
>> 
>> 
>>> On Oct 30, 2019, at 2:36 PM, Costello, Roger L. <costello@mitre.org> wrote:
>>>  
>>> Excellent! Okay, here’s the use case:
>>>  
>>> A Daffodil extension could be created for Apache Drill so that you could parse
any kind of data with Daffodil using a DFDL schema, and then you could use ANSI SQL to query
the data, join it with other data, do analysis, etc., just as if it came from a database.
So, instead of parsing data to XML and then using XPath to pull out data, you could instead
parse data to Apache Drill's data representation and then use ANSI SQL to pull out data, and
even combine it with other non-Daffodil data types. The advantage for this would be that it
would make it very easy to enable Drill to query new data types (i.e., simply by using a DFDL
schema) and it would enable users to easily query this data without having to load it into
another system.
>>>  
>>> How’s that Charles?
>>>  
>>> /Roger
>>> From: Charles Givre <cgivre@gmail.com>
>>> Sent: Wednesday, October 30, 2019 2:28 PM
>>> To: Costello, Roger L. <costello@mitre.org>
>>> Cc: users@daffodil.apache.org
>>> Subject: [EXT] Re: Use cases for DFDL
>>>  
>>> Close... One minor nit is that Drill doesn't use a "query-like" syntax. It is
regular ANSI SQL.  IMHO, I think this would be a really great collaboration of the two communities.
>>> --C
>>>  
>>> 
>>> 
>>> 
>>>> On Oct 30, 2019, at 1:10 PM, Costello, Roger L. <costello@mitre.org> wrote:
>>>>  
>>>> Thanks again Charles. Is the following use case description correct?
>>>>  
>>>> A Daffodil extension could be created for Apache Drill so that you could
parse any kind of data with Daffodil using a DFDL schema, and then you could use Apache Drill's
query-like syntax and rich capabilities to query parts of that data, join it with other data,
do analysis, etc., just as if it came from a database. So, instead of parsing data to XML
and then using XPath to pull out data, you could instead parse data to Apache Drill's data
representation and then use Drill's rich data-query capabilities to pull out data, and even
combine it with other non-Daffodil data types. The advantage for this would be that it would
make it very easy to enable Drill to query new data types (i.e., simply by using a DFDL schema)
and it would enable users to easily query this data without having to load it into another
system.
>>>>  
>>>> Is that correct?
>>>>  
>>>> /Roger
>>>> From: Charles Givre <cgivre@gmail.com>
>>>> Sent: Wednesday, October 30, 2019 12:19 PM
>>>> To: Costello, Roger L. <costello@mitre.org>
>>>> Cc: users@daffodil.apache.org
>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>  
>>>> Not exactly...
>>>> I was thinking of using DFDL to enable Drill to create a schema for data
that Drill cannot read.  If DFDL can be used to describe the schema, a plugin could be written
for Drill that mirrors this schema and ultimately reads the data files.  Drill wouldn't be
populating any database, but rather directly querying the data.
>>>>  
>>>> The advantage for this would be that it would make it very easy to enable
Drill to query new data types (i.e., simply by using a DFDL schema) and it would enable users
to easily query this data w/o having to load it into another system.  Does that make sense?
>>>> -- C
>>>>  
>>>>  
>>>>> On Oct 30, 2019, at 12:13 PM, Costello, Roger L. <costello@mitre.org> wrote:
>>>>>  
>>>>> Thanks Charles. Let me see if I understand the use case correctly.
>>>>>  
>>>>> Use DFDL to parse data to populate a database and then use Apache Drill
to query the database.
>>>>>  
>>>>> Is that correct?
>>>>>  
>>>>> /Roger 
>>>>>  
>>>>> From: Charles Givre <cgivre@gmail.com>
>>>>> Sent: Wednesday, October 30, 2019 12:01 PM
>>>>> To: users@daffodil.apache.org
>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>  
>>>>> To add to this discussion, I'm the PMC chair for Apache Drill.  I think
a compelling use case for DFDL would be enabling Drill to query
data based on a DFDL schema.  This same concept could be applied to other SQL query engines
such as Presto and/or Impala. 
>>>>>  
>>>>> IMHO, this would facilitate the analysis of data sets supported by DFDL.

>>>>> -- C
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Oct 30, 2019, at 11:53 AM, Costello, Roger L. <costello@mitre.org> wrote:
>>>>>>  
>>>>>> Thanks Mike! I updated the slide:
>>>>>>  
>>>>>> <image002.png>
>>>>>>  
>>>>>> From: Beckerle, Mike <mbeckerle@tresys.com>
>>>>>> Sent: Wednesday, October 30, 2019 11:45 AM
>>>>>> To: users@daffodil.apache.org
>>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>>  
>>>>>> I would not pick on RDF data stores as the target.
>>>>>>  
>>>>>> Parsing data to populate a database (any variety) is the actual case.
The fact that we did do one project involving RDF is why I cited that example in particular,
but pulling data into any data store/database begins with the ability to parse the data,
and then process it into suitable form.
>>>>>>  
>>>>>> This is an incomplete list so perhaps this slide title should be
"Example Use Cases for DFDL" ?
>>>>>>  
>>>>>> ...mikeb
>>>>>> From: Costello, Roger L. <costello@mitre.org>
>>>>>> Sent: Monday, October 28, 2019 10:41 AM
>>>>>> To: users@daffodil.apache.org <users@daffodil.apache.org>
>>>>>> Subject: Use cases for DFDL
>>>>>>  
>>>>>> Hi Folks,
>>>>>>  
>>>>>> I created a slide of use cases. See below. Do you agree with the
slide? Anything you would add, delete, or change?  /Roger
>>>>>>  
>>>>>> <image003.png>
> 
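
For readers skimming the thread, the use case discussed above (a schema describes the data, a reader driven by that schema produces rows, and the rows are queried in place rather than loaded into another system) can be illustrated with a rough Python sketch. Everything here is a stand-in chosen for illustration: the SCHEMA dict is a hypothetical toy, not DFDL (real DFDL schemas are XSD-based), and read_records is not a Daffodil or Drill API.

```python
import struct

# Hypothetical stand-in for a DFDL schema: field names plus a binary layout.
# Real DFDL schemas are XSD-based and far more expressive than this.
SCHEMA = {
    "fields": ["sensor_id", "reading"],
    "layout": ">Hf",  # big-endian: unsigned 16-bit int, then 32-bit float
}


def read_records(data, schema):
    """Yield one dict per fixed-width record, driven only by the schema."""
    record = struct.Struct(schema["layout"])
    for values in record.iter_unpack(data):
        yield dict(zip(schema["fields"], values))


# Two sample records packed in the layout the schema describes.
raw = struct.pack(">Hf", 1, 20.5) + struct.pack(">Hf", 2, 31.0)

# Roughly the spirit of: SELECT sensor_id FROM data WHERE reading > 25
# The rows are filtered as they are read; nothing is loaded into a database.
hot = [row["sensor_id"] for row in read_records(raw, SCHEMA) if row["reading"] > 25]
```

Supporting a new format in this toy only means supplying a different SCHEMA, which is the convenience the thread hopes a DFDL-based Drill plugin would provide.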

