drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Apache Drill
Date Mon, 19 Oct 2015 00:56:40 GMT
Kasper,

How is the mapping you suggest specified?

In my example, I meant for there to be many records in a file and each
record element to be a record insofar as Drill is concerned.  I also didn't
include other information that presumably would make it more interesting to
talk about a record element as a unit.
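
As a rough illustration of what I mean, here is a minimal Python sketch (not
Drill code). It assumes the records are wrapped in a made-up <records> root
element so that the file is well-formed XML, and that every item is an
integer; the function names are hypothetical. Each <record> becomes one
record, and <item> always maps to a list, even when only one is present.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical input: many <record> elements inside an invented <records> root.
XML = (
    "<records>"
    "<record><item>1</item></record>"
    "<record><item>2</item><item>3</item></record>"
    "</records>"
)

def record_to_json(record):
    """Group the text of child elements into lists keyed by tag name."""
    out = {}
    for child in record:
        # Assumption: element text is an integer, as in the example above.
        out.setdefault(child.tag, []).append(int(child.text))
    return out

def records(xml_text):
    """Return one dict per <record> element found in the document."""
    root = ET.fromstring(xml_text)
    return [record_to_json(r) for r in root.iter("record")]

if __name__ == "__main__":
    for rec in records(XML):
        print(json.dumps(rec))
```

Always emitting a list sidesteps the single-item ambiguity, at the cost of
wrapping values that are never repeated.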

Your suggestion (1) is essentially to denest the records, but that loses
the nice hierarchical structure expressed in the original that so easily
could be expressed in the JSON data model.
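
A small sketch of what denesting does, in Python and purely illustrative.
The "id" field is an invented stand-in for the record-level information I
left out of the example, and denest() is a hypothetical helper, not anything
from MetaModel or Drill.

```python
# Nested form: one entry per <record>, items kept together as a list.
nested = [
    {"id": "a", "item": [1]},
    {"id": "b", "item": [2, 3]},
]

def denest(recs):
    """Emit one flat row per item, repeating the record-level fields."""
    rows = []
    for rec in recs:
        for item in rec["item"]:
            row = {k: v for k, v in rec.items() if k != "item"}
            row["item"] = item
            rows.append(row)
    return rows

if __name__ == "__main__":
    for row in denest(nested):
        print(row)
```

The flat rows repeat "id" for every item, and the fact that items 2 and 3
belonged to the same record is only recoverable by grouping on that field.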

For your option (2), what do you mean by map 2 tables?  Does MetaModel
inherently assume that all output is purely relational?




On Sun, Oct 18, 2015 at 1:18 PM, Kasper Sørensen <
i.am.kasper.sorensen@gmail.com> wrote:

> Hi Ted,
>
> Actually in MetaModel you then have two choices with your mapping to table
> format.
>
> 1) Either map the "item" as the granularity of a record. That way you will
> get three rows - one for each item. On the last two rows you would
> have the same values for any element that is registered at the <record>
> scope.
>
> 2) You can also map 2 tables instead - one for <record> and one for <item>
> and then join them as you like.
>
>
> 2015-10-18 20:24 GMT+02:00 Ted Dunning <ted.dunning@gmail.com>:
>
> > Kasper,
> >
> > This might work.
> >
> > One issue that I see is that MetaModel seems to take a very XML-centric
> > view of things while Drill takes a pretty JSON view of things.
> >
> > The point at which I think that this might cause problems is that Drill
> > currently has trouble when it sees records like
> >
> > <record><item>1</item></record>
> > <record><item>2</item><item>3</item></record>
> >
> > This is fine as far as XML is concerned, but if you think about it in terms
> > of JSON, it is probably best to view these records as
> >
> > {"item":[1]}
> > {"item":[2,3]}
> >
> > Unfortunately, from the first record, there is no way to tell that it
> > should not be viewed as
> >
> > {"item":1}
> >
> > Do you have a suggestion that would help with this?
> >
> >
> > On Sun, Oct 18, 2015 at 8:41 AM, Kasper Sørensen <
> > i.am.kasper.sorensen@gmail.com> wrote:
> >
> > > Hi there,
> > >
> > > Sorry for barging in, but maybe this is a place where Drill and MetaModel
> > > could benefit from each other? We've considered that before at least ...
> > >
> > > MetaModel already has support for both DOM and SAX based XML querying. They
> > > basically inherit some characteristics from DOM and SAX respectively:
> > >
> > >  - In the DOM variant we can infer a schema and all the user has to do is
> > > select an XML file/resource anywhere.
> > >  - In the SAX variant the user has to specify which paths in the XML
> > > document should represent logical "tables" and what paths represent their
> > > columns.
> > >
> > > See [1] for more info. Hope this might be of interest to integrate into
> > > Drill?
> > >
> > > Best regards,
> > > Kasper Sørensen (from the MetaModel project)
> > >
> > > [1] http://wiki.apache.org/metamodel/examples/XmlTableMapping
> > >
> > > 2015-10-18 0:35 GMT+02:00 Magnus Pierre <mpierre@maprtech.com>:
> > >
> > > > Well, very few lines of code imho. And simple. Been able to parse pretty
> > > > deep structures with no issues so far. Performance? 10-15 5 MB XML files
> > > > in less than a second on my laptop but then I run it using Storm with some
> > > > parallelism in place. Don't know if it's good or bad. I'll share the code
> > > > next time I use a computer. You don't need to use it, but it works at least.
> > > >
> > > > /M
> > > > On 17 Oct 2015 at 10:43 PM, "Matt Burgess" <mattyb149@gmail.com> wrote:
> > > >
> > > > > If the converter is clean and performant then I'm sure the community
> > > > > (including me) is interested :)
> > > > >
> > > > > However I wonder if Drill can afford to add a translation layer between
> > > > > data formats, could we be better served with similar parsing in Drill for
> > > > > XML as we do for JSON, or can it be pushed down far enough (to the parser)
> > > > > to not make a noticeable difference (which is what I think Julian is
> > > > > implying)?
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On Oct 17, 2015, at 1:41 PM, Magnus Pierre <mpierre@maprtech.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Just wrote a simple sax implementation that converts xml to json and that
> > > > > > is able to deal with decently complex xml's, that I currently use in Storm.
> > > > > > Takes attributes, and everything.
> > > > > >
> > > > > > I can share it with the community if interesting.
> > > > > >
> > > > > > /Magnus
> > > > > > On 17 Oct 2015 at 7:02 PM, "Julian Hyde" <julian@hydromatic.net> wrote:
> > > > > >
> > > > > >> Seems to me the biggest problem is to make drill understand the nested
> > > > > >> structure of an xml document. That work has been done for json, so let's
> > > > > >> build on it. Suppose there was a translator that converted xml to json
> > > > > >> (adding attributes for things that json lacks, such as namespaces, text,
> > > > > >> element tags). Drill knows how to handle json, even if it is a bit verbose.
> > > > > >> The translator could be applied on the fly.
> > > > > >>
> > > > > >> Julian
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Sent from my iPad
> > > > > >>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter <stefan@activitystream.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> It's not possible but there has been some talk here about supporting it.
> > > > > >>> If I remember correctly it's rather complicated and not really feasible.
> > > > > >>> (I'm just a newbie so don't take my words for it)
> > > > > >>>
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> -Stefan
> > > > > >>>
> > > > > >>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo <Daniel.Ajo@abarcahealth.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>>> Hey there,
> > > > > >>>>
> > > > > >>>> I was wondering if it is possible to query XML files using Apache Drill?
> > > > > >>>>
> > > > > >>>> I see there are several formats, and maybe it would work using an xpath
> > > > > >>>> query of some sorts, but just wondering if it would work to directly query
> > > > > >>>> it using some sort of plug-in.
> > > > > >>>>
> > > > > >>>> Well, let me know,
> > > > > >>>>
> > > > > >>>> Daniel Ajo
> > > > > >>
> > > > >
> > > >
> > >
> >
>
