drill-user mailing list archives

From Magnus Pierre <mpie...@maprtech.com>
Subject Re: Query XML-files with Drill
Date Fri, 26 Aug 2016 21:07:30 GMT
Hello Per,

I (not in any way related to MapR engineering) worked for a while on an idea for an XML plugin
that converts XML to JSON on the fly, behind the curtains, so that it can use all the goodies
Drill has for JSON. I hit a stumbling block, though: because of the nature of XML, it needs a
very dynamic and flexible JSON schema, which Drill can only support if the UNION type is
activated. For good reasons, the UNION type is switched off for embeddedContent, which is what
my plugin provides to the JSONRecordReader:

From JSONRecordReader.java:
this.unionEnabled = embeddedContent == null && fragmentContext.getOptions().getOption(ExecConstants.ENABLE_UNION_TYPE);

So I decided to pause the effort. FYI: the Drill union type had issues with embeddedContent
that were somewhat random and very hard to debug, and that I never saw when handing the same
generated JSON to Drill as a file. I am sure that someone more competent in the Drill code
would be able to figure out what is happening. :)
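
To make the stumbling block concrete, here is a minimal sketch (plain JDK, not Drill code) of
why a naive XML-to-JSON mapping needs a flexible schema: the same element name can come out as
a scalar, an object, or an array depending on the record. The class XmlShapeDemo, the method
shapes, and the sample data are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration only: hypothetical names, not part of Drill or of the plugin discussed above.
final class XmlShapeDemo {
    // Made-up sample: <item> is plain text, repeated, and nested in different records.
    static final String SAMPLE = "<orders>"
        + "<order><id>1</id><item>pen</item></order>"
        + "<order><id>2</id><item>pen</item><item>ink</item></order>"
        + "<order><id>3</id><item><sku>9</sku></item></order>"
        + "</orders>";

    /** For every field under each record element, collect the JSON shapes a naive mapping gives it. */
    static Map<String, Set<String>> shapes(String xml, String recordTag) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Map<String, Set<String>> result = new LinkedHashMap<>();
        NodeList records = root.getElementsByTagName(recordTag);
        for (int i = 0; i < records.getLength(); i++) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            Map<String, Boolean> nested = new LinkedHashMap<>();
            NodeList children = records.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                counts.merge(child.getNodeName(), 1, Integer::sum);
                nested.merge(child.getNodeName(), hasElementChild(child), Boolean::logicalOr);
            }
            for (String name : counts.keySet()) {
                // Repeated element -> array; element with children -> object; otherwise scalar.
                String shape = counts.get(name) > 1 ? "ARRAY" : nested.get(name) ? "OBJECT" : "SCALAR";
                result.computeIfAbsent(name, k -> new TreeSet<>()).add(shape);
            }
        }
        return result;
    }

    static boolean hasElementChild(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // prints {id=[SCALAR], item=[ARRAY, OBJECT, SCALAR]}
        System.out.println(shapes(SAMPLE, "order"));
    }
}
```

A field that reports more than one shape, like item here, is the kind of column that needs the
UNION type when the converted JSON is read record by record.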

Your best alternative as things stand today is probably to process the XML in Spark (the XML
plugin for Spark is quite competent), or to do a simple XML-to-JSON conversion in Spark and
materialize the result as JSON documents that you then query with Drill.
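
As a rough sketch of the "convert first, then query" idea, the snippet below uses only the JDK
as a stand-in for the Spark step; it is not the Spark XML plugin. It turns each record element
into one JSON line and forces the fields listed in arrayFields to always be arrays, so a field
keeps the same type in every record. The class XmlToJsonLines, its methods, the arrayFields
parameter, and the sample data are assumptions made for illustration; attributes and mixed
content are ignored, and all values are emitted as strings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration only: a JDK stand-in for the conversion step, with hypothetical names.
final class XmlToJsonLines {
    static final String SAMPLE = "<orders>"
        + "<order><id>1</id><item>pen</item></order>"
        + "<order><id>2</id><item>pen</item><item>ink</item></order>"
        + "</orders>";

    /** Returns one JSON object (as text) per recordTag element found under the root. */
    static List<String> convert(String xml, String recordTag, Set<String> arrayFields) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList records = root.getElementsByTagName(recordTag);
        for (int i = 0; i < records.getLength(); i++) {
            lines.add(toJson((Element) records.item(i), arrayFields));
        }
        return lines;
    }

    static String toJson(Element element, Set<String> arrayFields) {
        // Group child elements by name, keeping document order.
        Map<String, List<String>> fields = new LinkedHashMap<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            String value = hasElementChild(child)
                ? toJson((Element) child, arrayFields)
                : quote(child.getTextContent().trim());
            fields.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>()).add(value);
        }
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            List<String> values = field.getValue();
            boolean asArray = arrayFields.contains(field.getKey());
            if (!asArray && values.size() > 1) {
                // Refuse to switch a field's type mid-file; the caller must declare it as an array.
                throw new IllegalStateException("repeated field not declared as array: " + field.getKey());
            }
            if (json.length() > 1) json.append(',');
            json.append(quote(field.getKey())).append(':')
                .append(asArray ? "[" + String.join(",", values) + "]" : values.get(0));
        }
        return json.append('}').toString();
    }

    static boolean hasElementChild(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    /** JSON string escaping; XML 1.0 only permits tab, LF and CR as control characters. */
    static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default: out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // prints {"id":"1","item":["pen"]} then {"id":"2","item":["pen","ink"]}
        convert(SAMPLE, "order", Set.of("item")).forEach(System.out::println);
    }
}
```

Written to a file, lines like these could then be queried from Drill as ordinary JSON, without
relying on the UNION type for the item field.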

Regards,
Magnus



> On 26 Aug 2016, at 08:36, Per Weinberger <Per.Weinberger@nasdaq.com> wrote:
> 
> Hi,
> 
> I'm looking for examples or information on how to query XML files with Drill. I'm working
> with somewhat large (100+ MB) XML files and would like to query them in situ. Are there any
> examples or information regarding this? I would think that this would be a fairly common thing
> to do, but there is very little regarding this on Google or Stack Overflow.
> 
> Cheers,
> Per Weinberger

