drill-user mailing list archives

From "Uwe L. Korn" <uw...@xhochy.com>
Subject Re: Drill and Elasticsearch
Date Thu, 23 Feb 2017 14:29:01 GMT
On Thu, Feb 23, 2017, at 03:19 PM, Ted Dunning wrote:
> On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <john@omernik.com> wrote:
> 
> > I guess, I am just looking for ideas, how would YOU get data from Parquet
> > files into Elastic Search? I have Drill and Spark at the ready, but want to
> > be able to handle it as efficiently as possible.  Ideally, if we had a well
> > written ES plugin, I could write a query that inserted into an index and
> > streamed stuff in... but barring that, what other methods have people used?
> >
> 
> My traditional method has been to use Python's version of the ES batch
> load API. This runs ES pretty hard, but you would need more to saturate
> a really large ES cluster. Often I export a JSON file using whatever
> tool (Drill would work) and then use the python on that file. Avoids
> questions of Python reading obscure stuff. I think that Python is now
> able to read and write Parquet, but that is pretty new stuff, so I would
> stay old school there.

If you want to try it, see
https://pyarrow.readthedocs.io/en/latest/parquet.html

You can get it with `conda install pyarrow`; it should also be
pip-installable, probably by next Monday. It's based on Apache Arrow and
Apache Parquet C++, and we're happy about any feedback!
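
For the batch-load route Ted describes, here is a minimal sketch of how rows
could be turned into request bodies for the Elasticsearch `_bulk` endpoint,
which takes newline-delimited JSON: one action line followed by one document
line per row. The function name `bulk_bodies`, the index name `demo`, the
batch size and the sample rows are all made up for illustration. The exact
action metadata depends on your ES version (older releases also expect a
`_type`), so treat this as a starting point rather than a finished loader.

```python
import json
from itertools import islice


def bulk_bodies(rows, index, batch_size=500):
    """Yield NDJSON request bodies for the Elasticsearch _bulk endpoint.

    rows is any iterable of JSON-serializable dicts. index and batch_size
    are illustrative parameters, not taken from this thread.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        lines = []
        for row in batch:
            # Action line: index this document into the target index.
            lines.append(json.dumps({"index": {"_index": index}}))
            # Source line: the document itself.
            lines.append(json.dumps(row))
        # The bulk body has to end with a newline.
        yield "\n".join(lines) + "\n"


# Hypothetical sample rows standing in for records read from Parquet.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
bodies = list(bulk_bodies(rows, index="demo", batch_size=2))
```

Each body would then be POSTed to the cluster's `_bulk` endpoint with whatever
HTTP or ES client you prefer. The rows can come from a JSON export written by
Drill or from a Parquet file read with pyarrow; the sketch only assumes an
iterable of dicts.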
