drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Drill and Elasticsearch
Date Thu, 23 Feb 2017 14:19:51 GMT
On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <john@omernik.com> wrote:

> I guess, I am just looking for ideas, how would YOU get data from Parquet
> files into Elastic Search? I have Drill and Spark at the ready, but want to
> be able to handle it as efficiently as possible.  Ideally, if we had a well
> written ES plugin, I could write a query that inserted into an index and
> streamed stuff in... but barring that, what other methods have people used?

My traditional method has been to use the Python client's version of the ES
bulk load API. This runs ES pretty hard, but you would need more than that
to saturate a really large ES cluster. Often I export a JSON file using
whatever tool is handy (Drill would work) and then run the Python loader on
that file. That avoids any question of Python having to read obscure
formats. I think that Python is now able to read and write Parquet, but
that is pretty new stuff, so I would stay old school.

I don't think that you need a lot of sophistication on the loader.
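
As a rough illustration of how small such a loader can be, here is a minimal
sketch, not a definitive implementation. It assumes the exported file holds
one JSON object per line, that the elasticsearch Python client is installed,
and that ES is reachable at the URL given. The function names, the file
path, the index name, and the chunk size are all hypothetical.

```python
import json
from typing import Iterable, Iterator


def actions_from_json_lines(lines: Iterable[str], index: str) -> Iterator[dict]:
    """Turn newline-delimited JSON records into actions for the bulk helper."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        yield {"_index": index, "_source": json.loads(line)}


def load_file(path: str, index: str, es_url: str = "http://localhost:9200") -> int:
    """Stream a JSON-lines file into ES; returns the number of documents indexed."""
    # Imported here so the generator above works without the client installed.
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(es_url)  # assumption: a local, unsecured cluster
    with open(path, encoding="utf-8") as f:
        # helpers.bulk consumes the generator lazily and sends it in chunks.
        ok, _errors = helpers.bulk(
            es, actions_from_json_lines(f, index), chunk_size=1000
        )
    return ok
```

Calling load_file("export.json", "my_index") would then push the whole file
through the bulk API. Running several such loaders in parallel on different
files is one way to put more load on a larger cluster.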
