drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Drill and Elasticsearch
Date Thu, 23 Feb 2017 14:19:51 GMT
On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <john@omernik.com> wrote:

> I guess, I am just looking for ideas, how would YOU get data from Parquet
> files into Elastic Search? I have Drill and Spark at the ready, but want to
> be able to handle it as efficiently as possible.  Ideally, if we had a well
> written ES plugin, I could write a query that inserted into an index and
> streamed stuff in... but barring that, what other methods have people used?
>

My traditional method has been to use the Python client for the ES bulk load
API. This runs ES pretty hard, though you would need more than that to
saturate a really large ES cluster. Often I export a JSON file using
whatever tool is handy (Drill would work) and then run the Python loader on
that file. This avoids questions about Python reading obscure formats. I
think Python can now read and write Parquet, but that support is pretty new,
so I would stay old school there.

I don't think that you need a lot of sophistication on the loader.
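A minimal sketch of such an unsophisticated loader follows. It assumes the official `elasticsearch` Python client in a recent version, where a bulk action needs only `_index` and `_source` (2017-era clusters also expected a `_type`). It also assumes each exported file fits in memory. It parses the file as a stream of concatenated JSON objects instead of strictly one object per line, so it does not depend on how the exporting tool lays out records. The function names `iter_json_objects`, `iter_actions`, and `load_file` are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated JSON values.

    Handles both one-object-per-line files and pretty-printed objects
    that span several lines.
    """
    decoder = json.JSONDecoder()
    pos = 0
    n = len(text)
    while True:
        # Skip any whitespace (newlines, spaces) between objects.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def iter_actions(path, index):
    """Turn every record in an exported JSON file into a bulk index action."""
    # Assumption: the export file is small enough to read into memory.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for doc in iter_json_objects(text):
        yield {"_index": index, "_source": doc}


def load_file(es, path, index, chunk_size=1000):
    """Send all records in `path` to `index` using the client's bulk helper."""
    # Imported here so the parsing helpers above work without the client installed.
    from elasticsearch import helpers

    # helpers.bulk batches the actions into bulk requests of `chunk_size`
    # documents and returns (number of successes, list of errors).
    return helpers.bulk(es, iter_actions(path, index), chunk_size=chunk_size)
```

Usage would look something like `load_file(Elasticsearch("http://localhost:9200"), "export.json", "my-index")`, where the URL, file name, and index name are placeholders. By default `helpers.bulk` raises `BulkIndexError` if any document fails, so a failed load is not silent.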
