On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <john@omernik.com> wrote:
> I guess, I am just looking for ideas, how would YOU get data from Parquet
> files into Elastic Search? I have Drill and Spark at the ready, but want to
> be able to handle it as efficiently as possible. Ideally, if we had a well
> written ES plugin, I could write a query that inserted into an index and
> streamed stuff in... but barring that, what other methods have people used?
>
My traditional method has been to use the Elasticsearch bulk API via the
Python client. This runs ES pretty hard, but you would need more than that to
saturate a really large ES cluster. Often I export a JSON file using whatever
tool is handy (Drill would work) and then run the Python loader over that
file. That avoids any questions about Python reading obscure formats. I think
Python can now read and write Parquet, but that support is pretty new, so I
would stay old school there.
I don't think that you need a lot of sophistication on the loader.
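Something along these lines is usually enough. This is just a minimal sketch,
assuming a newline-delimited JSON export and the elasticsearch Python package;
the index name, doc type, host, and file path are placeholders, not anything
specific to your setup.

    # Minimal bulk loader: newline-delimited JSON export -> Elasticsearch.
    # Assumes the `elasticsearch` Python package; index name, doc type, host,
    # and file path below are placeholders.
    import json

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    def actions(path, index="my-index", doc_type="doc"):
        """Yield one bulk action per JSON line in the export file."""
        with open(path) as f:
            for line in f:
                doc = json.loads(line)
                # "_type" matters on older ES versions; newer ones ignore it.
                yield {"_index": index, "_type": doc_type, "_source": doc}

    if __name__ == "__main__":
        es = Elasticsearch(["http://localhost:9200"])
        ok, errors = bulk(es, actions("export.json"), chunk_size=5000)
        print("indexed:", ok, "errors:", errors)

Tune chunk_size (and run a few of these in parallel) if one loader process
isn't keeping the cluster busy.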