metron-dev mailing list archives

From Michael Miklavcic <michael.miklav...@gmail.com>
Subject Re: [DISCUSS] Upgrading Elasticsearch from 2.x to 5.x
Date Thu, 05 Oct 2017 17:52:03 GMT
I think it might help the discussion to share my impressions of looking
over the new API recommendations from ES. I've summarized some info
provided by ES back in December 2016 regarding the reasons for switching to
a new client model. [1]

*Summary points:*

Pre-5.x had the Java API, which uses the same binary exchange format as
node-to-node communications.
In 5.x a low level REST client was added. Now (as of 5.6) there's also a high
level REST client that handles request marshalling and response unmarshalling.

*Benefits of existing Java API*

   1. Theoretically faster - binary format, no JSON parsing
   2. Hardened, used for internal ES node to node communications

*Cons of Java API*

   1. Benchmarks show it's not really that much faster.
   2. Backwards compatibility - Java API changes often.
   3. Upgrades more challenging - need to refactor client code for new and
   deprecated features.
   4. Minor releases may contain breaking changes in the Java API
   5. Client and server *should* be on the same JVM version (not as important
   post 2.x, but still potentially necessary because of serialization with the
   binary format)
   6. Requires a dependency on the entire Elasticsearch server in order to
   use the client. We end up shading jars.

*Benefits of new REST API*

   1. Upgrades
      1. Breaking changes only made in major releases - "We are very
      careful with backwards compatibility on the REST layer where breaking
      changes are made only in major releases."
      2. "The REST interface is much more stable and can be upgraded out of
      step with the Elasticsearch cluster."
   2. REST client and server can be on different JVMs
   3. Dependencies for the low level client are very slim. No need for
   shading.
   4. The RestHighLevelClient supports the same request and response
   objects as the TransportClient
   5. Can be secured via HTTPS

There are some additional benefits to the new API; however, they depend on
whether we choose to go with the high or low level client. More comments
below.

*Cons of new API*

   1. Dependencies - The high level client still requires the full ES
   dependency, though this will slim down in future releases.

*Other comments specific to Metron*

There's a question of whether we should use the low or high level REST
client. The main differences between the two are how they handle library
dependencies and marshalling/unmarshalling. The low level client cleans up
the dependencies dramatically, whereas the high level client still requires
you to depend on elasticsearch core. On the other hand, the low level
client does no work to marshal/unmarshal the requests/responses from the
HTTP calls, while the high level client handles this for you and exposes
API-specific methods. The high level client accepts the same request
arguments as the TransportClient and returns the same response objects.
One more thing to note is that the low level client claims to be compatible
with all versions of ES, whereas the high level client appears to be tied to
a single major version, and even to a specific minor version within it:

"The 5.6 client can communicate with any 5.6.x Elasticsearch node. Previous
5.x minor versions like 5.5.x, 5.4.x etc. are not (fully) supported." [2]
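Read conservatively, that note means a high level client should be matched to the cluster's major and minor version. A minimal Java sketch of that rule, e.g. as a startup sanity check. The class and method names are hypothetical, and the rule is a conservative reading of [2], not an Elasticsearch API:

```java
/**
 * Hypothetical helper encoding a conservative reading of the high level
 * client compatibility note [2]: a 5.6 client is only fully supported
 * against 5.6.x nodes, i.e. major and minor versions must match.
 */
public final class HighLevelClientCompat {

    private HighLevelClientCompat() {}

    /** Parses "major.minor[.patch]" into {major, minor}. */
    static int[] majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /** True only when client and node share the same major and minor version. */
    static boolean isFullySupported(String clientVersion, String nodeVersion) {
        int[] client = majorMinor(clientVersion);
        int[] node = majorMinor(nodeVersion);
        return client[0] == node[0] && client[1] == node[1];
    }

    public static void main(String[] args) {
        System.out.println(isFullySupported("5.6.0", "5.6.3")); // true
        System.out.println(isFullySupported("5.6.0", "5.5.2")); // false: older minor
        System.out.println(isFullySupported("5.6.0", "2.4.6")); // false: different major
    }
}
```

The low level client would not need a check like this, since it claims to work with all versions.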

Just as an example, here's a simple comparison of an index request in the
low and high level APIs.

*Low Level*

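import java.util.Collections;
import java.util.Map;
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
// Assumes an already-built RestClient named restClient.
// No query-string parameters for this request.
Map<String, String> params = Collections.emptyMap();
// The low level client sends the body as-is, so we build the JSON ourselves.
String jsonString = "{" +
            "\"user\":\"kimchy\"," +
            "\"postDate\":\"2013-01-30\"," +
            "\"message\":\"trying out Elasticsearch\"" +
        "}";
HttpEntity entity = new NStringEntity(jsonString, ContentType.APPLICATION_JSON);
Response response = restClient.performRequest("PUT", "/posts/doc/1",
        params, entity);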

*High Level*

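import java.util.Date;
import org.elasticsearch.action.index.IndexRequest;
// Fields are passed as key/value pairs; the client builds the JSON body.
// This only builds the request; a RestHighLevelClient sends it via index(indexRequest).
IndexRequest indexRequest = new IndexRequest("posts", "doc", "1")
        .source("user", "kimchy",
                "postDate", new Date(),
                "message", "trying out Elasticsearch");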

*Note*: there are a few ways to do this with the high level API, but this
was the most concise one for comparing it with the low level API.
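To give a feel for the low level side, here is a minimal sketch of producing the same endpoint and body from a Map instead of a hand-concatenated string. It is a stand-in for the existing JSON map code mentioned below, not that code: the class and helper names (LowLevelIndexBody, endpoint, toJson, quote) are hypothetical, and only flat String fields are handled (no numbers, dates, or nested objects). The resulting string would be wrapped in an NStringEntity and passed to performRequest as in the low level example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for JSON map handling code: builds the endpoint
 * and JSON body for a low level index request from a flat Map of Strings.
 */
public final class LowLevelIndexBody {

    private LowLevelIndexBody() {}

    /** Builds "/{index}/{type}/{id}", URL-encoding each path segment. */
    static String endpoint(String index, String type, String id) {
        return "/" + encode(index) + "/" + encode(type) + "/" + encode(id);
    }

    private static String encode(String segment) {
        try {
            // URLEncoder does form encoding, so map '+' back to %20 for paths.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Serializes flat String fields as a JSON object, keeping map order. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return sb.append('}').toString();
    }

    /** Quotes a string as a JSON string literal, escaping special characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters are written as unicode escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("user", "kimchy");
        doc.put("postDate", "2013-01-30");
        doc.put("message", "trying out Elasticsearch");
        System.out.println(endpoint("posts", "doc", "1"));
        System.out.println(toJson(doc));
    }
}
```

With a LinkedHashMap the fields keep insertion order, so the body matches the hand-built jsonString exactly. In practice a JSON library would replace the hand-written escaping.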

*Thoughts/Recommendations*: I do think we should migrate to the new API.
The question is which of the new clients we should use. The high level
client seems to shield us from having to write special JSON handling
code, whereas the low level client handles all versions of ES. I don't have
a good feel (yet) for just how much work it would take to use the low level
API, or how difficult it would be to add new request features in the
future. Actually, we could probably leverage existing code we have for
dealing with JSON maps, so this might be really easy. Someone with more
experience in Metron's ES client use might have a better idea of the pros
and cons here. The high level client appears to handle all JSON
manipulation for us, but we lose the benefit of a simpler dependency tree
and support for all versions of ES. My only concern with "supports all
versions" is that I have to imagine there are specific calls that we'd have
to be careful with when constructing the JSON requests, so it's unclear to
me whether this is better or worse in the end.
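One concrete case of a version-specific call: ES 5.0 replaced the string field type with text (analyzed) and keyword (exact match), so an exact-match field mapping is spelled differently for 2.x and 5.x clusters (the old string form is deprecated in 5.x). A minimal sketch, assuming a hypothetical helper that only needs to cover those two major versions:

```java
/**
 * Hypothetical helper: mapping JSON for an exact-match (not analyzed)
 * string field. ES 5.0 replaced "string" with "text" and "keyword",
 * so the same intent is spelled differently for 2.x and 5.x.
 */
public final class ExactMatchMapping {

    private ExactMatchMapping() {}

    static String exactMatchFieldMapping(int esMajorVersion) {
        switch (esMajorVersion) {
            case 2:
                // 2.x: a string field that skips analysis.
                return "{\"type\":\"string\",\"index\":\"not_analyzed\"}";
            case 5:
                // 5.x: keyword is the exact-match replacement for not_analyzed strings.
                return "{\"type\":\"keyword\"}";
            default:
                throw new IllegalArgumentException(
                        "Only ES 2.x and 5.x are covered by this sketch: " + esMajorVersion);
        }
    }

    public static void main(String[] args) {
        System.out.println(exactMatchFieldMapping(2));
        System.out.println(exactMatchFieldMapping(5));
    }
}
```

With the low level client, branches like this would live in our own code; with the high level client, the client version itself pins which spelling is expected.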

Best,
Mike


   1. https://www.elastic.co/blog/state-of-the-official-elasticsearch-java-clients
   2. https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-compatibility.html




On Wed, Sep 27, 2017 at 8:03 PM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> I am working on upgrading Elasticsearch and Kibana. There are quite a few
> changes involved with this fix. I believe I'm mostly finished with the
> Ambari mpack side of things; however, we currently only support one version
> with no backwards compatibility. What are the community's thoughts on this?
>
> Here is some work contributed to the community that I'm referencing while
> working on this upgrade - https://github.com/apache/metron/pull/619/files
>
> Best,
> Michael Miklavcic
>
