metron-dev mailing list archives

From James Sirota <>
Subject Re: [DISCUSS] Splitting up the Indexing Topology
Date Mon, 25 Sep 2017 23:31:24 GMT
I have experienced issues with ES and HDFS indexing in production and have previously split
the indexing topology into two separate topologies.  As you state, the benefits of this approach
are (a) tuning each topology separately, (b) the ability to attribute problems to a specific topology
(why is something slow?), and (c) graceful degradation.  When ES, for example, fails partially
or catastrophically and your ES topology goes all kinds of crazy, the HDFS topology keeps humming
along unaffected.  Once METRON-1205 is in, you will be able to re-index into ES (or potentially
other targets) from HDFS at will.  The major con of this architecture is that there is a
greater chance that your data stores will get out of sync, because they index/store data
at different rates.  But even given that, I would vote +1 on splitting out the topologies.
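To make the trade-off concrete, here is a minimal, hypothetical simulation (not Metron code) of one Kafka partition read either by a single shared indexing topology or by two split topologies, each with its own consumer group. It assumes a batch's offset is committed only when every writer in a topology succeeds, and it models ES as unavailable for a few ticks. The group names `es_indexing` and `hdfs_indexing`, and all function names, are illustrative assumptions rather than real Metron identifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A writer takes a batch of messages and reports whether the write succeeded.
Writer = Callable[[List[str]], bool]


@dataclass
class ConsumerGroup:
    """Committed offset of one (hypothetical) consumer group on a single partition."""
    name: str
    offset: int = 0


def shared_step(topic: List[str], group: ConsumerGroup,
                writers: Dict[str, Writer], batch_size: int) -> None:
    """One topology, one group: the offset advances only if every writer succeeds."""
    batch = topic[group.offset:group.offset + batch_size]
    if batch and all(write(batch) for write in writers.values()):
        group.offset += len(batch)


def split_step(topic: List[str], groups: Dict[str, ConsumerGroup],
               writers: Dict[str, Writer], batch_size: int) -> None:
    """One topology and group per writer: each advances independently."""
    for name, write in writers.items():
        group = groups[name]
        batch = topic[group.offset:group.offset + batch_size]
        if batch and write(batch):
            group.offset += len(batch)


def simulate(es_down: range, ticks: int = 10, batch_size: int = 5) -> Tuple[int, int, int]:
    """Return (shared offset, split ES offset, split HDFS offset) after `ticks` steps."""
    topic = [f"msg-{i}" for i in range(100)]
    clock = {"tick": 0}
    writers: Dict[str, Writer] = {
        "es": lambda batch: clock["tick"] not in es_down,  # ES fails during es_down
        "hdfs": lambda batch: True,                        # HDFS stays healthy
    }
    shared = ConsumerGroup("indexing")
    split = {"es": ConsumerGroup("es_indexing"), "hdfs": ConsumerGroup("hdfs_indexing")}
    for t in range(ticks):
        clock["tick"] = t
        shared_step(topic, shared, writers, batch_size)
        split_step(topic, split, writers, batch_size)
    return shared.offset, split["es"].offset, split["hdfs"].offset


print(simulate(es_down=range(2, 6)))  # (30, 30, 50)
```

With ES down for four of ten ticks, the shared group and the split ES group both stop at offset 30 while the split HDFS group reaches 50. HDFS keeps humming along, but the two stores end up 20 messages apart, which is the out-of-sync con described above.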

25.09.2017, 09:37, "Casey Stella" <>:
> One of the lessons that have bubbled up in doing some performance analysis
> is that having the indexing topology share both the ES and the HDFS writer
> in the same topology can be problematic from a tuning perspective.
> Specifically, it's hard to square that circle and make both perform fast
> enough to not cause significant back-pressure in Kafka (and often commit
> exceptions in the Kafka spout).
> I wanted to get the community's opinion about the possibility of separating
> the two current writers into separate topologies which could be tuned
> separately.
> Pros:
>    - Practically speaking, tuning separately is often a lot easier than
>    trying to tune together
>    - This gives us the beginnings of an abstraction that may be
>    reusable for exposing new indexers to Metron
> Cons:
>    - It has the potential to mask a problem. We may want to ensure that
>    the writers write at the same rate and don't get far ahead of one another.
>    In the current setup, this is inherent in the design. If we separate them,
>    they may be reading at different rates and one index may get ahead of the
>    other.
>    - The management pack section around indexing would need to be
>    reconsidered if we split them up
> Personally, I'm strongly in favor of splitting them up, but I want to make
> sure that we don't miss an important nuance here. The first con is
> concerning to me, but I'd argue that another lesson from performance tuning
> is that we need to monitor the average partition lag over time in the
> management UI for the various consumer groups and ensure that writing keeps
> up with reading. If we insist that this holds for all healthy
> Metron installations, the primary con goes away in my mind.
> Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear
> community feedback here. Thoughts?

Thank you,

James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
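Casey's proposed health check (average partition lag per consumer group, tracked over time, with writing keeping up with reading) could be expressed roughly as follows. This is a sketch under stated assumptions: offsets are plain per-partition dictionaries that would in practice be collected from Kafka (for example, the CURRENT-OFFSET and LOG-END-OFFSET columns reported by `kafka-consumer-groups.sh --describe`), "keeping up" is taken to mean a non-positive least-squares slope of the lag samples, and every function name is hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence

# Offsets keyed by partition number, e.g. {0: 1000, 1: 1200}.
Offsets = Dict[int, int]


def average_partition_lag(end_offsets: Offsets, committed: Offsets) -> float:
    """Mean over partitions of (log-end offset - committed offset) for one group."""
    return mean(end_offsets[p] - committed.get(p, 0) for p in end_offsets)


def lag_slope(samples: Sequence[float]) -> float:
    """Least-squares slope of lag samples taken at equal intervals."""
    n = len(samples)
    if n < 2:
        return 0.0
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(samples)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def keeps_up(samples: Sequence[float], tolerance: float = 0.0) -> bool:
    """Writing keeps up with reading if average lag is not trending upward."""
    return lag_slope(samples) <= tolerance


def index_skew(a: Offsets, b: Offsets) -> Dict[int, int]:
    """Per-partition committed-offset difference (positive where group `a` is ahead)."""
    parts = sorted(a.keys() | b.keys())
    return {p: a.get(p, 0) - b.get(p, 0) for p in parts}


end = {0: 1000, 1: 1200}
es = {0: 900, 1: 1000}
hdfs = {0: 1000, 1: 1150}
print(average_partition_lag(end, es))    # 150
print(average_partition_lag(end, hdfs))  # 25
print(index_skew(hdfs, es))              # {0: 100, 1: 150}
print(keeps_up([150, 160, 175, 190]))    # False: lag grows each interval
print(keeps_up([150, 140, 145, 130]))    # True: lag is trending down
```

The skew between the two groups' committed offsets directly measures how far one index is ahead of the other, so it could be alerted on alongside lag if the topologies are split.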
