metron-dev mailing list archives

From Michael Miklavcic <michael.miklav...@gmail.com>
Subject Re: [DISCUSS] Splitting up the Indexing Topology
Date Thu, 28 Sep 2017 02:07:08 GMT
+1 to the split. I also feel it's much easier to dissect problems when
these actions are separated. It's also easier to fine tune each
independently, which may have additional performance benefits.

M

On Mon, Sep 25, 2017 at 5:31 PM, James Sirota <jsirota@apache.org> wrote:

> I have experienced issues with ES and HDFS indexing in production and have
> previously split out the topologies into two separate topologies.  As you
> state the benefits of this approach are (a) tuning each topology
> separately, (b) ability to attribute problems to a specific topology (why
> is something slow?) and (c) graceful degradation.  When ES, for example,
> fails partially or catastrophically and your ES topology goes all kinds of
> crazy, HDFS topology keeps humming along unaffected.  Once METRON-1205 is
> in you will be able to re-index into ES (or potentially other sources) from
> HDFS at will.  The major con for this architecture is that there is a
> greater chance that all your data sources will get out of sync because they
> index/store data at different rates.  But even given that I would vote +1
> on splitting out the topologies.
>
> 25.09.2017, 09:37, "Casey Stella" <cestella@gmail.com>:
> > One of the lessons that have bubbled up in doing some performance analysis
> > is that having the indexing topology share both the ES and the HDFS writer
> > in the same topology can be problematic from a tuning perspective.
> > Specifically, it's hard to square that circle and make both perform fast
> > enough to not cause significant back-pressure in Kafka (and often Commit
> > Exceptions in the Kafka spout).
> >
> > I wanted to get the community's opinion about the possibility of separating
> > the two current writers into separate topologies which could be tuned
> > separately.
> >
> > Pros:
> >
> >    - Practically speaking, tuning separately is often a lot easier than
> >    trying to tune together
> >    - This opens us up with the beginnings of an abstraction that may be
> >    reusable to expose new indexers to Metron
> >
> > Cons:
> >
> >    - It has the potential to mask a problem. We may want to ensure that
> >    the writers write at the same rate and don't get far ahead of one another.
> >    In the current setup, this is inherent in the design. If we separate them,
> >    they may be reading at different rates and one index may get ahead of the
> >    other.
> >    - The management pack section around indexing would need to be
> >    reconsidered if we split them up
> >
> > Personally, I'm strongly in favor of splitting them up, but I want to make
> > sure that we don't miss an important nuance here. The first con is
> > concerning to me, but I'd argue that another lesson from performance tuning
> > is that we need to monitor the average partition lag over time in the
> > management UI for the various consumer groups and ensure that writing keeps
> > up with reading. If we insist on this assertion being true for all healthy
> > Metron installations, the primary con goes away in my mind.
> >
> > Anyway, I'm sure I've missed some pros and cons, so it'd be great to hear
> > community feedback here. Thoughts?
>
> -------------------
> Thank you,
>
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
>
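The lag check discussed in the thread (average partition lag per consumer group, plus how far one writer has run ahead of the other) can be sketched roughly as below. This is only an illustration under stated assumptions: the offsets are passed in as plain dicts of partition to offset, however they were collected, and the function names (`partition_lag`, `average_lag`, `writer_drift`) and the sample numbers are hypothetical, not part of Metron or Kafka.

```python
from typing import Dict

# Assumption: offsets are supplied as {partition: offset} dicts, gathered by
# whatever tooling is available. Nothing here talks to Kafka directly.
Offsets = Dict[int, int]


def partition_lag(end_offsets: Offsets, committed: Offsets) -> Dict[int, int]:
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        p: max(end - committed.get(p, 0), 0)
        for p, end in end_offsets.items()
    }


def average_lag(end_offsets: Offsets, committed: Offsets) -> float:
    """Average lag across partitions for one consumer group."""
    lags = partition_lag(end_offsets, committed)
    return sum(lags.values()) / len(lags) if lags else 0.0


def writer_drift(committed_a: Offsets, committed_b: Offsets) -> Dict[int, int]:
    """Per-partition difference between two groups; positive means a is ahead of b."""
    partitions = sorted(set(committed_a) | set(committed_b))
    return {p: committed_a.get(p, 0) - committed_b.get(p, 0) for p in partitions}


if __name__ == "__main__":
    # Hypothetical sample values for a two-partition indexing topic.
    end = {0: 1000, 1: 1200}
    es_group = {0: 900, 1: 1100}     # hypothetical ES writer consumer group
    hdfs_group = {0: 1000, 1: 1195}  # hypothetical HDFS writer consumer group
    print("ES average lag:", average_lag(end, es_group))
    print("HDFS average lag:", average_lag(end, hdfs_group))
    print("HDFS ahead of ES by:", writer_drift(hdfs_group, es_group))
```

Sampling these values over time, rather than once, is what would show whether writing keeps up with reading for each topology, and whether the two indexes are drifting apart after a split.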
