metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Architectural reason to split in 4 topologies / impact on the kafka resources
Date Sat, 23 Jun 2018 02:58:29 GMT
Hey Michel,

Those are good questions, and there were some reasons behind that.  In
fact, historically, we had fewer topologies (e.g. indexing and enrichment
were merged). Even earlier on, we had just one giant topology per parser
that enriched and indexed.  Long story short, we moved this way
because we saw how people were using Metron and gained more insight
into tuning it.  That led us down this architectural path.

Some of the reasons that we went this way:

   - Fewer, larger topologies were a nightmare to tune
      - Enrichment has different memory requirements than, say,
      parsers or indexing
      - You can adjust the Kafka topic params per topology to change the
      number of partitions, etc.
   - Having the separate topologies gives a natural set of extension points
   for customization and enhancement (e.g. you want a phase between parsing
   and enrichment).
   - Decoupling the topologies lets us spin up and down parts of Metron
   without affecting others (e.g. you don't have to take down enrichments to
   add a parser, even for a moment)
   - The movement to Flux meant we were limited in how much we could adjust
   the topology at runtime (e.g. colocating parsers and enrichment would
   essentially mean moving away from Flux, as the topology changes its
   structure)
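
To make the decoupling and per-stage tuning points a bit more concrete, here
is a minimal sketch. It is not Metron code: in-memory queues stand in for
Kafka topics, threads stand in for topology workers, and every name in it
(run_stage, stop_stage, parse, enrich, the *_topic queues) is hypothetical
and made up for illustration. It only assumes that each stage reads from one
topic and writes to the next.

```python
import queue
import threading

STOP = object()  # sentinel telling one worker of a stage to shut down


def run_stage(transform, inbox, outbox, workers):
    """Start `workers` threads that read inbox, apply transform, write outbox."""
    def loop():
        while True:
            msg = inbox.get()
            if msg is STOP:
                break
            outbox.put(transform(msg))

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def stop_stage(threads, inbox):
    """Stop one stage only: one sentinel per worker, then wait for them."""
    for _ in threads:
        inbox.put(STOP)
    for t in threads:
        t.join()


def parse(raw):
    # Hypothetical parser: "src,dst" -> structured event.
    src, dst = raw.split(",")
    return {"src": src, "dst": dst}


def enrich(event):
    # Hypothetical enrichment: flag sources in the 10.x range.
    return {**event, "internal": event["src"].startswith("10.")}


# The queues play the role of the topics between topologies.
raw_topic = queue.Queue()
parsed_topic = queue.Queue()
enriched_topic = queue.Queue()

# Each stage gets its own parallelism, tuned independently of the other.
parsers = run_stage(parse, raw_topic, parsed_topic, workers=1)
enrichers = run_stage(enrich, parsed_topic, enriched_topic, workers=3)

for line in ["10.0.0.1,8.8.8.8", "192.168.1.5,10.0.0.1"]:
    raw_topic.put(line)

# Taking the parser stage down does not touch the enrichment stage,
# which keeps draining whatever is already buffered in parsed_topic.
stop_stage(parsers, raw_topic)
stop_stage(enrichers, parsed_topic)

results = sorted(
    (enriched_topic.get() for _ in range(2)), key=lambda e: e["src"]
)
```

The trade-off Michel raises is visible here too: every message crosses an
extra queue between parse and enrich, which is the price paid for being able
to size and restart each stage on its own.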

Best,

Casey


On Fri, Jun 22, 2018 at 5:25 PM Michel Sumbul <michelsumbul@gmail.com>
wrote:

> Hi Everyone,
>
> I was asking myself what the architectural reason was for splitting
> ingestion in Metron into 4 different topologies that all read/write to
> Kafka.
>
> For example, why have the parsing and enrichment topologies not been
> merged? Would it not be possible to enrich the message directly when
> you parse it?
>
> I'm asking because splitting into several topologies means that all of
> the topologies read/write to Kafka, which produces a bigger load on the
> Kafka cluster and thus a need for far more infrastructure/servers. The cost
> is especially significant when we speak about TBs of data ingested every day.
>
> I'm sure there was a very good reason, I was just curious.
>
> Thanks,
> Michel
>
