kafka-users mailing list archives

From Guy Doulberg <guy.doulb...@conduit.com>
Subject Architecture Consulting
Date Tue, 19 Jun 2012 13:12:57 GMT
Hi all,

We'd like to consult with you about our Kafka architecture,

We have HTTP endpoints that receive events from the web and push them into the system via
Kafka. The events are distinguishable by their HTTP URL, and are sharded to their corresponding topics.

We have 2 designs in mind:

1. One main 'raw' topic, split to multiple enriched topics.
The endpoints write to one Kafka topic, let's call it the 'raw topic'.
This 'raw topic' is consumed by a Kafka consumer which does the following:
i - enrich the data (extract IP-to-location info, standardize browser/OS type, etc.)
ii - feed the enriched data into a new topic, based on the referrer information.
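For illustration, the enrich-and-split consumer of option 1 might look like the sketch below. The field names ('referrer', 'ip', 'user_agent'), the stubbed enrichment helpers, and the 'enriched.<domain>' topic naming are all assumptions made for the example; the actual Kafka produce call is replaced by a plain `send` callback so the routing logic stands alone.

```python
# Sketch of the option-1 consumer: read raw events, enrich them, and
# route each enriched event to a per-referrer topic. Field names and
# the 'enriched.<domain>' naming are illustrative assumptions.
from urllib.parse import urlparse

def lookup_geo(ip):
    # Placeholder for a real ip-to-location lookup.
    return "unknown" if not ip else "resolved"

def parse_agent(ua):
    # Placeholder for real browser/OS standardization.
    return "unknown" if not ua else "parsed"

def enrich_event(event):
    """Return a copy of the event with derived fields added."""
    enriched = dict(event)
    enriched["geo"] = lookup_geo(event.get("ip", ""))
    enriched["browser"] = parse_agent(event.get("user_agent", ""))
    return enriched

def topic_for_referrer(referrer):
    """Map a referrer URL to an enriched-topic name, one per domain."""
    domain = urlparse(referrer).netloc or "unknown"
    return f"enriched.{domain}"

def process(raw_events, send):
    """Consume raw events, enrich each, and hand it to send(topic, event)."""
    for event in raw_events:
        enriched = enrich_event(event)
        send(topic_for_referrer(event.get("referrer", "")), enriched)
```

The point of the sketch is that the topic split happens late, inside one consumer, so the endpoints stay unaware of the sharding scheme.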

2. Multiple 'raw' topics each fed to its corresponding 'enriched' topic.
Have the web endpoints shard the events based on their referrer, creating multiple 'raw' topics,
one per referrer type/domain.
Each 'raw' topic is then consumed, and a corresponding enriched stream/topic is created from it.
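Under option 2, the sharding decision moves into the web endpoint itself, which might look like this sketch; the 'raw.<domain>' naming is an illustrative assumption, and a `produce` callback stands in for the actual Kafka producer call:

```python
# Sketch of option-2 endpoint-side sharding: the HTTP endpoint derives
# the raw topic directly from the event's referrer before producing.
# The 'raw.<domain>' topic naming is an illustrative assumption.
from urllib.parse import urlparse

def raw_topic_for(referrer):
    """One raw topic per referrer domain, e.g. 'raw.example.com'."""
    domain = urlparse(referrer).netloc or "unknown"
    return f"raw.{domain}"

def handle_request(event, produce):
    """Called by the web endpoint for each incoming event;
    produce(topic, event) stands in for the Kafka producer."""
    produce(raw_topic_for(event.get("referrer", "")), event)
```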
The dilemma is whether to do the separation into topics as soon as we can, at the web endpoint
(option 2), or to postpone it as much as possible (option 1).

I prefer option 1, but tests I ran reveal that in a scenario where there are many event
types in the same topic, and some event types occur far more often than others, the
more frequent event types seem to "drown" the less common ones: the less common events may
appear at the consumer side much later in time than the more frequent ones.
If my system requires timely processing of events, this behaviour poses a problem.
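The "drowning" effect above can be illustrated with a toy FIFO model (the event counts and per-event cost are illustrative assumptions, not measurements from our tests): a rare event queued behind a burst of frequent events in one shared topic must wait for the whole backlog, while in a dedicated topic with its own consumer it is picked up almost immediately.

```python
# Toy model of the "drowning" effect: in a shared FIFO topic a rare
# event waits behind the entire frequent-event backlog; in a dedicated
# topic it is consumed at once. Counts and costs are illustrative.

def time_to_consume(queue, is_target, per_event_cost=1):
    """Time until the first event matching is_target is consumed
    from a FIFO queue, at a fixed cost per consumed event."""
    for position, event in enumerate(queue):
        if is_target(event):
            return (position + 1) * per_event_cost
    return None

# One shared topic: 10,000 frequent events arrive before one rare event.
shared = ["frequent"] * 10_000 + ["rare"]
# Dedicated topics: the rare event sits alone in its own queue.
dedicated = ["rare"]

shared_latency = time_to_consume(shared, lambda e: e == "rare")        # 10_001
dedicated_latency = time_to_consume(dedicated, lambda e: e == "rare")  # 1
```

The model ignores partitioning and parallel consumers, but it captures why postponing the split (option 1) hurts timeliness for low-volume event types.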

What do you think? Thanks
