kafka-users mailing list archives

From Tim Lossen <...@lossen.de>
Subject Re: RE: Architecture Consulting
Date Tue, 19 Jun 2012 22:22:41 GMT
no, we do not preprocess and republish the events, although we
have toyed with the idea. currently, all our consumers do their own
preprocessing (ip lookup etc.).
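for illustration only, consumer-side preprocessing of this kind could look
roughly like the sketch below. it does not use the kafka client at all;
GEO_TABLE, lookup_country, preprocess and the event fields are made-up
names, and a real consumer would call something like preprocess() on each
message it reads from its topic.

```python
# Illustration only: each consumer enriches events itself after reading them.
# GEO_TABLE and the event field names are assumptions made for this sketch.
import ipaddress

# Tiny stand-in for an ip-to-location database (documentation address ranges).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def lookup_country(ip):
    """Return the country code for an IP address, or None if it is unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


def preprocess(event):
    """Return a copy of the event with a 'country' field added."""
    enriched = dict(event)
    enriched["country"] = lookup_country(event["ip"])
    return enriched
```

since every consumer runs its own copy of this step, the same lookup is
repeated once per consumer; that is the cost of not republishing enriched
events.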


On 2012-06-19, at 10:09 PM, Guy Doulberg wrote:

> Hi Tim,
> Thanks for your reply.
> Does your implementation have an enrichment process?
> If so, how do you perform the enrichment on each of the topics?
> Thanks, Guy
> ________________________________________
> From: Tim Lossen [tim@lossen.de]
> Sent: Tuesday, 19 June 2012 18:53
> To: kafka-users@incubator.apache.org
> Subject: Re: Architecture Consulting
> well, we decided to go with one topic per game (approach 2),
> as there are some consumers which are only interested in data
> from a single topic. makes it a bit harder for consumers interested
> in processing ALL events though.
> not knowing more about your concrete situation, it is difficult
> to decide what is better in your case.
> cheers
> tim
> On 2012-06-19, at 15:12 , Guy Doulberg wrote:
>> Hi all,
>> We'd like to consult with you about our Kafka architecture,
>> We have Http endpoints that receive events from the web, and push  
>> them into the system via kafka. The events are distinguishable by  
>> their HTTP url, and are sharded to their corresponding topics.
>> We have 2 designs in mind:
>> 1. One main 'raw' topic, split to multiple enriched topics.
>> The endpoints write to one kafka topic, let's call it 'Raw topic'.
>> The above 'raw topic' is consumed by some kafka consumer which does  
>> the following:
>> i - enrich the data (extract ip-to-location info, standardize  
>> browser/os type, etc)
>> ii - feed the enriched data to a new topic, based on the referrer
>> information.
>> 2. Multiple 'raw' topics each fed to its corresponding 'enriched'  
>> topic.
>> Have the web endpoints shard the events based on their referrer,  
>> creating multiple 'raw' topics, one per referrer type/domain.
>> Each 'raw' topic is then consumed, and a corresponding enriched  
>> stream/topic is created from it.
>> The dilemma is whether to do the separation into topics as soon as
>> we can, at the web endpoint (option 2),
>> or to postpone it as much as possible (option 1).
>> I prefer option 1, but tests I ran reveal that in a scenario
>> where there are many event types in the same topic, and some event  
>> types have many more occurrences than others, the more frequent  
>> event types seem to "drown" the less common ones, which roughly  
>> translates to the fact that less common events may appear at their  
>> consumer side much later in time than the more frequent ones.
>> If my system requires a 'timely' processing of events, this  
>> behaviour poses a problem.
>> What do you think? thanks
> --
> http://tim.lossen.de
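
The "drowning" effect described in the original question can be shown with
a toy model. This is only a sketch under simple assumptions: one consumer
reads a topic strictly in order, one event at a time, and the event names
and function names are invented for the example. It does not use Kafka and
says nothing about real broker or consumer behaviour.

```python
# Toy model: how many events does a single in-order consumer handle
# before it reaches one rare event?
from collections import deque


def wait_in_shared_topic(events, target):
    """Count events handled before the first `target` in one shared FIFO."""
    queue = deque(events)
    handled = 0
    while queue:
        if queue.popleft() == target:
            return handled
        handled += 1
    return None  # target never appeared


def wait_in_own_topic(events, target):
    """Same count when the target's event type has a topic of its own."""
    own_topic = [e for e in events if e == target]
    return wait_in_shared_topic(own_topic, target)


# A burst of frequent events followed by one rare event.
burst = ["pageview"] * 1000 + ["purchase"]
print(wait_in_shared_topic(burst, "purchase"))  # 1000
print(wait_in_own_topic(burst, "purchase"))     # 0
```

In the shared topic the rare event waits behind the whole burst, while in
its own topic it is read immediately. That is the trade-off between option 1
and option 2 for consumers that need timely delivery of rare event types.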

