kafka-users mailing list archives

From S Ahmed <sahmed1...@gmail.com>
Subject Re: tracking page views at linkedin
Date Mon, 10 Dec 2012 18:51:09 GMT
Ok, just looking at the code, it seems like you could even create a new
implementation and roll up the page views (if that is possible in the use
case) before sending them over the wire.

e.g. maybe you can just increment the counter to 2 instead of sending 2 lines
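The increment idea above might look something like this (a minimal sketch;
the helper name and the idea of keying on URL alone are my own assumptions,
not anything from LinkedIn's code):

```python
from collections import Counter

def roll_up(page_views):
    """Collapse raw page-view events into (url, count) pairs, so two
    hits on the same page become one message with a count of 2 rather
    than two separate messages over the wire."""
    return list(Counter(page_views).items())

# Two views of /home collapse into a single ("/home", 2) record.
rolled = roll_up(["/home", "/about", "/home"])
```

In practice you would key on more than the URL (browser, timestamp bucket,
etc.), but the shape of the aggregation is the same.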

The key is also figuring out how many messages to queue, or how long to
wait, before pushing them to kafka.  For something like a page view, plus
other request information like browser, timestamp, and querystring values,
you could probably store a few hundred?
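The size-or-time trigger described above could be sketched like this (a
hypothetical in-process batcher; the thresholds and the `send` callback are
illustrative, not real producer API):

```python
import time

class Batcher:
    """Queue events and flush when either a size threshold or a time
    threshold is reached, whichever comes first."""

    def __init__(self, send, max_events=200, max_age_secs=5.0):
        self.send = send                  # e.g. a producer's send call
        self.max_events = max_events      # flush after this many events
        self.max_age_secs = max_age_secs  # ...or after this much time
        self.buffer = []
        self.started = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        too_big = len(self.buffer) >= self.max_events
        too_old = time.monotonic() - self.started >= self.max_age_secs
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
        self.buffer = []
        self.started = time.monotonic()
```

For example, with `max_events=3`, the third `add` triggers one `send` call
carrying all three queued events.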

On Sun, Dec 9, 2012 at 6:21 PM, Jay Kreps <jay.kreps@gmail.com> wrote:

> Yes this is how it works. We do not log out to disk on the web service
> machines, rather we use the async setting in the kafka producer from the
> app and it directly sends all tracking and monitoring data to the kafka
> cluster.
> On Sun, Dec 9, 2012 at 12:47 PM, S Ahmed <sahmed1020@gmail.com> wrote:
> > I was reading (or watching) how linkedin uses kafka to track page views.
> >
> > I'm trying to imagine this in practice, where linkedin probably has
> > hundreds of web servers serving requests, and each server is making a put
> > call to kafka to track a single page view.
> >
> > Is this really the case?  Or does some other service roll up the web
> > servers' log files and then push them to kafka on a batch basis?
> >
> > Interesting stuff!
> >
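For reference, the "async setting" Jay mentions corresponds to producer
properties along these lines (a hedged sketch from the 0.7-era producer
config; check the producer docs for your Kafka version, as names have
changed across releases):

```
# Buffer messages in the client and send them in the background.
producer.type=async
# Flush at most every 5 seconds...
queue.time=5000
# ...or once 200 messages have queued, whichever comes first.
batch.size=200
```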
