kafka-users mailing list archives

From Clark Breyman <cl...@breyman.com>
Subject Re: How to design a robust producer?
Date Thu, 30 Jan 2014 15:20:41 GMT
Thibaud,

Sounds like one of your issues will be upstream of Kafka. "Robust" and UDP
aren't something I usually think of together unless you have additional
bookkeeping to detect and re-request lost messages. 8 MB/s shouldn't be much of
a problem unless the messages are very small and you wait for individual
commits. You also have the challenge of the server
process/machine/network going away after a UDP message is received but
before it can be pushed to Kafka.
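
To make that window concrete, here is a rough sketch of the receive-and-forward
loop -- an illustration only, written against the org.apache.kafka.clients.producer
API rather than the 0.8-era producer, and the broker address, UDP port, topic
name ("udp-stream"), and class name are all placeholders:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class UdpToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("acks", "all");                         // wait for the brokers to commit
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props);
             DatagramSocket socket = new DatagramSocket(5140)) {  // placeholder UDP port
            byte[] buf = new byte[65535];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (true) {
                packet.setLength(buf.length);  // reset before reusing the packet
                socket.receive(packet);
                // The loss window: from here until Kafka acknowledges the send,
                // the datagram exists only in this process. A crash drops it,
                // and UDP gives the sender no way to notice.
                byte[] payload = Arrays.copyOf(packet.getData(), packet.getLength());
                producer.send(new ProducerRecord<>("udp-stream", payload),
                        (metadata, exception) -> {
                            if (exception != null) {
                                exception.printStackTrace(); // surface failed sends somewhere real
                            }
                        });
            }
        }
    }
}

With acks=all the producer only treats a record as sent once the brokers have
committed it, but nothing can recover a datagram the kernel dropped before
receive() returned, or one sitting in this process when it crashed -- that part
has to be solved upstream (sequence numbers, retransmission) or accepted.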

Beyond that, there are a lot of server frameworks that work fine. I use
Dropwizard mostly since I like Java, though it doesn't support UDP
resources. There are plenty of options there, but that's probably not a
Kafka issue.


On Thu, Jan 30, 2014 at 6:38 AM, Philip O'Toole <philip@loggly.com> wrote:

> Well, you could start by looking at the Kafka Producer source code for some
> ideas. We have built plenty of solid software on that.
>
> As to your goal of building something solid, robust, and critical: all I
> can say is that you need to keep your Producer as simple as possible -- the
> simpler it is, the less likely it is to crash or have bugs -- and you must
> test it very well. Get the data to Kafka as fast as possible, so the chance
> of losing any due to a crash is very small. Take a long time to test it. The
> Producers I have written (in C++) run for weeks without going down (and
> then we usually bring them down on purpose for upgrades). However, they
> were in test for months too.
>
> http://www.youtube.com/watch?v=LpNbjXFPyZ0
>
>
> On Thu, Jan 30, 2014 at 6:31 AM, Thibaud Chardonnens
> <thibaud.ch@gmail.com> wrote:
>
> > Thanks for your quick answer.
> > Yes, sorry, it's probably too broad, but my main question was whether there
> > are any best practices for building a robust, fault-tolerant producer that
> > guarantees no data will be dropped while listening on the port.
> > From my point of view the producer will be the most critical part of the
> > system: if something goes wrong with it, the workflow will stop and data
> > will be lost.
> >
> > Do you have, by any chance, a pointer to an existing implementation of
> > such a producer?
> >
> > Thanks
> >
> >
> > On Jan 30, 2014, at 15:13, Philip O'Toole <philip@loggly.com> wrote:
> >
> > > What exactly are you struggling with? Your question is too broad. What
> > > you want to do is eminently possible; I have done it myself from scratch.
> > >
> > > Philip
> > >
> > >> On Jan 30, 2014, at 6:00 AM, Thibaud Chardonnens <thibaud.ch@gmail.com>
> > >> wrote:
> > >>
> > >> Hello -- I am struggling with how to design a robust implementation of
> > >> a producer.
> > >>
> > >> My use case is quite simple:
> > >> I want to process a relatively big stream (~8 MB/s) with Storm. Kafka
> > >> will be used as an intermediary between the stream and Storm. The stream
> > >> is sent to a specific server on a specific port (over UDP). So Storm will
> > >> be the consumer, and I need to write a producer (basically in Java) that
> > >> will listen on that specific port and send messages to a Kafka topic.
> > >>
> > >> Kafka and Storm are well designed and fault-tolerant: if a node goes
> > >> down, the whole environment continues to work properly, etc. Therefore
> > >> my producer will be a single point of failure in the workflow. Moreover,
> > >> writing such a producer is not so easy; I'll need to write a
> > >> multithreaded server to keep up with the throughput of the stream,
> > >> without a guarantee that no data will be dropped...
> > >>
> > >> So I would like to know whether there are any best practices for
> > >> writing such a producer, or whether there is another (maybe simpler)
> > >> way to do it?
> > >>
> > >> Thanks,
> > >> Thibaud
> >
> >
>
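
Picking up the multithreading concern from the original question quoted above:
one common shape is a bounded hand-off between the UDP receive loop and the
thread that talks to Kafka. The sketch below is again only illustrative (same
modern client API; the port, topic, class name, and queue size are made up),
but it shows how an explicit bound turns silent kernel socket-buffer overflows
into drops you can at least count and alert on:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ThreadedUdpProducer {
    public static void main(String[] args) throws Exception {
        // Bounded hand-off: if Kafka slows down, the queue fills and we drop
        // (and count) datagrams ourselves instead of letting the OS socket
        // buffer overflow silently.
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(10_000); // made-up capacity

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        Thread sender = new Thread(() -> {
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                while (true) {
                    byte[] payload = queue.take();                              // blocks when idle
                    producer.send(new ProducerRecord<>("udp-stream", payload)); // async, batched
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "kafka-sender");
        sender.start();

        try (DatagramSocket socket = new DatagramSocket(5140)) { // placeholder UDP port
            byte[] buf = new byte[65535];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            long dropped = 0;
            while (true) {
                packet.setLength(buf.length); // reset before reusing the packet
                socket.receive(packet);
                byte[] payload = Arrays.copyOf(packet.getData(), packet.getLength());
                if (!queue.offer(payload)) {
                    dropped++; // loss is now visible; export this via logging/metrics
                    if (dropped % 10_000 == 1) {
                        System.err.println("dropped " + dropped + " datagrams so far");
                    }
                }
            }
        }
    }
}

Since KafkaProducer is thread-safe and batches sends internally, a single
shared producer instance is usually enough to sustain 8 MB/s; additional
threads only help if the receive loop itself becomes the bottleneck.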
