kafka-users mailing list archives

From Johan Lundahl <johan.lund...@gmail.com>
Subject Re: Producer questions and more
Date Fri, 14 Dec 2012 21:58:12 GMT
Thanks for some very helpful answers!

1) Great, our needs are somewhere in the thousands of topics and we could
probably scale out the number of servers as needed.

2) The reason I would like to separate out the producer is to keep the library
we integrate into our deployment as small and simple as possible. Both
dependency conflicts and jar size matter, since our apps are fairly sensitive
to both in practice. Ideally the KafkaLog4jAppender would be our only
well-defined dependency, but for now I think I'll just run the full package
through ProGuard.
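
Just so I understand the integration surface correctly: I'd expect the only
application-side wiring to be log4j configuration along these lines (the
appender class and property names below are my assumptions from the docs, so
they may well differ between 0.7.2 and 0.8):

import java.util.Properties;

import org.apache.log4j.Logger;
import org.apache.log4j.PropertyConfigurator;

public class KafkaAppenderWiring {

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("log4j.rootLogger", "INFO, KAFKA");

        // Assumed appender class and property names -- needs checking against
        // the Kafka version we actually deploy.
        p.setProperty("log4j.appender.KAFKA", "kafka.producer.KafkaLog4jAppender");
        p.setProperty("log4j.appender.KAFKA.layout", "org.apache.log4j.PatternLayout");
        p.setProperty("log4j.appender.KAFKA.layout.ConversionPattern", "%d %p %c - %m%n");
        p.setProperty("log4j.appender.KAFKA.BrokerList", "broker1:9092,broker2:9092");
        p.setProperty("log4j.appender.KAFKA.Topic", "app-logs");
        PropertyConfigurator.configure(p);

        // The application keeps logging as usual; the appender handles delivery to Kafka.
        Logger.getLogger(KafkaAppenderWiring.class).info("hello from the app");
    }
}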

3) I don't know the motives or current state of Jafka (
https://github.com/adyliu/jafka), but if that project aims to be protocol
compatible, maybe it could help here.

4) Ah thanks, I didn't even consider the console producer.

5) I'll do some tests with both 0.7.2 and 0.8 and see what happens. I would
need to start working on the integration for real around the end of January or
beginning of February. The most important thing is that the producer has as
little impact on our application as possible. On the broker and consumer side
it matters less, since those parts will be in prototype mode in the beginning.
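
For the producer tests I'm assuming the plain 0.8 Java API looks roughly like
the sketch below -- a static broker list and no ZooKeeper connect string on
the producer side, per your point 3. Class and property names are my guesses
from the current 0.8 branch, so they may need adjusting:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ProducerSmokeTest {

    public static void main(String[] args) {
        Properties props = new Properties();
        // 0.8 bootstraps from a static broker list; no ZooKeeper on the producer side.
        props.put("metadata.broker.list", "broker1:9092,broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Async mode, to keep the impact on the application threads small.
        props.put("producer.type", "async");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        try {
            producer.send(new KeyedMessage<String, String>("app-logs", "test message"));
        } finally {
            producer.close();
        }
    }
}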

6) Very nice

7) What is the reasoning behind the Kafka name? "THE PROCESS", "Kafkaesque
complexity"? Even though I royally suck at design, I might try to do some
sketches during the holidays, at least to fill the void in some
presentations.

Thanks again!


On Fri, Dec 14, 2012 at 9:20 PM, Jay Kreps <jay.kreps@gmail.com> wrote:

> 1. There are two kinds of limits: per server and overall. The per server
> limits come from the fact that we use one directory and at least one file
> per partition-replica. The normal rules of unix filesystem scalability
> apply. The per server limits can be mitigated by adding more servers. The
> overall limits mostly come from zookeeper, which we use for partition
> metadata. Zookeeper is non-partitioned and all in memory, so this probably
> puts the limit in the millions? These are the fundamental limits. More
> practically, we don't have regular performance tests for very large numbers
> of partitions, so it is buyer beware. I think LinkedIn has something
> like a few thousand partitions in total. If you have more than that it
> should theoretically work up to the limits I described but you should try
> it first--if you uncover issues we are definitely interested in fixing
> them.
>
> 2. We haven't tried to separate out the client from the broker. It is
> possible, of course, but no one has done it. Can I ask specifically what
> problem you are interested in solving (fewer dependency conflicts? smaller
> binary?).
>
> 3. The log4j appender relies on the normal scala producer. It is possible to
> rewrite the producer in java, but it would be some work. This might be a
> good idea--I agree that the clients should ideally be thin and have
> few dependencies. The practical problem this introduces is that code
> sharing becomes a bit trickier. You are correct that the producer should no
> longer depend on zookeeper.
>
> 4. There is no mod_kafka that I know of. There is a console producer that
> will suck in file input and output kafka messages, which might work for
> you. mod_kafka would be a pretty sweet project idea.
>
> 5. Yes, this is true. We increased the scope of 0.8 quite a bit to try to
> bundle non-compatible changes together. The answer depends on your level of
> risk tolerance. Right now at LinkedIn we are subjecting 0.8 to a forked
> version of our production load and we are still finding plenty of issues.
> We are hoping to get that stable in the next few weeks, and it will likely
> take several months to completely roll over all applications to 0.8 here.
> So right now it is probably safe for development only. When we have rolled
> it out 100% I would feel pretty confident saying it is very solid. In
> between now and then, it kind of depends on your risk tolerance. Perhaps one
> thing we could do is give a little more of an update as this testing
> progresses. It is obviously hard to give a rigorous schedule since it is
> mostly unknown unknowns.
>
> 6. As of a few days ago svn is used only for the website, and that is only
> because of a dependence on apache tooling.
>
> 7. There hasn't really been much of a discussion on a logo, though we definitely
> need one. I offered to act as "personal programming slave" to any of the
> LinkedIn designers if they would make us a nice logo. If that approach
> fails maybe we should just do 99 designs?
>
> Cheers,
>
> -Jay
>
>
> On Fri, Dec 14, 2012 at 4:42 AM, Johan Lundahl <johan.lundahl@gmail.com> wrote:
>
> > Hi,
> >
> > I'm trying to promote Kafka for our centralized log aggregation/metrics
> > system and have set up a proof of concept based on 0.7.2 which seems to
> > work very well for our purposes but the improvements in 0.8 look too
> > important for us to go live without them. After studying the presentation
> > material and videos I have some questions:
> >
> > 1) It's mentioned by Jay in one of the videos that Kafka is designed for
> > < 1000 topics. I understand the fundamentals of what a topic is meant to
> > be, but are there any real system limits with regard to this? In our
> > case, we have around 100 clusters running our different (java only)
> > applications with a guesstimate average size of 40 nodes each. We have
> > around 30 different types of logs plus some other metrics, so this would
> > give us 100 clusters * 35 types = 3500 topics. Furthermore, it's likely
> > that the number of clusters will increase in the future. Is this
> > something that could cause us trouble, or is this figure of < 1000 topics
> > just a guideline?
> >
> > 2) The KafkaLog4jAppender is a very convenient way for us to stream our
> > logs since no changes in the application code will be needed, but is it
> > possible to build a lightweight jar with only the KafkaLog4jAppender
> > producer that we could easily deploy on our production servers? I'm not
> > an sbt expert, but I could only manage to build a full package including
> > the broker and everything, which is a lot heavier.
> >
> > 3) As our applications are pure java, it would be nice to avoid the scala
> > runtime on the producer side. Would it be feasible to implement the
> > KafkaLog4jAppender in java? With 0.8, the dependency on Zookeeper should
> > not be needed on the producer side either, if I understand correctly?
> >
> > 4) How do you handle non-application logs, for example webserver logs? Is
> > there something like an Apache httpd mod_kafka? OS metrics?
> >
> > 5) In general I think it's somewhat tricky to follow the status of the
> > different Kafka versions. It seems like 0.8 has been postponed a bit
> > relative to the original plans, but are there newer estimates of when it
> > can be considered "stable"? Is there a summary of the important changes
> > for the version?
> >
> > 6) I've seen a few mails recently about git migration. Would it be enough
> > to only use git from 0.8 or would I still need svn for anything?
> >
> > 7) Has there been any discussion about creating a logo for Kafka? My
> > conceptual system diagrams look a bit empty on the Kafka parts in the
> > promo-slides I've made...(the same thing applies to the Storm parts)
> >
> > Thanks a lot in advance for your help!
> >
>
