kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Producer questions and more
Date Fri, 14 Dec 2012 20:20:10 GMT
1. There are two kinds of limits: per server and overall. The per server
limits come from the fact that we use one directory and at least one file
per partition-replica. The normal rules of unix filesystem scalability
apply. The per server limits can be mitigated by adding more servers. The
overall limits mostly come from zookeeper, which we use for partition
metadata. Zookeeper is non-partitioned and all in memory, so this probably
puts the limit in the millions? These are the fundamental limits. More
practically, we don't have regular performance tests for very large numbers
of partitions, so it is buyer beware. I think LinkedIn has something
like a few thousand partitions in total. If you have more than that, it
should theoretically work up to the limits I described, but you should try
it first--if you uncover issues, we are definitely interested in fixing them.
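To make the per-server limit concrete, here is a back-of-envelope sketch in Python. All of the input figures (partitions per topic, replication factor, broker count) are hypothetical stand-ins, not numbers from this thread; the 3500-topic figure comes from the question below:

```python
# Rough estimate of how many partition-replicas (and hence directories
# and open files) each broker ends up hosting. All inputs are hypothetical.
topics = 3500              # e.g. 100 clusters * 35 log types
partitions_per_topic = 2
replication_factor = 2     # replication is an 0.8 feature
brokers = 20

total_partition_replicas = topics * partitions_per_topic * replication_factor
replicas_per_broker = total_partition_replicas / brokers

print(total_partition_replicas)  # 14000 partition-replicas cluster-wide
print(replicas_per_broker)       # 700.0 per broker, each needing >= 1 open file
```

Since each partition-replica needs at least one file, the per-broker number is what runs into filesystem and file-handle limits, and adding brokers spreads it out.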

2. We haven't tried to separate out the client from the broker. It is
possible, of course, but no one has done it. Can I ask specifically the
problem you are interested in solving (fewer dependency conflicts? smaller
binary?).

3. The log4j appender relies on the normal scala producer. It is possible to
rewrite the producer in java, but it would be some work. This might be a
good idea--I agree that the clients should ideally be thin and have
few dependencies. The practical problem this introduces is that code
sharing becomes a bit trickier. You are correct that the producer should no
longer depend on zookeeper.
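For reference, hooking the appender up is just log4j configuration; a minimal sketch might look like the following. The property names reflect the 0.8-era appender and the broker addresses and topic are placeholders, so check them against the version you actually deploy:

```properties
log4j.rootLogger=INFO, KAFKA
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
# Broker list replaces the zookeeper.connect the 0.7 producer needed
log4j.appender.KAFKA.brokerList=broker1:9092,broker2:9092
log4j.appender.KAFKA.topic=app-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d %p %c - %m%n
```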

4. There is no mod_kafka that I know of. There is a console producer that
will suck in file input and output kafka messages, which might work for
you. mod_kafka would be a pretty sweet project idea.
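As a stopgap for webserver logs, tailing the access log into the console producer is one option. A rough sketch, assuming an 0.8-style console producer; the paths, broker address, and topic are placeholders, and flag names vary by version (0.7 took --zookeeper rather than --broker-list):

```shell
# Ship each new webserver log line into Kafka as one message.
# Shown as a dry run that prints the command; run it for real
# (via: eval "$CMD") once a broker is up.
LOG=/var/log/httpd/access_log
BROKERS=localhost:9092
TOPIC=weblogs

CMD="tail -F $LOG | bin/kafka-console-producer.sh --broker-list $BROKERS --topic $TOPIC"
echo "$CMD"
```

tail -F keeps following across log rotation, which matters for httpd logs that get rotated out from under the pipeline.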

5. Yes, this is true. We increased the scope of 0.8 quite a bit to try to
bundle non-compatible changes together. The answer depends on your level of
risk tolerance. Right now at LinkedIn we are subjecting 0.8 to a forked
version of our production load and we are still finding plenty of issues.
We are hoping to get that stable in the next few weeks, and it will likely
take several months to completely roll over all applications to 0.8 here.
So right now it is probably safe for development only. When we have rolled
it out 100% I would feel pretty confident saying it is very solid. In
between now and then, it kind of depends on your risk tolerance. Perhaps one
thing we could do is give a little more of an update as this testing
progresses. It is obviously hard to give a rigorous schedule since it is
mostly unknown unknowns.

6. As of a few days ago svn is used only for the website, and that is only
because of a dependence on apache tooling.

7. There hasn't really been much of a discussion about a logo, though we
definitely need one. I offered to act as "personal programming slave" to any
of the LinkedIn designers if they would make us a nice logo. If that approach
fails, maybe we should just run a 99designs contest?

Cheers,

-Jay


On Fri, Dec 14, 2012 at 4:42 AM, Johan Lundahl <johan.lundahl@gmail.com> wrote:

> Hi,
>
> I'm trying to promote Kafka for our centralized log aggregation/metrics
> system and have set up a proof of concept based on 0.7.2 which seems to
> work very well for our purposes but the improvements in 0.8 looks too
> important for us to go live without them. After studying the presentation
> material and videos I have some questions:
>
> 1) It's mentioned by Jay in one of the videos that Kafka is designed for <
> 1000 topics. I understand the fundamentals of what a topic is meant to be
> but are there any real system limits in regards to this? In our case, we
> have around 100 clusters running our different (java only) applications
> with a guesstimate average size of 40 nodes each. We have around 30
> different types of logs plus some other metrics so this would give us 100
> clusters * 35 types = 3500 topics. Furthermore, it's likely that the number
> of clusters will increase in the future. Is this something that could cause
> us trouble, or is this figure of < 1000 topics just a guideline?
>
> 2) The KafkaLog4jAppender is a very convenient way for us to stream our
> logs since no changes in the application code will be needed but is it
> possible to build a lightweight jar with only the KafkaLog4jAppender
> producer that we easily could deploy on our production servers? I'm not an
> sbt expert, but I could only manage to build a full package including the
> broker and everything, which is a lot heavier.
>
> 3) As our applications are pure java, it would be nice to avoid the scala
> runtime on the producer side. Would it be feasible to implement the
> KafkaLog4jAppender in java? With 0.8, the dependency on Zookeeper should
> not be needed on the producer side either, if I understand correctly?
>
> 4) How do you handle non application logs, for example webserver logs? Is
> there something like an Apache httpd mod_kafka? OS metrics?
>
> 5) In general I think it's somewhat tricky to follow the status of the
> different Kafka versions. It seems like 0.8 has been postponed a bit
> relative to the original plans, but are there newer estimates of when it can
> be considered "stable"? Is there a summary of the important changes for the
> version?
>
> 6) I've seen a few mails recently about git migration. Would it be enough
> to only use git from 0.8 or would I still need svn for anything?
>
> 7) Have there been discussions about creating a logo for Kafka? My
> conceptual system diagrams look a bit empty on the Kafka parts in the
> promo-slides I've made...(the same thing applies to the Storm parts)
>
> Thanks a lot in advance for your help!
>
