kafka-users mailing list archives

From David Arthur <mum...@gmail.com>
Subject Re: Producer questions and more
Date Sat, 15 Dec 2012 03:04:29 GMT
7) Perhaps a dung beetle should be the logo, as featured in The 
Metamorphosis. Or maybe just a nice stylized version of the word "Kafka" 
(like Solr and Lucene).

On 12/14/12 4:58 PM, Johan Lundahl wrote:
> Thanks for some very helpful answers!
> 1) Great, our needs are somewhere in the thousands of topics and we could
> probably scale out the number of servers as needed.
> 2) The reason I would like to separate out the producer is to have as small
> and simple a library to integrate in our deployment as possible. Both
> conflicts and size would be of interest to reduce since our apps are pretty
> sensitive in practice. The best case would be to only have the
> KafkaLog4jAppender as one well defined dependency but for now I'll just run
> it through Proguard I think.
> 3) I don't know the motives or state of Jafka (
> https://github.com/adyliu/jafka) but if that project aims to be protocol
> compatible, maybe that could be of help.
> 4) Ah thanks, I didn't even consider the console producer.
> 5) I'll do some tests with both 0.7.2 and 0.8 and see what happens. I would
> need to start working on the integration for real around the end of Jan or
> beginning of Feb. The most important thing is that the producer has as
> little impact on our application as possible. On broker and consumer side,
> it matters less since that will be in prototype mode in the beginning.
> 6) Very nice
> 7) What is the reasoning behind the Kafka name? "THE PROCESS", "Kafkaesque
> complexity"? Even though I royally suck at design, possibly I'll try to do
> some sketches of something during the holidays to at least fill the void in
> some presentations.
> Thanks again!
> On Fri, Dec 14, 2012 at 9:20 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>> 1. There are two kinds of limits: per server and overall. The per server
>> limits come from the fact that we use one directory and at least one file
>> per partition-replica. The normal rules of unix filesystem scalability
>> apply. The per server limits can be mitigated by adding more servers. The
>> overall limits mostly come from zookeeper, which we use for partition
>> metadata. Zookeeper is non-partitioned and all in memory, so this probably
>> puts the limit in the millions? These are the fundamental limits. More
>> practically, we don't have regular performance tests for very large numbers
>> of partitions, so it is buyer beware. So I think LinkedIn has something
>> like a few thousand partitions in total. If you have more than that it
>> should theoretically work up to the limits I described but you should try
>> it first--if you uncover issues we are definitely interested in fixing
>> them.
>> 2. We haven't tried to separate out the client from the broker. It is
>> possible, of course, but no one has done it. Can I ask specifically the
>> problem you are interested in solving (fewer dependency conflicts? smaller
>> binary?).
>> 3. The log4j appender relies on the normal scala producer. It is possible to
>> rewrite the producer in java, but it would be some work. This might be a
>> good idea--I agree that the clients should ideally be thin and have
>> few dependencies. The practical problem this introduces is that code
>> sharing becomes a bit trickier. You are correct that the producer should no
>> longer depend on zookeeper.
>> 4. There is no mod_kafka that I know of. There is a console producer that
>> will suck in file input and output kafka messages, which might work for
>> you. mod_kafka would be a pretty sweet project idea.
>> 5. Yes, this is true. We increased the scope of 0.8 quite a bit to try to
>> bundle non-compatible changes together. The answer depends on your level of
>> risk tolerance. Right now at LinkedIn we are subjecting 0.8 to a forked
>> version of our production load and we are still finding plenty of issues.
>> We are hoping to get that stable in the next few weeks, and it will likely
>> take several months to completely roll over all applications to 0.8 here.
>> So right now it is probably safe for development only. When we have rolled
>> it out 100% I would feel pretty confident saying it is very solid. In
>> between now and then kind of depends on your risk tolerance. Perhaps one
>> thing we could do is give a little more of an update as this testing
>> progresses. It is obviously hard to give a rigorous schedule since it is
>> mostly unknown unknowns.
>> 6. As of a few days ago svn is used only for the website, and that is only
>> because of a dependence on apache tooling.
>> 7. There hasn't really been much of a discussion on a logo, though we definitely
>> need one. I offered to act as "personal programming slave" to any of the
>> LinkedIn designers if they would make us a nice logo. If that approach
>> fails maybe we should just do 99 designs?
>> Cheers,
>> -Jay
>> On Fri, Dec 14, 2012 at 4:42 AM, Johan Lundahl <johan.lundahl@gmail.com> wrote:
>>> Hi,
>>> I'm trying to promote Kafka for our centralized log aggregation/metrics
>>> system and have set up a proof of concept based on 0.7.2 which seems to
>>> work very well for our purposes but the improvements in 0.8 look too
>>> important for us to go live without them. After studying the presentation
>>> material and videos I have some questions:
>>> 1) It's mentioned by Jay in one of the videos that Kafka is designed for
>>> < 1000 topics. I understand the fundamentals of what a topic is meant to be
>>> but are there any real system limits in regards to this? In our case, we
>>> have around 100 clusters running our different (java only) applications
>>> with a guesstimate average size of 40 nodes each. We have around 30
>>> different types of logs plus some other metrics so this would give us 100
>>> clusters * 35 types = 3500 topics. Furthermore, it's likely that the number
>>> of clusters will increase in the future. Is this something that could cause
>>> us trouble or is this figure of < 1000 topics just a guideline?
>>> 2) The KafkaLog4jAppender is a very convenient way for us to stream our
>>> logs since no changes in the application code will be needed but is it
>>> possible to build a lightweight jar with only the KafkaLog4jAppender
>>> producer that we easily could deploy on our production servers? I'm not an
>>> sbt expert but I could only manage to build a full package including broker
>>> and everything which is a lot heavier.
>>> 3) As our applications are pure java, it would be nice to avoid the scala
>>> runtime on the producer side. Would it be feasible to implement the
>>> KafkaLog4jAppender in java? With 0.8, the dependency on Zookeeper should
>>> not be needed on producer side either, if I understand correctly?
>>> 4) How do you handle non application logs, for example webserver logs? Is
>>> there something like an Apache httpd mod_kafka? OS metrics?
>>> 5) In general I think it's somewhat tricky to follow the status of the
>>> different Kafka versions. It seems like 0.8 has been postponed a bit
>>> relative to original plans but are there newer estimations of when it can
>>> be considered "stable"? Is there a summary of the important changes for the
>>> version?
>>> 6) I've seen a few mails recently about git migration. Would it be enough
>>> to only use git from 0.8 or would I still need svn for anything?
>>> 7) Have there been discussions about creating a logo for Kafka? My
>>> conceptual system diagrams look a bit empty on the Kafka parts in the
>>> promo-slides I've made...(the same thing applies to the Storm parts)
>>> Thanks a lot in advance for your help!
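
For point 1, a rough back-of-the-envelope sketch in Python of how the topic count turns into directories per broker. Only the 100 clusters and 35 log types come from the thread. The single partition per topic, the replication factor of 2, the 10 brokers, and the function name are all assumptions made up for illustration, and the sketch assumes replicas are spread evenly. Per Jay's reply, each partition-replica means one directory on some broker, so the per-broker figure is the one the filesystem has to cope with.

```python
import math


def estimate_partition_dirs(clusters, log_types, partitions_per_topic,
                            replication_factor, brokers):
    """Return (topics, partition_replicas, dirs_per_broker).

    Hypothetical helper: assumes one topic per (cluster, log type) pair
    and an even spread of partition-replicas across brokers.
    """
    topics = clusters * log_types
    # One directory per partition-replica, as described in point 1.
    partition_replicas = topics * partitions_per_topic * replication_factor
    dirs_per_broker = math.ceil(partition_replicas / brokers)
    return topics, partition_replicas, dirs_per_broker


# 100 clusters and 35 types are from the thread; 1, 2 and 10 are assumed.
topics, replicas, per_broker = estimate_partition_dirs(100, 35, 1, 2, 10)
print(topics, replicas, per_broker)
```

Adding brokers lowers only the last number, which matches the remark that the per-server limits can be mitigated by adding more servers, while the overall metadata held in zookeeper still grows with the total.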
