kafka-users mailing list archives

From David Arthur <mum...@gmail.com>
Subject Re: Producer questions and more
Date Sat, 15 Dec 2012 03:32:57 GMT
7) Something like: http://i.imgur.com/R21iF.png ?

On 12/14/12 10:04 PM, David Arthur wrote:
> 7) Perhaps a dung beetle should be the logo, as featured in The 
> Metamorphosis. Or maybe just a nice stylized version of the word 
> "Kafka" (like Solr and Lucene).
> On 12/14/12 4:58 PM, Johan Lundahl wrote:
>> Thanks for some very helpful answers!
>> 1) Great, our needs are somewhere in the thousands of topics and we could
>> probably scale out the number of servers as needed.
>> 2) The reason I would like to separate out the producer is to have as small
>> and simple a library to integrate in our deployment as possible. Both
>> conflicts and size would both be worth reducing, since our apps are pretty
>> sensitive in practice. The best case would be to only have the
>> KafkaLog4jAppender as one well defined dependency but for now I'll just run
>> it through Proguard I think.
>> 3) I don't know the motives or state of Jafka (
>> https://github.com/adyliu/jafka) but if that project aims to be protocol
>> compatible, maybe that could be of help.
>> 4) Ah thanks, I didn't even consider the console producer.
>> 5) I'll do some tests with both 0.72 and 0.8 and see what happens. I would
>> need to start working on the integration for real around the end of Jan or
>> beginning of Feb. The most important thing is that the producer has as
>> little impact on our application as possible. On the broker and consumer side,
>> it matters less since that will be in prototype mode in the beginning.
>> 6) Very nice
>> 7) What is the reasoning behind the Kafka name? "THE PROCESS", "Kafkaesque
>> complexity"? Even though I royally suck at design, possibly I'll try to do
>> some sketches of something during the holidays to at least fill the void in
>> some presentations.
>> Thanks again!
>> On Fri, Dec 14, 2012 at 9:20 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>>> 1. There are two kinds of limits: per server and overall. The per server
>>> limits come from the fact that we use one directory and at least one file
>>> per partition-replica. The normal rules of unix filesystem scalability
>>> apply. The per server limits can be mitigated by adding more servers. The
>>> overall limits mostly come from zookeeper, which we use for partition
>>> metadata. Zookeeper is non-partitioned and all in memory, so this probably
>>> puts the limit in the millions? These are the fundamental limits. More
>>> practically, we don't have regular performance tests for very large numbers
>>> of partitions, so it is buyer beware. I think LinkedIn has something
>>> like a few thousand partitions in total. If you have more than that it
>>> should theoretically work up to the limits I described but you should try
>>> it first--if you uncover issues we are definitely interested in fixing
>>> them.
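A back-of-the-envelope sketch of the per-server limits described above. The topic count is Johan's estimate from later in the thread; the partition, replication, and broker counts are illustrative assumptions, not figures anyone gave:

```python
# Rough estimate of per-broker filesystem load from partition-replicas.
# Only the topic count comes from the thread; the rest are assumptions.
topics = 3500               # Johan's estimate: 100 clusters * 35 log types
partitions_per_topic = 1    # assumed
replication_factor = 2      # assumed (replication arrives with 0.8)
brokers = 10                # assumed cluster size

# Each partition-replica is at least one directory with at least one log
# file, so this approximates the directory/file count the filesystem sees.
partition_replicas = topics * partitions_per_topic * replication_factor
per_broker = partition_replicas // brokers

print(partition_replicas)   # total partition-replicas across the cluster
print(per_broker)           # directories each individual broker must manage
```

Under these assumptions, each broker ends up with a few hundred partition directories, comfortably within normal unix filesystem limits; the numbers only become interesting at much larger topic counts.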
>>> 2. We haven't tried to separate out the client from the broker. It is
>>> possible, of course, but no one has done it. Can I ask specifically the
>>> problem you are interested in solving (fewer dependency conflicts? smaller
>>> binary?).
>>> 3. The log4j appender relies on the normal scala producer. It is possible to
>>> rewrite the producer in java, but it would be some work. This might be a
>>> good idea--I agree that the clients should ideally be thin and have
>>> few dependencies. The practical problem this introduces is that code
>>> sharing becomes a bit trickier. You are correct that the producer should no
>>> longer depend on zookeeper.
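For reference, wiring the appender into an application is roughly a log4j.properties fragment like the one below. The property names changed between releases, so the class name, BrokerList, and Topic keys here are assumptions to verify against the version actually in use:

```properties
# Hypothetical log4j.properties fragment; names are assumptions to check
# against the Kafka release in use, not taken from this thread.
log4j.rootLogger=INFO, KAFKA
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.KAFKA.BrokerList=localhost:9092
log4j.appender.KAFKA.Topic=app_logs
```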
>>> 4. There is no mod_kafka that I know of. There is a console producer that
>>> will suck in file input and output kafka messages, which might work for
>>> you. mod_kafka would be a pretty sweet project idea.
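In the absence of a mod_kafka, one sketch of the console-producer route for webserver logs. The broker address and topic name are placeholders, and the flags should be checked against the release in use (0.8-style builds take --broker-list, while 0.7 connects via ZooKeeper):

```shell
# Sketch only: tail the httpd access log and pipe each line into Kafka
# via the console producer. Requires a running 0.8-style broker.
tail -F /var/log/httpd/access_log \
  | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic httpd_access
```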
>>> 5. Yes, this is true. We increased the scope of 0.8 quite a bit to try to
>>> bundle non-compatible changes together. The answer depends on your level of
>>> risk tolerance. Right now at LinkedIn we are subjecting 0.8 to a forked
>>> version of our production load and we are still finding plenty of issues.
>>> We are hoping to get that stable in the next few weeks, and it will likely
>>> take several months to completely roll over all applications to 0.8 here.
>>> So right now it is probably safe for development only. When we have rolled
>>> it out 100% I would feel pretty confident saying it is very solid. In
>>> between now and then kind of depends on your risk tolerance. Perhaps one
>>> thing we could do is give a little more of an update as this testing
>>> progresses. It is obviously hard to give a rigorous schedule since it is
>>> mostly unknown unknowns.
>>> 6. As of a few days ago svn is used only for the website, and that is only
>>> because of a dependence on apache tooling.
>>> 7. There hasn't really been much of a discussion about a logo, though we definitely
>>> need one. I offered to act as "personal programming slave" to any of the
>>> LinkedIn designers if they would make us a nice logo. If that approach
>>> fails maybe we should just do 99 designs?
>>> Cheers,
>>> -Jay
>>> On Fri, Dec 14, 2012 at 4:42 AM, Johan Lundahl <johan.lundahl@gmail.com> wrote:
>>>> Hi,
>>>> I'm trying to promote Kafka for our centralized log aggregation/metrics
>>>> system and have set up a proof of concept based on 0.7.2 which seems to
>>>> work very well for our purposes but the improvements in 0.8 look too
>>>> important for us to go live without them. After studying the presentation
>>>> material and videos I have some questions:
>>>> 1) It's mentioned by Jay in one of the videos that Kafka is designed for
>>>> < 1000 topics. I understand the fundamentals of what a topic is meant to be
>>>> but are there any real system limits in regards to this? In our case, we
>>>> have around 100 clusters running our different (java only) applications
>>>> with a guesstimate average size of 40 nodes each. We have around 30
>>>> different types of logs plus some other metrics so this would give us 100
>>>> clusters * 35 types = 3500 topics. Furthermore, it's likely that the number
>>>> of clusters will increase in the future. Is this something that could cause
>>>> us trouble, or is this figure of < 1000 topics just a guideline?
>>>> 2) The KafkaLog4jAppender is a very convenient way for us to stream our
>>>> logs since no changes in the application code will be needed but is it
>>>> possible to build a lightweight jar with only the KafkaLog4jAppender
>>>> producer that we could easily deploy on our production servers? I'm not an
>>>> sbt expert, but I could only manage to build a full package including the
>>>> broker and everything, which is a lot heavier.
>>>> 3) As our applications are pure java, it would be nice to avoid the scala
>>>> runtime on the producer side. Would it be feasible to implement the
>>>> KafkaLog4jAppender in java? With 0.8, the dependency on Zookeeper should
>>>> not be needed on the producer side either, if I understand correctly?
>>>> 4) How do you handle non application logs, for example webserver logs? Is
>>>> there something like an Apache httpd mod_kafka? OS metrics?
>>>> 5) In general I think it's somewhat tricky to follow the status of the
>>>> different Kafka versions. It seems like 0.8 has been postponed a bit
>>>> relative to original plans but are there newer estimations of when it can
>>>> be considered "stable"? Is there a summary of the important changes for
>>>> the version?
>>>> 6) I've seen a few mails recently about git migration. Would it be enough
>>>> to only use git from 0.8 or would I still need svn for anything?
>>>> 7) Has there been discussions about creating a logo for Kafka? My
>>>> conceptual system diagrams look a bit empty on the Kafka parts in the
>>>> promo-slides I've made...(the same thing applies to the Storm parts)
>>>> Thanks a lot in advance for your help!
