kafka-users mailing list archives

From David Harris <dhar...@avum.com>
Subject Re: Partitioner question/issue
Date Mon, 22 Oct 2012 19:21:50 GMT
That is helpful, thank you.

And thanks for all the good work you guys have been doing on Kafka. I think
it's a great piece of software.

David Harris


On Mon, Oct 22, 2012 at 1:19 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:

> You are hitting https://issues.apache.org/jira/browse/KAFKA-278
>
> The partitioning semantics that Kafka 0.7.x provides are too weak to
> be used for sticky partitioning features like the one you need here.
> The reason is that the partitions hosted on a broker go offline when
> that broker goes offline.
> This means that during the downtime, the data meant for the subsets
> hosted on those partitions will be sent to other subsets on the fly,
> or it will be dropped (depending on how you configure the
> partitioner).
>
> In Kafka 0.8, partitions are always available in spite of individual
> broker failures. These strong durability guarantees enable sticky
> partitioning to work as expected.
>
> Hope that helps,
> Neha
>
> On Mon, Oct 22, 2012 at 10:22 AM, David Harris <dharris@avum.com> wrote:
> > Hi Everyone,
> >
> > I want to have particular subsets of my data sent to different
> > partitions so that I can have consumers across different machines (or
> > multiple instances of the consumers running in different threads)
> > handle the subsets of data. The definition of these subsets is
> > important, meaning data of type 1 needs to go into subset 1 etc.
> >
> > My setup is Kafka (0.7.1) and ZooKeeper running on a single machine,
> > as described in the quick start guide. In my server.properties
> > file I’ve set num.partitions=4.
> >
> > I’m testing this out with a simple class that passes one-character
> > strings [a-z] as messages, and I have a kafka.producer.Partitioner
> > that puts a-g, h-m, n-s and t-z into separate partitions. The issue
> > I’m having is that when I run my code for the first time (i.e. the
> > topic doesn't exist in Kafka yet), the numPartitions argument of the
> > “public int partition(String s, int numPartitions)” method of my
> > Partitioner is 1 the first few times it is called, and after a while
> > it comes in as 4. In my example this causes some w, y, z etc. to be
> > included in the partition with the a, b and c’s. If I’ve already run
> > the code once and I see the four folders under /tmp/kafka-logs for my
> > partitions, everything works as expected.
> >
> > I’ve attached my test code that shows this issue. (I believe that
> > attachments come across; if not, I can paste it into the body of the
> > email.) I’m not sure if I’m doing something wrong in the code or if
> > I’m approaching this problem in the wrong way. It seems that an
> > alternative approach would be to have a separate topic for each
> > subset of data, and then have my producer push to the different
> > topics. Any advice/suggestions?
> >
> > On a related note, when reading up about this topic in the quick
> > start guide I see that it describes creating a ProducerData object
> > like the following:
> >    ProducerData<String, String> data =
> >        new ProducerData<String, String>("test-topic", "test-key", "test-message");
> > But looking at the API docs I see that the only constructor that
> > allows me to specify a key takes a List[V] as the third parameter:
> >    ProducerData(topic: String, key: K, data: List[V])
> > Am I missing something here?
> >
> > Thanks
> > David Harris
> >
>
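
For reference, the range partitioner described in the original question
might look roughly like the sketch below. The attachment is not part of
this archive, so this is a reconstruction under assumptions, not the
poster's code: the class name, the bucketFor helper and the modulo
fallback are all hypothetical, and the small Partitioner interface is a
local stand-in for kafka.producer.Partitioner so the example compiles
without the Kafka jar.

```java
// Maps one-character keys a-g, h-m, n-s, t-z to buckets 0..3.
// Hypothetical reconstruction; not the code attached to the original mail.
class RangePartitionerSketch implements Partitioner<String> {

    // Bucket for a lowercase letter, independent of the partition count.
    static int bucketFor(char c) {
        if (c <= 'g') return 0;
        if (c <= 'm') return 1;
        if (c <= 's') return 2;
        return 3;
    }

    @Override
    public int partition(String key, int numPartitions) {
        int bucket = bucketFor(key.charAt(0));
        // Assumption: fold the bucket into the range the producer reports.
        // When numPartitions is 1 (as seen on first use of a new topic),
        // every bucket folds to 0, so w, y, z share a partition with a, b, c.
        return bucket % numPartitions;
    }

    public static void main(String[] args) {
        RangePartitionerSketch p = new RangePartitionerSketch();
        for (String key : new String[] {"a", "h", "n", "w"}) {
            System.out.println(key + ": numPartitions=1 -> " + p.partition(key, 1)
                    + ", numPartitions=4 -> " + p.partition(key, 4));
        }
    }
}

// Local stand-in (an assumption) for kafka.producer.Partitioner<T>.
interface Partitioner<T> {
    int partition(T key, int numPartitions);
}
```

With numPartitions=4 each range gets its own partition; with
numPartitions=1 all four ranges collapse into partition 0, which matches
the mixing reported for a topic that does not exist yet.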
