kafka-users mailing list archives

From David Harris <dhar...@avum.com>
Subject Partitioner question/issue
Date Mon, 22 Oct 2012 17:22:11 GMT
Hi Everyone,

I want to have particular subsets of my data sent to different partitions
so that I can have consumers across different machines (or multiple
instances of the consumers running in different threads) handle the subsets
of data.  The definition of these subsets is important, meaning data of
type 1 needs to go into subset 1 etc.

My setup is Kafka (0.7.1) and ZooKeeper running on a single machine, as
described in the quick start guide. In my server.properties file I’ve set
num.partitions=4.

I’m testing this with a simple class that sends one-character strings
[a-z] as messages, and I have a kafka.producer.Partitioner that puts a-g,
h-m, n-s and t-z into separate partitions. The issue is that when I run my
code for the first time (i.e. the topic doesn't exist in Kafka yet), the
numPartitions argument passed to my Partitioner’s
*public int partition(String s, int numPartitions)* method is 1 for the
first few calls, and only after a while does it come through as 4. In my
example this causes some w, y, z etc. to land in the same partition as the
a, b and c’s. If I’ve already run the code once and the four partition
folders exist under /tmp/kafka-logs, everything works as expected.
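
For reference, the range mapping described above can be sketched roughly as follows. This is a standalone illustration, not the attached code: the class name LetterRangePartitioner and the helper bucketFor are hypothetical, and folding the bucket with a modulo is only an assumption about how such a partitioner might react to a smaller numPartitions. In a real 0.7 producer the class would implement kafka.producer.Partitioner<String>; that is left out here so the sketch runs without Kafka on the classpath.

```java
// Standalone sketch: in a real 0.7 producer this class would implement
// kafka.producer.Partitioner<String>, which declares the same partition method.
class LetterRangePartitioner {

    // Maps a one-character key [a-z] to a bucket:
    // a-g -> 0, h-m -> 1, n-s -> 2, t-z -> 3.
    static int bucketFor(String key) {
        char c = key.charAt(0);
        if (c <= 'g') return 0;
        if (c <= 'm') return 1;
        if (c <= 's') return 2;
        return 3;
    }

    public int partition(String key, int numPartitions) {
        // Assumption: fold the bucket into however many partitions the
        // producer reports. With numPartitions == 1 every key maps to 0.
        return bucketFor(key) % numPartitions;
    }

    public static void main(String[] args) {
        LetterRangePartitioner p = new LetterRangePartitioner();
        for (String key : new String[] {"a", "h", "n", "w"}) {
            System.out.println(key + ": numPartitions=4 -> " + p.partition(key, 4)
                    + ", numPartitions=1 -> " + p.partition(key, 1));
        }
    }
}
```

With numPartitions=4 each range gets its own partition; with numPartitions=1 every key collapses into partition 0, which would be consistent with w, y and z showing up alongside a, b and c on the first run.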

I’ve attached my test code that shows this issue. (I believe attachments
come across; if not, I can paste it into the body of the email.)  I’m
not sure if I’m doing something wrong in the code or if I’m approaching
this problem in the wrong way.  It seems that an alternative approach would
be to have a separate topic for each subset of data, and then have my
producer push to the different topics.  Any advice/suggestions?

On a related note, while reading up on this in the quick start guide, I see
that it creates a ProducerData object like this:
*   ProducerData<String, String> data = new ProducerData<String,
String>("test-topic", "test-key", "test-message");*
But looking at the API docs, the only constructor that lets me specify a
key takes a List[V] as the third parameter:
*   ProducerData(topic: String, key: K, data: List[V])*
Am I missing something here?

David Harris
