kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ara Ebrahimi <ara.ebrah...@argyledata.com>
Subject Re: more uniform task assignment across kafka stream nodes
Date Sun, 26 Mar 2017 01:08:01 GMT
Via:

builder.stream("topic1");
builder.stream("topic2");
builder.stream("topic3”);

These are different kinds of topics consuming different avro objects.

Ara.

On Mar 25, 2017, at 6:04 PM, Matthias J. Sax <matthias@confluent.io<mailto:matthias@confluent.io>>
wrote:




________________________________

This message is for the designated recipient only and may contain privileged, proprietary,
or otherwise confidential information. If you have received it in error, please notify the
sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Thank you in advance for your cooperation.

________________________________

From: "Matthias J. Sax" <matthias@confluent.io<mailto:matthias@confluent.io>>
Subject: Re: more uniform task assignment across kafka stream nodes
Date: March 25, 2017 at 6:04:30 PM PDT
To: users@kafka.apache.org<mailto:users@kafka.apache.org>
Reply-To: <users@kafka.apache.org<mailto:users@kafka.apache.org>>


Ara,

How do you consume your topics? Via

builder.stream("topic1", "topic2", "topic3);

or via

builder.stream("topic1");
builder.stream("topic2");
builder.stream("topic3");

Both and handled differently with regard to creating tasks (partition to
task assignment also depends on you downstream code though).

If this does not help, can you maybe share the structure of processing?
To dig deeper, we would need to know the topology DAG.


-Matthias


On 3/25/17 5:56 PM, Ara Ebrahimi wrote:
Mathias,

This apparently happens because we have more than 1 source topic. We have 3 source topics
in the same application. So it seems like the task assignment algorithm creates topologies
not for one specific topic at a time but the total partitions across all source topics consumed
in an application instance. Because we have some code dependencies between these 3 source
topics we can’t separate them into 3 applications at this time. Hence the reason I want
to get the task assignment algorithm basically do a uniform and simple task assignment PER
source topic.

Ara.

On Mar 25, 2017, at 5:21 PM, Matthias J. Sax <matthias@confluent.io<mailto:matthias@confluent.io><mailto:matthias@confluent.io>>
wrote:




________________________________

This message is for the designated recipient only and may contain privileged, proprietary,
or otherwise confidential information. If you have received it in error, please notify the
sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Thank you in advance for your cooperation.

________________________________

From: "Matthias J. Sax" <matthias@confluent.io<mailto:matthias@confluent.io><mailto:matthias@confluent.io>>
Subject: Re: more uniform task assignment across kafka stream nodes
Date: March 25, 2017 at 5:21:47 PM PDT
To: users@kafka.apache.org<mailto:users@kafka.apache.org><mailto:users@kafka.apache.org>
Reply-To: <users@kafka.apache.org<mailto:users@kafka.apache.org><mailto:users@kafka.apache.org>>


Hi,

I am wondering why this happens in the first place. Streams,
load-balanced over all running instances, and each instance should be
the same number of tasks (and thus partitions) assigned.

What is the overall assignment? Do you have StandyBy tasks configured?
What version do you use?


-Matthias


On 3/24/17 8:09 PM, Ara Ebrahimi wrote:
Hi,

Is there a way to tell kafka streams to uniformly assign partitions across instances? If I
have n kafka streams instances running, I want each to handle EXACTLY 1/nth number of partitions.
No dynamic task assignment logic. Just dumb 1/n assignment.

Here’s our scenario. Lets say we have an “source" topic with 8 partitions. We also have
2 kafka streams instances. Each instances get assigned to handle 4 “source" topic partitions.
BUT then we do a few maps and an aggregate. So data gets shuffled around. The map function
uniformly distributes these across all partitions (I can verify that by looking at the partition
offsets). After the map what I notice by looking at the topology is that one kafka streams
instance get assigned to handle say 2 aggregate repartition topics and the other one gets
assigned 6. Even worse, on bigger clusters (say 4 instances) we see say 2 nodes gets assigned
downstream aggregate repartition topics and 2 other nodes assigned NOTHING to handle.

Ara.



________________________________

This message is for the designated recipient only and may contain privileged, proprietary,
or otherwise confidential information. If you have received it in error, please notify the
sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Thank you in advance for your cooperation.

________________________________








________________________________

This message is for the designated recipient only and may contain privileged, proprietary,
or otherwise confidential information. If you have received it in error, please notify the
sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Thank you in advance for your cooperation.

________________________________




________________________________

This message is for the designated recipient only and may contain privileged, proprietary,
or otherwise confidential information. If you have received it in error, please notify the
sender immediately and delete the original. Any other use of the e-mail by you is prohibited.
Thank you in advance for your cooperation.

________________________________
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message