kafka-users mailing list archives

From Adrienne Kole <adrienneko...@gmail.com>
Subject Re: Plans to extend streams?
Date Wed, 29 Nov 2017 16:26:42 GMT

Perhaps you misunderstood the focus of my post, or I did not explain it
properly. I am not claiming that Streams is limited to a single node.
Although a whole topology instance can be limited to a single node (each
node runs the whole topology), that is a separate matter.
Also, I think the "moving 100s of GB of data per day" claim is orthogonal,
and that volume is not big or fast enough to argue from.
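
To put that figure in perspective, here is a quick back-of-the-envelope conversion. The 100 and 900 GB/day values are only illustrative endpoints for "hundreds of GB", and I assume decimal units (1 GB = 1000 MB):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def gb_per_day_to_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily volume in decimal GB to an average rate in MB/s."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


# Illustrative endpoints for "hundreds of GB per day" (assumed, not measured).
for volume in (100, 900):
    rate = gb_per_day_to_mb_per_s(volume)
    print(f"{volume} GB/day = {rate:.2f} MB/s on average")
# prints:
# 100 GB/day = 1.16 MB/s on average
# 900 GB/day = 10.42 MB/s on average
```

These are averages over a full day; peak rates can of course be higher.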

The point is that, for some use cases, the streams-kafka-streams connection
can be a bottleneck. Yes, if I have 40GB/s or InfiniBand network bandwidth,
this might not be an issue.

Consider a simple topology with operators A->B->C (B forces a re-partition).
The Streams nodes are s1 (A) and s2 (B, C), and Kafka resides on cluster k,
which might be behind a different network switch.
So, rather than transferring data k->s1->s2, we make a round trip
k->s1->k->s2. If we know that s1 and s2 are in the same network and data
transfer between the two is fast, we should not go through another intermediate
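
A rough sketch of the two routes, just counting how many hops cross between network segments. The segment names and the helper function are made up for illustration, and the "direct" route is hypothetical, not something Streams does today:

```python
def cross_segment_hops(route, segment_of):
    """Count hops whose two endpoints sit in different network segments."""
    return sum(
        1
        for src, dst in zip(route, route[1:])
        if segment_of[src] != segment_of[dst]
    )


# Assumption: k is behind a different switch than s1 and s2, which share one.
segment_of = {"k": "kafka-switch", "s1": "streams-switch", "s2": "streams-switch"}

via_kafka = ["k", "s1", "k", "s2"]  # re-partition goes back through Kafka
direct = ["k", "s1", "s2"]          # hypothetical direct s1 -> s2 transfer

print(cross_segment_hops(via_kafka, segment_of))  # 3
print(cross_segment_hops(direct, segment_of))     # 1
```

Under this assumed layout, the round trip crosses the switch boundary three times, while a direct transfer would cross it once.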


On Wed, Nov 29, 2017 at 4:52 PM, Jan Filipiak <Jan.Filipiak@trivago.com> wrote:

> Hey,
> You are making some wrong assumptions here.
> Kafka Streams is in no way single threaded or
> limited to one physical instance.
> Having connectivity issues to your brokers is IMO
> a problem with the deployment and not at all
> with how kafka streams is designed and works.
> Kafka Streams moves hundreds of GB per day for us.
> Hope this helps.
> Best, Jan
> On 29.11.2017 15:10, Adrienne Kole wrote:
>> Hi,
>> The purpose of this email is to get an overall intuition for the future
>> plans of the Streams library.
>> The main question is: will it be a single-threaded application in the
>> long run and serve microservices use cases, or are there any plans to
>> extend it into a multi-node execution framework with less Kafka dependency?
>> Currently, each Streams node 'talks' to the Kafka cluster, and the nodes can
>> talk to each other only indirectly, again through Kafka. However, especially
>> if Kafka is not in the same network as the Streams nodes (actually this can
>> happen even if they are in the same network), this causes high
>> network overhead and inefficiency.
>> One solution for this (bypassing the network overhead) is to deploy Streams
>> nodes on the Kafka cluster to ensure data locality. However, this is not
>> recommended, as the library and Kafka can affect each other's performance,
>> and Streams does not necessarily have to know the internal data
>> partitioning of Kafka.
>> Another solution would be to extend the Streams library to have a common
>> runtime. IMO, preserving the current selling points of Streams (like
>> dynamic scale in/out) with this kind of extension could be a very good
>> improvement.
>> So my question is: will Streams, in the short or long run, extend its
>> use cases to massive and efficient stream processing (and compete with
>> Spark), or stay in and strengthen its current position?
>> Cheers,
>> Adrienne
