kafka-users mailing list archives

From Adrienne Kole <adrienneko...@gmail.com>
Subject Re: Plans to extend streams?
Date Wed, 29 Nov 2017 16:26:42 GMT
Hi,

Perhaps you misunderstood the focus of the post, or I did not explain it
properly. I am not claiming that Streams is limited to a single node.
It is true that a whole topology instance can run on a single node (each
node runs the entire topology), but that is a separate matter.
Also, I think the "moving 100s of GB data per day" claim is orthogonal,
and that volume is not big or fast enough to reason from.

The thing is that, for some use cases, the streams -> kafka -> streams
connection can be a bottleneck. Yes, if I have 40GB/s or InfiniBand network
bandwidth, this might not be an issue.

Consider a simple topology with operators A->B->C (B forces a re-partition).
Streams nodes are s1 (A) and s2 (B, C), and Kafka resides on cluster k, which
might be behind a different network switch.
So, rather than transferring data k->s1->s2, we make a round trip
k->s1->k->s2. If we know that s1 and s2 are in the same network and that data
transfer between the two is fast, we should not have to go through another
intermediate layer.
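
To make the hop counts concrete, here is a toy model of the two paths. It is
only a sketch and not Kafka Streams code: the class and method names are made
up for illustration, every record is assumed to have the same size, and
broker-side replication traffic is ignored.

```java
import java.util.List;

// Hypothetical helper, not part of any Kafka API: counts network transfers
// along a path of node names such as k -> s1 -> k -> s2.
final class RepartitionHops {

    // One network transfer per edge between consecutive nodes on the path.
    static int transfers(List<String> path) {
        return Math.max(0, path.size() - 1);
    }

    // Transfers where either endpoint is the Kafka cluster "k"; in the
    // scenario above these are the ones that may cross a network switch.
    static int brokerTransfers(List<String> path) {
        int count = 0;
        for (int i = 0; i + 1 < path.size(); i++) {
            if (path.get(i).equals("k") || path.get(i + 1).equals("k")) {
                count++;
            }
        }
        return count;
    }

    // Total bytes put on the network for one record of the given size.
    static long bytesMoved(List<String> path, long recordBytes) {
        return (long) transfers(path) * recordBytes;
    }

    public static void main(String[] args) {
        List<String> viaBroker = List.of("k", "s1", "k", "s2"); // current behaviour
        List<String> direct = List.of("k", "s1", "s2");         // proposed shortcut

        System.out.println("via broker: " + transfers(viaBroker)
                + " transfers, " + brokerTransfers(viaBroker) + " touching k");
        System.out.println("direct:     " + transfers(direct)
                + " transfers, " + brokerTransfers(direct) + " touching k");
    }
}
```

Under these assumptions, the re-partition through Kafka costs three transfers
per record, all of them touching k, while the direct path costs two, of which
only the initial read touches k.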


Thanks.



On Wed, Nov 29, 2017 at 4:52 PM, Jan Filipiak <Jan.Filipiak@trivago.com>
wrote:

> Hey,
>
> you are making some wrong assumptions here.
> Kafka Streams is in no way single threaded or
> limited to one physical instance.
> Having connectivity issues to your brokers is IMO
> a problem with the deployment and not at all
> with how kafka streams is designed and works.
>
> Kafka Streams moves hundreds of GB per day for us.
>
> Hope this helps.
>
> Best Jan
>
>
>
> On 29.11.2017 15:10, Adrienne Kole wrote:
>
>> Hi,
>>
>> The purpose of this email is to get an overall intuition for the future
>> plans of the Streams library.
>>
>> The main question is: will it remain a single-threaded application in the
>> long run and serve microservices use cases, or are there any plans to
>> extend it into a multi-node execution framework with less Kafka dependency?
>>
>> Currently, each Streams node 'talks' to the Kafka cluster, and nodes can
>> talk to each other only indirectly, again through Kafka. However, especially
>> if Kafka is not in the same network as the Streams nodes (actually this can
>> happen if they are in the same network as well), this will cause high
>> network overhead and inefficiency.
>>
>> One solution for this (bypassing the network overhead) is to deploy Streams
>> nodes on the Kafka cluster to ensure data locality. However, this is not
>> recommended, as the library and Kafka can affect each other's performance,
>> and Streams does not necessarily have to know the internal data
>> partitioning of Kafka.
>>
>> Another solution would be extending the Streams library to have a common
>> runtime. IMO, preserving the current selling points of Streams (like
>> dynamic scale in/out) with this kind of extension could be a very good
>> improvement.
>>
>> So my question is: will Streams, in the short or long run, extend its
>> use cases to massive and efficient stream processing (and compete with
>> Spark), or stay and strengthen its current position?
>>
>> Cheers,
>> Adrienne
>>
>>
>
