flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ufuk Celebi <...@apache.org>
Subject Re: Forward Partitioning & same Parallelism: 1:1 communication?
Date Wed, 12 Aug 2015 09:50:03 GMT
Thanks :) Regarding your answer to Nica: I didn't mean to say that it was
too generic or anything... it was very nice. I was just curious, that's why
I asked.

On Wed, Aug 12, 2015 at 11:45 AM, Márton Balassi <balassi.marton@gmail.com>
wrote:

> Hey Ufuk,
>
> The shipping strategy name forward is shared between batch and streaming
> and Nica did not specify either API, so I tried to give a generic answer.
>
> I assume that your question is specifically for streaming, in that case:
> Yes, streaming is using the pointwise distribution pattern. [1]
> Unfortunately your concern is true, currently streaming would leave extra
> downstream operator instances idle, but Aljoscha has an open pull request
> fixing this issue amongst others. See the discussion here. [2]
>
> [1]
> https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L320
> [2] https://github.com/apache/flink/pull/988
>
> Cheers,
>
> Marton
>
> On Wed, Aug 12, 2015 at 11:33 AM, Ufuk Celebi <ufuk@data-artisans.com>
> wrote:
>
>> Hey Marton,
>>
>> out of curiosity: is this using Flink’s “point” connections underneath or
>> is there some custom logic for streaming jobs?
>>
>> What happens if operator B has 2 times the parallelism of operator A? For
>> example if there were parallel tasks A1 and A2 and B1-B4: would A1 send to
>> B1 *and* B2 or just B1?
>>
>> – Ufuk
>>
>> On 12 Aug 2015, at 10:39, Márton Balassi <balassi.marton@gmail.com>
>> wrote:
>>
>> Dear Nica,
>>
>> Yes, forward partitioning means that if subsequent operators share
>> parallelism then the output of an upstream operator is sent to exactly
>> one downstream operator. This makes sense for operators working on
>> individual records, e.g. a typical map-filter pair, because as a
>> consequence Flink may be able to collocate these operator pairs on the same
>> physical machine.
>>
>> Best,
>>
>> Marton
>>
>> On Tue, Aug 11, 2015 at 11:41 PM, Nicaz <Walteran@students.uni-marburg.de
>> > wrote:
>> Hello,
>>
>> I have a question about forward partitioning in Flink.
>>
>> If Operator A and Operator B have the same parallelism set and forward
>> partitioning is used for events coming from instances of A and going to
>> instances of B:
>>
>> Will each instance of A send events to _exactly one_ instance of B?
>>
>> That is, will all events coming from a specific instance of A go to the
>> _same_ specific instance of B, and will _all_ instances of B be used?
>> Or are there any situations where an instance of A will distribute events
>> to
>> several different instances of B, or where two instances of A will send
>> events to the same instance of B (possibly leaving some other instance of
>> B
>> unused)?
>>
>> I'd be very happy if someone were able to shed some light on this issue.
>> :-)
>>
>> Thanks in advance
>> Nica
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Forward-Partitioning-same-Parallelism-1-1-communication-tp2373.html
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive at Nabble.com.
>>
>>
>>
>

Mime
View raw message