flink-issues mailing list archives

From fhueske <...@git.apache.org>
Subject [GitHub] incubator-flink pull request: Change Partition Operator to actual ...
Date Fri, 26 Sep 2014 11:54:26 GMT
Github user fhueske commented on the pull request:

    The DOP of the partition operator needs to be explicitly set to the DOP of the receiving
task. Otherwise, the data is shuffled again.
    I'm pretty sure this behavior is never wanted, and I think it is a potential trap. Also,
multiple successors with different DOPs might cause problems.
    These are exactly the cases I tried to avoid with my implementation (along with repartitioning
where it does not make any sense).
    The way it is done in this PR makes things more explicit and controllable for the user.
A user can do more, but can also do a lot of very stupid things.
    I would prefer the safer alternative, but won't veto if others find this is a better solution.
    However, if we go with the PR, I vote to make the risk of this operator much clearer
in the JavaDocs and the documentation.
    We could also add the DOP as an additional parameter to rebalance() and partitionByHash()
and deactivate setParallelism() to make clear that the DOP is very important for this
operator (-1 could be used for the default parallelism).
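
    A rough sketch of what the last suggestion could look like. This is a standalone model, not Flink code: the PartitionSketch and PartitionOp classes, the reshuffledBefore() helper, and the two-argument rebalance(dop, defaultDop) signature are all hypothetical and only illustrate the idea of fixing the DOP when the operator is created, rejecting later changes, and treating -1 as "use the default parallelism".

```java
// Hypothetical model of the proposal; none of these types are Flink API.
final class PartitionSketch {

    // Sentinel meaning "use the default parallelism", as suggested in the comment.
    static final int DEFAULT_DOP = -1;

    static final class PartitionOp {
        private final int dop;

        PartitionOp(int requestedDop, int defaultDop) {
            if (requestedDop != DEFAULT_DOP && requestedDop < 1) {
                throw new IllegalArgumentException("DOP must be positive or -1, got " + requestedDop);
            }
            // Resolve the sentinel once, at construction time.
            this.dop = (requestedDop == DEFAULT_DOP) ? defaultDop : requestedDop;
        }

        int getParallelism() {
            return dop;
        }

        // Deactivated on purpose: the DOP is part of the operator's definition.
        PartitionOp setParallelism(int ignored) {
            throw new UnsupportedOperationException(
                "Pass the DOP to rebalance()/partitionByHash() instead");
        }

        // Models the trap described above: a receiver with a different DOP
        // causes the partitioned data to be shuffled again.
        boolean reshuffledBefore(int receiverDop) {
            return dop != receiverDop;
        }
    }

    // Assumed signature: the DOP is a required argument of the partition call.
    static PartitionOp rebalance(int dop, int defaultDop) {
        return new PartitionOp(dop, defaultDop);
    }

    public static void main(String[] args) {
        PartitionOp op = rebalance(4, 8);
        System.out.println(op.reshuffledBefore(4)); // false: DOPs match
        System.out.println(op.reshuffledBefore(8)); // true: mismatch, shuffled again
        System.out.println(rebalance(DEFAULT_DOP, 8).getParallelism()); // 8
    }
}
```

    With the DOP fixed in the call, a mismatch with the receiving task is at least visible at the place where the partitioning is requested, which is the point I care about.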

