flink-issues mailing list archives

From "Chesnay Schepler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1927) [Py] Rework operator distribution
Date Wed, 22 Apr 2015 10:33:59 GMT
Chesnay Schepler created FLINK-1927:

             Summary: [Py] Rework operator distribution
                 Key: FLINK-1927
                 URL: https://issues.apache.org/jira/browse/FLINK-1927
             Project: Flink
          Issue Type: Improvement
          Components: Python API
    Affects Versions: 0.9
            Reporter: Chesnay Schepler
            Assignee: Chesnay Schepler
            Priority: Minor
             Fix For: 0.9

Currently, the Python operator is created when executing the Python plan file, serialized
using dill, and saved as a byte[] in the Java function. It is then deserialized at runtime
on each node.

The current implementation is fairly hacky and imposes limitations that make it hard
to work with. Chaining, or generally saving other user code, always requires a separate deserialization
step after the operator itself has been deserialized.

These issues can be easily circumvented by rebuilding the (Python) plan on each node instead
of serializing the operator. Plan creation is deterministic, and every operator is uniquely
identified by an ID that is already known to the Java function.
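
To illustrate the idea, here is a minimal sketch of selecting an operator by ID from a locally rebuilt plan. It is not the Flink Python API: `Plan`, `build_plan`, and `load_operator` are hypothetical names, and sequential ID assignment is an assumption made so that every rebuild yields the same IDs.

```python
from itertools import count


class Plan:
    """Hypothetical plan that assigns sequential IDs to operators as they are added."""

    def __init__(self):
        self._ids = count()
        self.operators = {}

    def add(self, func):
        # IDs depend only on the order of add() calls, so every rebuild agrees.
        op_id = next(self._ids)
        self.operators[op_id] = func
        return op_id


def build_plan():
    """Stands in for executing the user's plan file; assumed to be deterministic."""
    plan = Plan()
    plan.add(lambda x: x + 1)  # receives ID 0
    plan.add(lambda x: x * 2)  # receives ID 1
    return plan


def load_operator(op_id):
    """Worker side: rebuild the plan locally and pick the operator by its ID."""
    return build_plan().operators[op_id]


# Only the ID has to travel to the node; no operator bytes are shipped.
operator = load_operator(1)
print(operator(21))  # prints 42
```

In this sketch the operators are never serialized at all, so user code they reference (here, plain lambdas) needs no extra deserialization step.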

This change will allow us to easily support custom serializers.

This message was sent by Atlassian JIRA
