flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5778) Split FileStateHandle into fileName and basePath
Date Wed, 01 Mar 2017 14:02:45 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890214#comment-15890214 ]

ASF GitHub Bot commented on FLINK-5778:
---------------------------------------

GitHub user uce opened a pull request:

    https://github.com/apache/flink/pull/3442

    [FLINK-5778] [savepoints] Add savepoint serializer with relative file path serialization

    This adds a new savepoint version, `SavepointV2`. The corresponding `SavepointV2Serializer`
is the same as our current `SavepointV1Serializer` except that `FileStateHandle` instances
are serialized with their file path relative to the savepoint base path.
    
    As an example, imagine a savepoint in directory `hdfs:///path/to/savepoint-directory` with
these files:
    
    ```
    hdfs:///path/to/savepoint-directory/_metadata
    hdfs:///path/to/savepoint-directory/data-X
    hdfs:///path/to/savepoint-directory/data-Y
    ```
    
    Previously, the complete file path was stored. With this PR, we only store `data-X` for
file state handles and reconstruct the complete path from the savepoint directory on restore.
This enables us to move the savepoint directory around. The only requirement is that the layout
within the savepoint directory does not change. I think this is a reasonable restriction.
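
A minimal sketch of the relative-path idea described above, using plain string handling. The class and method names (`RelativePathSketch`, `relativize`, `resolve`) and the target directory `file:///tmp/moved-savepoint` are hypothetical and chosen for illustration only; this is not the code in `SavepointV2Serializer`.

```java
// Hypothetical illustration only; not the actual Flink implementation.
final class RelativePathSketch {

    // Ensure the base directory ends with exactly one trailing slash.
    private static String withTrailingSlash(String baseDir) {
        return baseDir.endsWith("/") ? baseDir : baseDir + "/";
    }

    // On store: keep only the part of the file path below the savepoint directory.
    static String relativize(String savepointDir, String absoluteFile) {
        String prefix = withTrailingSlash(savepointDir);
        if (!absoluteFile.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    absoluteFile + " is not located under " + savepointDir);
        }
        return absoluteFile.substring(prefix.length());
    }

    // On restore: rebuild the complete path from the (possibly moved) savepoint directory.
    static String resolve(String savepointDir, String relativeFile) {
        return withTrailingSlash(savepointDir) + relativeFile;
    }

    public static void main(String[] args) {
        String stored = relativize(
                "hdfs:///path/to/savepoint-directory",
                "hdfs:///path/to/savepoint-directory/data-X");
        // The savepoint directory was moved; only the layout inside it is unchanged.
        String restored = resolve("file:///tmp/moved-savepoint", stored);
        System.out.println(stored + " -> " + restored);
    }
}
```

Because only the part below the savepoint directory is stored, the restore side needs nothing but the directory the savepoint is read from.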
    
    In addition to the added tests, I've tested this manually by triggering savepoints, moving
the savepoint around in the local file system as well as to HDFS and restoring from it. 
    
    The code between `SavepointV1` and `SavepointV2` and the respective serializers is mostly
shared. Therefore, I've moved the base logic out to an abstract `AbstractSavepoint` and `AbstractSavepointSerializer`.
    
    The migration story is that you can resume old savepoints as before and all newly triggered
savepoints will be V2 savepoints that serialize file state handles with their relative path.
You can also resume from `1.3-SNAPSHOT` savepoints without any issues.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink 5778-relocatable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3442
    
----
commit 1bc3b3bff1b33eb204e8b9d4cd9589105dd60466
Author: Ufuk Celebi <uce@apache.org>
Date:   2017-02-28T21:36:24Z

    [FLINK-5778] [savepoints] Add savepoint serializer with relative file path serialization

----


> Split FileStateHandle into fileName and basePath
> ------------------------------------------------
>
>                 Key: FLINK-5778
>                 URL: https://issues.apache.org/jira/browse/FLINK-5778
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> Store the statePath as a basePath and a fileName and allow the basePath to be overwritten.
> We cannot overwrite the base path as long as the state handle is still in flight and not persisted.
> Otherwise we risk a resource leak.
> We need this in order to be able to relocate savepoints.
> {code}
> interface RelativeBaseLocationStreamStateHandle {
>    void clearBaseLocation();
>    void setBaseLocation(String baseLocation);
> }
> {code}
> FileStateHandle should implement this and the SavepointSerializer should forward the
> calls when a savepoint is stored or loaded: clear before store and set after load.
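
A rough sketch of how the interface quoted above could be used, assuming a simplified handle that keeps the base location and file name as separate fields. `SketchFileStateHandle` and its accessors are hypothetical names for illustration; this is not the real `FileStateHandle`.

```java
// The interface as proposed in the issue description.
interface RelativeBaseLocationStreamStateHandle {
    void clearBaseLocation();
    void setBaseLocation(String baseLocation);
}

// Hypothetical, simplified handle; not the real FileStateHandle.
final class SketchFileStateHandle implements RelativeBaseLocationStreamStateHandle {
    private String baseLocation;   // null while the base location is cleared
    private final String fileName; // path relative to the base location

    SketchFileStateHandle(String baseLocation, String fileName) {
        this.baseLocation = baseLocation;
        this.fileName = fileName;
    }

    @Override
    public void clearBaseLocation() {
        this.baseLocation = null;
    }

    @Override
    public void setBaseLocation(String baseLocation) {
        this.baseLocation = baseLocation;
    }

    String getFileName() {
        return fileName;
    }

    // The complete path is only available while a base location is set.
    String getFilePath() {
        if (baseLocation == null) {
            throw new IllegalStateException("Base location has been cleared");
        }
        return baseLocation.endsWith("/")
                ? baseLocation + fileName
                : baseLocation + "/" + fileName;
    }
}
```

In this sketch a serializer would call `clearBaseLocation()` before writing the handle, so that only `getFileName()` is persisted, and `setBaseLocation(...)` with the directory the savepoint was read from after loading it.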



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
