flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1672) Refactor task registration/unregistration
Date Mon, 04 May 2015 13:25:07 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526619#comment-14526619 ]

ASF GitHub Bot commented on FLINK-1672:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/646

    [FLINK-1672] [runtime] Unify Task and RuntimeEnvironment into one class.

     - This simplifies and hardens the failure handling during task startup
     - Guarantees that no actor system threads are blocked by task bootstrap, or task canceling
     - Corrects some previously erroneous corner case state transitions
     - Adds simple and robust tests
    
    This builds on top of pull request #645.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink task_deployment

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/646.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #646
    
----
commit 76d63d5b03643d3c6e51a882eb9427e421806342
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-04-30T20:05:27Z

    [streaming] New Source and state checkpointing interfaces that allow operations to interact with the state checkpointing in a more precise manner.

commit 62e35a3e2322e6b66aba336319e5671d3cfaa2d1
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-05-02T23:15:39Z

    [FLINK-1968] [runtime] Clean up and improve the distributed cache.
    
     - Gives a proper exception when a non-cached file is accessed
     - Forwards I/O exceptions that happen during file transfer, rather than only returning null when transfer failed
     - Consistently keeps reference counts and copies only when needed
     - Properly removes all files when shutdown
     - Uses a shutdown hook to remove files when process is killed

commit f35b35a30cba516f91927b876623358c4a5976dc
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-05-02T23:57:37Z

    [runtime] Fix TaskExecutionState against non-serializable exceptions.

commit fa6dcac10c38d8efcbab31ea56ce7b2dfabeb30a
Author: Stephan Ewen <sewen@apache.org>
Date:   2015-05-03T02:41:03Z

    [FLINK-1672] [runtime] Unify Task and RuntimeEnvironment into one class.
    
     - This simplifies and hardens the failure handling during task startup
     - Guarantees that no actor system threads are blocked by task bootstrap, or task canceling
     - Corrects some previously erroneous corner case state transitions
     - Adds simple and robust tests

----


> Refactor task registration/unregistration
> -----------------------------------------
>
>                 Key: FLINK-1672
>                 URL: https://issues.apache.org/jira/browse/FLINK-1672
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>
> h4. Current control flow for task registrations
> # JM submits a TaskDeploymentDescriptor to a TM
> ## TM registers the required JAR files with the LibraryCacheManager and returns the user code class loader
> ## TM creates a Task instance and registers the task in the runningTasks map
> ## TM creates a TaskInputSplitProvider
> ## TM creates a RuntimeEnvironment and sets it as the environment for the task
> ## TM registers the task with the network environment
> ## TM sends async msg to profiler to monitor tasks
> ## TM creates temporary files in file cache
> ## TM tries to start the task
> If any operation >= 1.2 fails:
> * TM calls task.failExternally()
> * TM removes temporary files from file cache
> * TM unregisters the task from the network environment
> * TM sends async msg to profiler to unmonitor tasks
> * TM calls unregisterMemoryManager on task
> If 1.1 fails, only unregister from LibraryCacheManager.
> h4. RuntimeEnvironment, Task, TaskManager separation
> The RuntimeEnvironment has references to certain components of the task manager, such as the memory manager, which are accessed from the task. Furthermore, it implements Runnable and creates the executing task Thread. The Task instance essentially wraps the RuntimeEnvironment and allows asynchronous state management of the task (RUNNING, FINISHED, etc.).
> The way that state updates affect the task is not obvious: state changes trigger messages to the TM, which for final states further trigger a msg to unregister the task. How a task is unregistered in turn depends on the state of the task.
> ----
> I would propose to refactor this to make the state handling/registration/unregistration more transparent.
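
The registration flow quoted above (a sequence of steps, with cleanup of the already-completed steps when a later one fails) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Flink's TaskManager code: the names `TaskRegistrationSketch`, `Step`, `registerAll`, and `loggingStep` are hypothetical, and the step labels in `main` only loosely mirror the components named in the issue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of "register in order, undo in reverse on failure". */
public class TaskRegistrationSketch {

    /** One registration step together with the action that undoes it. */
    interface Step {
        void apply() throws Exception;
        void undo();
    }

    /**
     * Runs the steps in order. If one fails, undoes the steps that already
     * succeeded, most recent first, and rethrows the original failure.
     */
    static void registerAll(List<Step> steps) throws Exception {
        Deque<Step> completed = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                step.apply();
                completed.push(step); // remembered only after it succeeded
            }
        } catch (Exception failure) {
            while (!completed.isEmpty()) {
                Step done = completed.pop();
                try {
                    done.undo();
                } catch (RuntimeException cleanupError) {
                    // Keep cleaning up; attach the cleanup error to the failure.
                    failure.addSuppressed(cleanupError);
                }
            }
            throw failure;
        }
    }

    /** Builds a step that records what happened in a shared log (illustration only). */
    static Step loggingStep(String name, boolean fails, List<String> log) {
        return new Step() {
            public void apply() throws Exception {
                if (fails) {
                    throw new Exception(name + " failed");
                }
                log.add("do " + name);
            }

            public void undo() {
                log.add("undo " + name);
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Step> steps = Arrays.asList(
                loggingStep("library cache", false, log),
                loggingStep("network environment", false, log),
                loggingStep("file cache", true, log)); // assumed failure point
        try {
            registerAll(steps);
        } catch (Exception e) {
            log.add("failed: " + e.getMessage());
        }
        log.forEach(System.out::println);
    }
}
```

In this sketch the third step fails, so the two completed steps are undone in reverse order before the failure is reported. Keeping each step next to its undo action is one way to avoid the hand-maintained cleanup lists ("if any operation >= 1.2 fails ...") described in the issue; whether the pull request takes this approach is not stated in this message.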



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
