flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4853) Clean up JobManager registration at the ResourceManager
Date Fri, 21 Oct 2016 15:35:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595447#comment-15595447 ]

ASF GitHub Bot commented on FLINK-4853:
---------------------------------------

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2657#discussion_r84500048
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java ---
    @@ -202,101 +205,125 @@ public void shutDown() throws Exception {
     	//  RPC methods
     	// ------------------------------------------------------------------------
     
    -	/**
    -	 * Register a {@link JobMaster} at the resource manager.
    -	 *
    -	 * @param resourceManagerLeaderId The fencing token for the ResourceManager leader
    -	 * @param jobMasterAddress        The address of the JobMaster that registers
    -	 * @param jobID                   The Job ID of the JobMaster that registers
    -	 * @return Future registration response
    -	 */
     	@RpcMethod
    -	public Future<RegistrationResponse> registerJobMaster(
    -		final UUID resourceManagerLeaderId, final UUID jobMasterLeaderId,
    -		final String jobMasterAddress, final JobID jobID) {
    +	public Future<RegistrationResponse> registerJobManager(
    +			final UUID resourceManagerLeaderId,
    +			final UUID jobManagerLeaderId,
    +			final String jobManagerAddress,
    +			final JobID jobId) {
    +
    +		checkNotNull(resourceManagerLeaderId);
    +		checkNotNull(jobManagerLeaderId);
    +		checkNotNull(jobManagerAddress);
    +		checkNotNull(jobId);
    +
    +		if (isValid(resourceManagerLeaderId)) {
    +			if (!jobLeaderIdService.containsJob(jobId)) {
    +				try {
    +					jobLeaderIdService.addJob(jobId);
    +				} catch (Exception e) {
    +					// This should actually never happen because, it should always be possible to add a new job
    +					ResourceManagerException exception = new ResourceManagerException("Could not add the job " +
    --- End diff --
    
    Actually, this can happen when the leader id service fails to start. The failure could be temporary, so we might have to introduce some sort of retry rule here. That is not in the scope of this PR, though.
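
    For illustration only, a retry rule of the kind mentioned above could look roughly like the sketch below. `RetrySketch`, `retry`, `maxAttempts` and `delayMillis` are hypothetical names and are not part of Flink or of this PR. The sketch sleeps between attempts to stay short; inside an RPC endpoint the retry would more likely be scheduled asynchronously so that no thread is blocked.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Flink: retries an action a bounded number of times. */
final class RetrySketch {

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis between failed attempts.
     * Rethrows the last failure once all attempts are used up.
     */
    static <T> T retry(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Remember the failure and wait before the next attempt, if one is left.
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw lastFailure;
    }
}
```

    Under these assumptions, the `jobLeaderIdService.addJob(jobId)` call from the diff could be wrapped in a `Callable` and passed to `retry`, so that the `ResourceManagerException` is only raised after the attempts are exhausted.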


> Clean up JobManager registration at the ResourceManager
> -------------------------------------------------------
>
>                 Key: FLINK-4853
>                 URL: https://issues.apache.org/jira/browse/FLINK-4853
>             Project: Flink
>          Issue Type: Sub-task
>          Components: ResourceManager
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The current {{JobManager}} registration at the {{ResourceManager}} blocks threads in the {{RpcService.execute}} pool. This is not ideal and can be avoided by not waiting on a {{Future}} in this call.
> I propose to encapsulate the leader id retrieval operation in a distinct service so that it can be separated from the {{ResourceManager}}. This will reduce the complexity of the {{ResourceManager}} and make the individual components easier to test.
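
The difference between waiting on a future and composing on it can be sketched as follows. This is a minimal illustration, not the Flink implementation: it uses `java.util.concurrent.CompletableFuture` as a stand-in for Flink's own future type, and `RegistrationSketch`, `registerBlocking` and `registerAsync` are hypothetical names.

```java
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in, not Flink code: contrasts blocking and non-blocking registration. */
final class RegistrationSketch {

    /** Blocking style: the calling thread is parked until the leader id arrives. */
    static String registerBlocking(CompletableFuture<UUID> leaderIdFuture, UUID claimedLeaderId) {
        UUID actual = leaderIdFuture.join(); // occupies a pool thread while waiting
        return actual.equals(claimedLeaderId) ? "success" : "decline";
    }

    /** Non-blocking style: the check runs as a callback once the leader id is available. */
    static CompletableFuture<String> registerAsync(
            CompletableFuture<UUID> leaderIdFuture, UUID claimedLeaderId) {
        return leaderIdFuture.thenApply(
                actual -> actual.equals(claimedLeaderId) ? "success" : "decline");
    }
}
```

In the non-blocking variant the caller gets a future back immediately, and no thread is held while the leader id is being retrieved.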



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
