hadoop-yarn-dev mailing list archives

From "Jan Filipiak (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-9670) Missing Fsync for localized resources before updating to finalized in statestore
Date Thu, 11 Jul 2019 12:21:00 GMT
Jan Filipiak created YARN-9670:
----------------------------------

             Summary: Missing Fsync for localized resources before updating to finalized in statestore
                 Key: YARN-9670
                 URL: https://issues.apache.org/jira/browse/YARN-9670
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 2.6.0
            Reporter: Jan Filipiak


A resource that was localized is not fsynced before the state store is updated to track it as
finalized. The download is currently considered finished as soon as the local output stream for
the target is closed, but closing a stream does not guarantee that the data has reached the block
device: it may still be in the OS page cache when the state store is updated. If the node then
crashes or loses power, containers relying on the resource may see only part of it after recovery,
which usually makes them crash.
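The gap between "closed" and "durable" can be illustrated with plain java.io. This is a minimal sketch, assuming a simple stream copy rather than the NodeManager's actual download code; `DurableCopy` and `copyDurably` are hypothetical names. The only difference from a normal copy is the `getFD().sync()` call before the stream is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DurableCopy {

    /**
     * Hypothetical helper: copies {@code in} to {@code target} and returns only after
     * the bytes have been synced to the storage device. close() on its own releases
     * the descriptor but may leave the data in the OS page cache.
     */
    static long copyDurably(InputStream in, Path target) throws IOException {
        long total = 0;
        byte[] buf = new byte[64 * 1024];
        try (FileOutputStream out = new FileOutputStream(target.toFile())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            // Force the file's buffered data down to the device before closing.
            out.getFD().sync();
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("localized-", ".bin");
        byte[] payload = "resource bytes".getBytes(StandardCharsets.UTF_8);
        long copied = copyDurably(new ByteArrayInputStream(payload), target);
        // Only at this point would it be safe to record the resource as finalized.
        System.out.println("copied " + copied + " bytes to " + target);
        Files.delete(target);
    }
}
```

Note that syncing a file does not by itself persist the directory entry that names it; a newly created file also needs its parent directory synced to survive a crash.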

 

Possible fixes:

Introduce a new step in the localization state machine that fsyncs the downloaded path before the
state store is updated.
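A sketch of what such an extra step could do, assuming the downloaded path is an ordinary local directory tree: sync every regular file, then the directories bottom-up, then the parent of the root, and only then invoke the state store. The `FinalizedCallback` interface and its `markFinalized` method are hypothetical stand-ins for the real state store update; directory syncing via `FileChannel` works on Linux but is treated as best effort because not every platform allows opening a directory as a channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

public class FsyncBeforeFinalize {

    /** Hypothetical hook standing in for the state store update. */
    interface FinalizedCallback {
        void markFinalized(Path localPath) throws IOException;
    }

    static void fsyncFile(Path file) throws IOException {
        // On Linux fsync works on a read-only descriptor, so read-only files are fine.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.force(true); // flush data and metadata
        }
    }

    static void fsyncDirectory(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true); // persists the directory's entries
        } catch (IOException e) {
            // Not every platform (e.g. Windows) can open a directory this way: best effort.
        }
    }

    /** Syncs all regular files under root, then directories bottom-up, then root's parent. */
    static void fsyncTree(Path root) throws IOException {
        List<Path> dirs = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                    dirs.add(p);
                } else if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
                    fsyncFile(p);
                }
            }
        }
        Collections.reverse(dirs); // walk() is pre-order, so this visits children before parents
        for (Path d : dirs) {
            fsyncDirectory(d);
        }
        Path parent = root.toAbsolutePath().getParent();
        if (parent != null) {
            fsyncDirectory(parent); // makes root's own directory entry durable
        }
    }

    /** The proposed extra step: fsync first, and only then record the resource as finalized. */
    static void finishDownload(Path localPath, FinalizedCallback stateStore) throws IOException {
        fsyncTree(localPath);
        stateStore.markFinalized(localPath);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("localized-");
        Files.write(dir.resolve("job.jar"), new byte[] {1, 2, 3});
        finishDownload(dir, p -> System.out.println("finalized " + p));
    }
}
```

Ordering is the point of the design: the callback runs only after every sync has returned, so a state store entry can never point at data that is still only in the page cache.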

On recovery, compare the on-disk size of each resource with its expected size (archives would
probably have to be unpacked again).
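The recovery-time check could look roughly like the sketch below. It assumes an expected size is available for each finalized resource (where that number comes from, and whether it is the archive size or the unpacked size, is an open assumption here); `RecoverySizeCheck`, `sizeOnDisk` and `looksComplete` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.stream.Stream;

public class RecoverySizeCheck {

    /** Total bytes of all regular files under root (root may itself be a file). */
    static long sizeOnDisk(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                .mapToLong(RecoverySizeCheck::sizeOf)
                .sum();
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface I/O errors from inside the stream as IOException
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the recovered resource exists and matches the (hypothetical) expected size. */
    static boolean looksComplete(Path localPath, long expectedSize) throws IOException {
        return Files.exists(localPath) && sizeOnDisk(localPath) == expectedSize;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered-");
        Files.write(dir.resolve("data.bin"), new byte[10]);
        System.out.println(looksComplete(dir, 10));   // true
        System.out.println(looksComplete(dir, 4096)); // false: treat as partially written
    }
}
```

A resource that fails the check would be discarded and localized again. A size comparison catches truncated files but not a file whose size is right and whose contents are not; a checksum would be stricter at the cost of reading every byte during recovery.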

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

