flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [flink] xintongsong commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration
Date Wed, 08 May 2019 13:48:47 GMT
xintongsong commented on issue #8303: [FLINK-12343]add file replication config for yarn configuration
URL: https://github.com/apache/flink/pull/8303#issuecomment-490492735
 
 
   @rmetzger 
   
   > If it happens asynchronously, we might run into a situation where the files are not
yet replicated, and the deployment of the YARN cluster won't benefit from a higher replication.
   
   Yes, the situation is possible.
   
   However, I would like to point out that if there are lots of files that need to be uploaded,
the performance difference between the two approaches could be significant.
   
   Besides, there is a chance that the DFS finishes creating the additional replicas while Flink
is still allocating resources from the YARN ResourceManager and launching containers. Of course,
this is not guaranteed. In our experience with MapReduce (10k+ containers, 10 replicas), replication
finishes before the containers start localizing in most cases.
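
   To make the race concrete, here is a minimal pure-Java sketch. It is only a model of the timing, not Flink or Hadoop code: the class name, the method name, and the use of a `CompletableFuture` to stand in for background replication are all hypothetical. The initial value 3 assumes the HDFS default replication factor, and 10 is the target from the MapReduce setup mentioned above.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical model: the future stands in for the DFS raising the
// replication factor in the background after the client has uploaded files.
public class AsyncReplicationSketch {

    // Replica count a container would observe when it starts localizing.
    // getNow never blocks: if background replication has not completed,
    // the container only sees the initial replicas.
    static int visibleReplicas(CompletableFuture<Integer> replication, int initialReplicas) {
        return replication.getNow(initialReplicas);
    }

    public static void main(String[] args) {
        int initial = 3;   // assumed default replication at upload time
        int target = 10;   // desired replication for the shipped files

        CompletableFuture<Integer> replication = new CompletableFuture<>();

        // Case 1: containers localize before replication has finished.
        System.out.println(visibleReplicas(replication, initial));

        // Case 2: replication finished during allocation and container launch.
        replication.complete(target);
        System.out.println(visibleReplicas(replication, initial));
    }
}
```

   In the first case the deployment gets no benefit from the higher replication; in the second it does. Which case dominates in practice is exactly what the tests should tell us.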
   
   I think which option performs better really depends on the specific scenario. +1 on validating
through tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
