flink-issues mailing list archives

From "Tao Wang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-6020) Blob Server cannot handle multiple job submissions (with same content) in parallel
Date Fri, 10 Mar 2017 08:33:04 GMT
Tao Wang created FLINK-6020:
-------------------------------

             Summary: Blob Server cannot handle multiple job submissions (with same content) in parallel
                 Key: FLINK-6020
                 URL: https://issues.apache.org/jira/browse/FLINK-6020
             Project: Flink
          Issue Type: Bug
            Reporter: Tao Wang
            Priority: Critical


In yarn-cluster mode, if we submit the same job multiple times in parallel, the tasks will
encounter class loading problems and lease occupation.

This happens because the blob server stores user jars under a name generated from their
sha1sum: it first writes a temp file and then moves it to its final name. For recovery it
also puts them on HDFS under the same file name.
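
To illustrate the naming scheme described above, here is a minimal sketch in Java. It is not
the actual BlobServer code: the class name ContentAddressedStore, the "blob_" prefix and the
temp file naming are assumptions made for illustration only. It shows why two uploads of the
same jar end up targeting the same final file name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of content-addressed storage; not Flink's BlobServer. */
final class ContentAddressedStore {
    private final Path storageDir;

    ContentAddressedStore(Path storageDir) {
        this.storageDir = storageDir;
    }

    /** Stores the bytes in a file named after their SHA-1 digest and returns its path. */
    Path put(byte[] content) throws IOException, NoSuchAlgorithmException {
        // Step 1: write the content to a temp file first.
        Path temp = Files.createTempFile(storageDir, "temp-", ".part");
        Files.write(temp, content);

        // Step 2: derive the final file name from the SHA-1 digest of the content.
        String key = toHex(MessageDigest.getInstance("SHA-1").digest(content));
        Path target = storageDir.resolve("blob_" + key); // "blob_" prefix is an assumption

        // Step 3: move the temp file to its final name. Identical content always
        // resolves to the same target, so concurrent callers collide here.
        return Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Calling put twice with the same bytes returns the same path both times, which is the
property that makes parallel submissions of one jar touch the same file.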

At the same time, when multiple clients submit the same job with the same jar, the local jar
files in the blob server and the corresponding files on HDFS are handled in multiple threads
(BlobServerConnection), which interfere with each other.

It would be better to have a way to handle this. Two ideas come to mind:
1. lock the write operation, or
2. use some unique identifier as the file name instead of (or in addition to) the sha1sum of
the file contents.
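
The first idea could be sketched roughly as follows. This is only an illustration under
stated assumptions, not a proposed patch: the class LockedBlobWriter, the method publish and
the "blob_" prefix are hypothetical names, and the sketch covers only the local file system,
not the HDFS copy. It serializes the final move per content key, so writers of different
jars do not block each other.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of idea 1 (lock the write operation); not Flink code. */
final class LockedBlobWriter {
    // One lock object per content key. Entries are never removed in this sketch,
    // so a real implementation would need some cleanup strategy.
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    private final Path storageDir;

    LockedBlobWriter(Path storageDir) {
        this.storageDir = storageDir;
    }

    /**
     * Moves the temp file to the final name derived from the key.
     * Returns true if this call published the file, false if it already existed.
     */
    boolean publish(String key, Path temp) throws IOException {
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            Path target = storageDir.resolve("blob_" + key);
            if (Files.exists(target)) {
                // Another writer already stored identical content; drop our copy.
                Files.deleteIfExists(temp);
                return false;
            }
            // Assumes temp and target are on the same file system.
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        }
    }
}
```

With this shape, the second of two submissions of the same jar finds the final file already
in place and skips its own move instead of overwriting a file that another thread may be
reading.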



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
