airavata-issues mailing list archives

From "Eroma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRAVATA-2941) Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
Date Mon, 12 Nov 2018 21:15:00 GMT
Eroma created AIRAVATA-2941:
-------------------------------

             Summary: Experiments fail to submit jobs to HPC cluster queues due to queue reaching the max job limit per user.
                 Key: AIRAVATA-2941
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2941
             Project: Airavata
          Issue Type: Bug
          Components: GFac, helix implementation
    Affects Versions: 0.18
         Environment: https://staging.ultrascan.scigap.org & https://ultrascan.scigap.org/

            Reporter: Eroma
            Assignee: Dimuthu Upeksha
             Fix For: 0.18


Currently experiments fail as follows:
 # The HPC queue reaches the maximum number of jobs allowed per user.
 # The job submission fails, the HPC cluster returns a job submission response [1], and Airavata tags the experiment as FAILED.
 # The only option for the gateway user is to submit the experiment again.

The required fix is for Airavata to have internal queues, or some other way to hold such experiments until
the HPC queue can accept jobs again, instead of marking the experiment as FAILED.
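
As a rough illustration of the requested behaviour, here is a minimal sketch of an internal holding queue. It is not Airavata code: the class HoldingQueue, the submit callable, and the status names HELD and SUBMITTED are all hypothetical, and any response that does not contain the queue-full message is treated as a successful submission for simplicity.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict

# Message text taken from the Stampede2 response quoted in this issue.
QUEUE_FULL_MARKER = "Too many simultaneous jobs in queue"


@dataclass
class HoldingQueue:
    """Hypothetical holding queue; not part of Airavata."""

    # Assumed callable: takes an experiment id, returns the scheduler's response text.
    submit: Callable[[str], str]
    pending: Deque[str] = field(default_factory=deque)
    statuses: Dict[str, str] = field(default_factory=dict)

    def try_submit(self, experiment_id: str) -> str:
        """Submit once; hold the experiment instead of failing it if the queue is full."""
        response = self.submit(experiment_id)
        if QUEUE_FULL_MARKER in response:
            self.pending.append(experiment_id)
            self.statuses[experiment_id] = "HELD"
        else:
            # Simplification: every other response counts as a successful submission.
            self.statuses[experiment_id] = "SUBMITTED"
        return self.statuses[experiment_id]

    def retry_pending(self) -> None:
        """Resubmit held experiments in FIFO order, stopping at the first one still rejected."""
        while self.pending:
            experiment_id = self.pending[0]
            response = self.submit(experiment_id)
            if QUEUE_FULL_MARKER in response:
                break  # queue is still full; try again on the next call
            self.pending.popleft()
            self.statuses[experiment_id] = "SUBMITTED"
```

In this sketch retry_pending would be called periodically, for example from a scheduled task. Stopping at the first rejection avoids sending the cluster a burst of submissions that are bound to fail.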

 

[1]

This example is from Stampede2:

-----------------------------------------------------------------
Welcome to the Stampede2 Supercomputer
-----------------------------------------------------------------
No reservation for this job
--> Verifying valid submit host (login3)...OK
--> Verifying valid jobname...OK
--> Enforcing max jobs per user...FAILED
[*] Too many simultaneous jobs in queue.
--> Max job limits for us3 = 50 jobs
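
To tell this kind of rejection apart from other submission failures, the response text could be inspected before the experiment is marked as FAILED. The following is a sketch only: the function parse_job_limit_failure and the JobLimitHit type are hypothetical, and the patterns are derived solely from the Stampede2 output above, so other clusters may word the message differently.

```python
import re
from typing import NamedTuple, Optional


class JobLimitHit(NamedTuple):
    """Details of a max-jobs rejection (hypothetical type, for illustration)."""
    name: Optional[str]      # the name reported after "Max job limits for"
    max_jobs: Optional[int]  # the reported limit, if present


# Patterns based only on the Stampede2 response quoted in this issue.
_FULL_MARKER = "Too many simultaneous jobs in queue"
_LIMIT_RE = re.compile(r"Max job limits for (\S+) = (\d+) jobs")


def parse_job_limit_failure(response: str) -> Optional[JobLimitHit]:
    """Return limit details if the response is a max-jobs rejection, else None."""
    if _FULL_MARKER not in response:
        return None
    match = _LIMIT_RE.search(response)
    if match is None:
        # Rejection detected, but the limit line is missing or worded differently.
        return JobLimitHit(None, None)
    return JobLimitHit(match.group(1), int(match.group(2)))
```

A caller could treat a non-None result as a signal to hold or retry the experiment rather than fail it.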

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
