airavata-issues mailing list archives

From "Dimuthu Upeksha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-2736) Job submitted and running in HPC while the experiment is tagged as FAILED
Date Tue, 10 Apr 2018 20:52:00 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432957#comment-16432957 ]

Dimuthu Upeksha commented on AIRAVATA-2736:
-------------------------------------------

Fixed in https://github.com/apache/airavata/commit/1b950bdb5b96f046e4fbaac6e7024b158dd86e7a

> Job submitted and running in HPC while the experiment is tagged as FAILED
> -------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2736
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2736
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>         Environment: http://149.165.168.248:8008/ - Helix test env
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
>  # Submitted an experiment, which then submitted the job.
>  # A job ID is returned and the job status is ACTIVE.
>  # Due to a ZooKeeper connection issue, the experiment is marked FAILED.
>  # The job is still running on the HPC resource.
>  # Airavata does not wait for job monitoring because the task status is not updated in ZooKeeper.
>  # Error in the log: [1]
>  # Experiment ID: SLM001-AmberSander-BR2_5ed5a19f-ab44-4eba-afb7-1feafaf0bbdd
> [1]
> |org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /monitoring/2159926/lock
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:696)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:679)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:676)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>     at org.apache.airavata.helix.impl.task.submission.JobSubmissionTask.createMonitoringNode(JobSubmissionTask.java:83)
>     at org.apache.airavata.helix.impl.task.submission.DefaultJobSubmissionTask.onRun(DefaultJobSubmissionTask.java:144)
>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:264)
>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:74)
>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:70)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)|
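
The report describes a transient ConnectionLoss raised while the monitoring node was being created, after the job had already been submitted. One general way to guard such a step is a bounded retry that reacts only to transient errors. The following is a minimal sketch of that idea, not the change made in the linked commit. The class, exception, and method names are hypothetical and use only the Java standard library. Curator's own RetryLoop already appears in the trace, so this would be an additional outer retry under a caller-chosen policy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from Airavata, Curator, or the linked commit.
class MonitoringNodeRetry {

    /** Stand-in for a transient coordination-service error such as ConnectionLoss. */
    static class TransientConnectionException extends Exception {
        TransientConnectionException(String message) {
            super(message);
        }
    }

    /**
     * Runs the action, retrying only when it throws TransientConnectionException.
     * After maxAttempts failed attempts the last transient error is rethrown;
     * any other exception propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TransientConnectionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (TransientConnectionException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff: wait longer after each failed attempt.
                    Thread.sleep(baseDelayMillis * attempt);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated node creation that fails twice before succeeding.
        String path = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new TransientConnectionException("ConnectionLoss");
            }
            return "/monitoring/2159926/lock";
        }, 5, 10L);
        System.out.println(path + " created after " + calls.get() + " attempts");
    }
}
```

Retrying only the transient exception type keeps genuine failures visible, and the attempt limit ensures the task still terminates if the connection never recovers.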



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
