airavata-issues mailing list archives

From "Eroma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-1721) Experiment failed due to not been able to find the remote jobID for the jobName
Date Thu, 03 Sep 2015 13:50:45 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729063#comment-14729063 ]

Eroma commented on AIRAVATA-1721:
---------------------------------

Even though the experiment fails because the job ID is not found, the job exists on the resource,
has completed, and has generated output files as well. This issue occurs across compute
resources.

I have experienced it on stampede, comet and BR2 in https://testdrive.airavata.org/portal/pga/public/
When the experiment is launched a second time, it almost always works and completes successfully. This
keeps occurring on an almost daily basis for at least one experiment, but I cannot recreate it at will.

> Experiment failed due to not been able to find the remote jobID for the jobName
> -------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-1721
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-1721
>             Project: Airavata
>          Issue Type: Bug
>          Components: GFac
>    Affects Versions: 0.15 
>         Environment: http://dev.test-drive.airavata.org/portal/ultrascan-testing
>            Reporter: Eroma
>            Assignee: Shameera Rathnayaka
>
> Steps
> NOTE: The error seems to occur when a batch of experiments is launched to a particular resource.
> Experiment fails without a job status in PIG with error [1]
> [1]
> expId:SLM1-US-Gordon-06-09_10-11-41_0f63840b-b173-45fb-869a-263c89925241 Couldn't find remote jobId for JobName:A119993403, both submit and verify steps doesn't return a valid JobId. Hence changing experiment state to Failed
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /gfac-experiments/gfac-node1/SLM1-US-Gordon-06-09_10-11-41_0f63840b-b173-45fb-869a-263c89925241/org.apache.airavata.gfac.ssh.provider.impl.SSHProvider
> org.apache.airavata.gfac.GFacException: Error launching the Job
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.submitJob(BetterGfacImpl.java:480)
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.submitJob(BetterGfacImpl.java:179)
>     at org.apache.airavata.gfac.core.utils.InputHandlerWorker.run(InputHandlerWorker.java:47)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.airavata.gfac.GFacException: KeeperErrorCode = NoNode for /gfac-experiments/gfac-node1/SLM1-US-Gordon-06-09_10-11-41_0f63840b-b173-45fb-869a-263c89925241/org.apache.airavata.gfac.ssh.provider.impl.SSHProvider
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.launch(BetterGfacImpl.java:718)
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.submitJob(BetterGfacImpl.java:467)
>     ... 5 more
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /gfac-experiments/gfac-node1/SLM1-US-Gordon-06-09_10-11-41_0f63840b-b173-45fb-869a-263c89925241/org.apache.airavata.gfac.ssh.provider.impl.SSHProvider
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:696)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:679)
>     at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:676)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:453)
>     at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:443)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:251)
>     at org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:205)
>     at org.apache.airavata.gfac.core.utils.GFacUtils.createHandlerZnode(GFacUtils.java:346)
>     at org.apache.airavata.gfac.core.utils.GFacUtils.updateHandlerState(GFacUtils.java:376)
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.invokeProviderExecute(BetterGfacImpl.java:730)
>     at org.apache.airavata.gfac.core.cpi.BetterGfacImpl.launch(BetterGfacImpl.java:678)
>     ... 6 more
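
The innermost cause in the trace above is a NoNode error raised by ZooKeeper.create() when GFacUtils.createHandlerZnode tries to create the SSHProvider child node. ZooKeeper raises NoNode on create when the parent znode does not exist, so one possible reading is that the experiment znode under /gfac-experiments/gfac-node1 was missing (never created, or already removed) at that moment. The sketch below is only a toy in-memory model of that behaviour, not a ZooKeeper or Curator client and not Airavata code; the class ZnodeTree, the error NoNodeError, and the shortened path are hypothetical names used for illustration.

```python
class NoNodeError(Exception):
    """Stands in for ZooKeeper's KeeperException.NoNodeException (hypothetical name)."""


class ZnodeTree:
    """Toy in-memory model of a znode namespace; not a real ZooKeeper client."""

    def __init__(self):
        # Only the root exists initially.
        self.paths = {"/"}

    @staticmethod
    def parent(path):
        # "/a/b" -> "/a", "/a" -> "/"
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path):
        # Mirrors a plain create: it is refused when the parent is missing.
        if self.parent(path) not in self.paths:
            raise NoNodeError("NoNode for " + path)
        self.paths.add(path)

    def create_with_parents(self, path):
        # Creates every missing ancestor first, then the node itself.
        current = ""
        for part in path.strip("/").split("/"):
            current += "/" + part
            self.paths.add(current)


tree = ZnodeTree()
# Hypothetical, shortened stand-in for the handler path in the trace.
handler = "/gfac-experiments/gfac-node1/exp-1/SSHProvider"

try:
    tree.create(handler)  # parent "/gfac-experiments/gfac-node1/exp-1" is absent
    failed = False
except NoNodeError:
    failed = True

tree.create_with_parents(handler)  # succeeds because ancestors are created first
```

In real code, Curator's create().creatingParentsIfNeeded() builder option provides the second behaviour. Whether that is an appropriate fix for this issue is not established here, since the missing parent may itself be a symptom of a race elsewhere.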



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
