airavata-issues mailing list archives

From "Dimuthu Upeksha (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (AIRAVATA-2831) Experiment FAILED with an error on output file staging! But the file referring in the error is actually downloaded and available in storage.
Date Fri, 21 Sep 2018 17:46:00 GMT

     [ https://issues.apache.org/jira/browse/AIRAVATA-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dimuthu Upeksha resolved AIRAVATA-2831.
---------------------------------------
    Resolution: Fixed

This should be fixed by the data staging retry implementation.
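
As an illustration of the kind of retry mentioned above, here is a minimal sketch of retrying a transient remote check with exponential backoff. This is not the Airavata implementation: the class `RetrySketch`, the method `withRetry`, and its parameters are hypothetical names chosen for this example, and `IOException` is used only as a stand-in for a transient connection failure such as the CONNECTION_LOST error in the quoted trace.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of Airavata. */
public final class RetrySketch {

    private RetrySketch() {}

    /**
     * Runs the operation up to maxAttempts times. An IOException (assumed here
     * to represent a transient connection failure) triggers another attempt
     * after a delay that doubles each time. Any other exception propagates
     * immediately without a retry.
     */
    public static <T> T withRetry(Callable<T> operation,
                                  int maxAttempts,
                                  long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        // Every attempt failed with an IOException; rethrow the last one.
        throw last;
    }
}
```

A caller could wrap a file existence check in `RetrySketch.withRetry(() -> check(path), 3, 1000L)`, where `check` is likewise a placeholder. With a wrapper of this kind, a single lost keep-alive would not by itself fail the whole staging task.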

> Experiment FAILED with an error on output file staging! But the file referring in the error is actually downloaded and available in storage.
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2831
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2831
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>         Environment: https://staging.seagrid.org/
>            Reporter: Eroma
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>             Fix For: 0.18
>
>
>  # When the experiments were launched and the jobs were submitted, both realtime monitoring and email monitoring were stopped.
>  # Started realtime monitoring, and the job statuses were then updated correctly.
>  # Then stopped realtime monitoring and started email monitoring.
>  # Job statuses were updated correctly, but the experiment status of some experiments is FAILED with error [1].
>  # But the file has already been transferred.
>  # Exp ID: SLM005-QEspresso-JS:2_1fec2375-945b-4b21-8157-5e91b1391312 and job ID: 237.torque-server
> [1]
> |org.apache.airavata.helix.impl.task.TaskOnFailException: Error Code : 01ee4646-2139-40b8-840e-348e37b1823f, Task TASK_f5726ea4-638f-4c41-9904-0b3c766fcaee failed due to Error while checking the file /N/SEAGrid_scratch//PROCESS_f0192239-787a-4f8f-b63e-7cb45a837f4a/Quantum_Espresso.stdout existence, net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at org.apache.airavata.helix.impl.task.AiravataTask.onFail(AiravataTask.java:102)
>     at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:187)
>     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:311)
>     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:90)
>     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.airavata.agents.api.AgentException: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at org.apache.airavata.helix.adaptor.SSHJAgentAdaptor.doesFileExist(SSHJAgentAdaptor.java:183)
>     at org.apache.airavata.helix.impl.task.staging.DataStagingTask.transferFileToStorage(DataStagingTask.java:141)
>     at org.apache.airavata.helix.impl.task.staging.OutputDataStagingTask.onRun(OutputDataStagingTask.java:172)
>     ... 10 more
> Caused by: net.schmizz.sshj.connection.ConnectionException: [CONNECTION_LOST] Did not receive any keep-alive response for 25 seconds
>     at net.schmizz.keepalive.KeepAliveRunner.checkMaxReached(KeepAliveRunner.java:64)
>     at net.schmizz.keepalive.KeepAliveRunner.doKeepAlive(KeepAliveRunner.java:56)
>     at net.schmizz.keepalive.KeepAlive.run(KeepAlive.java:63)|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
