airavata-issues mailing list archives

From "Marcus Christie (Jira)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-3236) BUG: Experiment launch failed due to non existing job submission interface, but not communicated to the gateway
Date Thu, 27 Aug 2020 13:34:00 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185848#comment-17185848 ]

Marcus Christie commented on AIRAVATA-3236:
-------------------------------------------

 This is the experiment id for Eroma's experiment mentioned above: Gaussian_on_Aug_26,_2020_1:19_PM_c8a1c347-bee7-40ab-8f8c-2b437dea77c9

The experiment has a FAILED status:
{code}
        {
            "state": 8,
            "timeOfStateChange": "2020-08-26T17:20:11.161000Z",
            "reason": "Unexpected error occurred: Error during creating process",
            "statusId": "EXPERIMENT_STATE_71140eb2-6924-4ed8-9337-a7915269155e"
        }
{code}

But it has no errors. We could certainly display the status reason. But perhaps the Orchestrator
should also add an experiment error to the experiment when this sort of failure occurs.
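
As a rough illustration of the first option (displaying the status reason when no experiment errors are recorded), here is a minimal sketch. It assumes the portal has the status as a dict shaped like the JSON above. The function name `failure_message` and the idea that `errors` is a plain list of message strings are hypothetical, chosen only for illustration, and are not taken from the Airavata API.

```python
FAILED_STATE = 8  # the "state" value shown above for the FAILED experiment


def failure_message(status, errors):
    """Pick a message to display for a failed experiment, or None if not failed.

    `errors` is assumed to be a list of error message strings (hypothetical shape).
    """
    if status.get("state") != FAILED_STATE:
        return None
    if errors:
        # Prefer a recorded experiment error when one exists.
        return errors[0]
    # Fall back to the status reason; use a generic message if that is missing too.
    return status.get("reason") or "Experiment failed (no reason recorded)"


# Status copied from the experiment above, which has no experiment errors.
status = {
    "state": 8,
    "timeOfStateChange": "2020-08-26T17:20:11.161000Z",
    "reason": "Unexpected error occurred: Error during creating process",
    "statusId": "EXPERIMENT_STATE_71140eb2-6924-4ed8-9337-a7915269155e",
}
print(failure_message(status, []))
```

With this fallback the user would at least see the status reason. If the Orchestrator also recorded an experiment error, that more specific message would take precedence.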

> BUG: Experiment launch failed due to non existing job submission interface, but not communicated to the gateway
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRAVATA-3236
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-3236
>             Project: Airavata
>          Issue Type: Sub-task
>          Components: Django Portal
>    Affects Versions: 0.18
>         Environment: https://distantreader.scigap.org/
>            Reporter: Eroma
>            Assignee: Marcus Christie
>            Priority: Minor
>             Fix For: 0.19
>
>
> 1. Saved and launched an experiment with a compute resource configuration that was missing a job submission interface.
> 2. The experiment remained in CREATED, but it failed in the back end with API server errors [1].
> 3. Such failures need to be propagated to the Django portal.
> [1]
> 2019-10-18 15:00:25,520 [pool-31-thread-109] ERROR o.a.a.o.s.OrchestratorServerHandler experiment_id=File_to_Study_Carrel_on_Oct_18,_2019_10:56_AM_107d3099-909d-45b6-a97d-bfab1e55f077, gateway_id=distantr - Experiment launch failed due to Thrift conversion error, experimentId: File_to_Study_Carrel_on_Oct_18,_2019_10:56_AM_107d3099-909d-45b6-a97d-bfab1e55f077, gatewayId: distantr
> org.apache.thrift.TException: Experiment ‘File_to_Study_Carrel_on_Oct_18,_2019_10:56_AM_107d3099-909d-45b6-a97d-bfab1e55f077’ launch failed. Unable to figureout execution type for application File_to_Study_Carrel_1d0b43bf-093e-4edf-9d88-71ec732355b6
>        at org.apache.airavata.orchestrator.server.OrchestratorServerHandler.launchExperiment(OrchestratorServerHandler.java:263)
>        at org.apache.airavata.orchestrator.server.OrchestratorServerHandler.launchExperiment(OrchestratorServerHandler.java:723)
>        at org.apache.airavata.orchestrator.server.OrchestratorServerHandler.access$500(OrchestratorServerHandler.java:77)
>        at org.apache.airavata.orchestrator.server.OrchestratorServerHandler$ExperimentHandler.onMessage(OrchestratorServerHandler.java:677)
>        at org.apache.airavata.messaging.core.impl.ExperimentConsumer.handleDelivery(ExperimentConsumer.java:84)
>        at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
>        at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>        at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.airavata.orchestrator.core.exception.OrchestratorException: Error during creating process
>        at org.apache.airavata.orchestrator.cpi.impl.SimpleOrchestratorImpl.createAndSaveTasks(SimpleOrchestratorImpl.java:358)
>        at org.apache.airavata.orchestrator.server.OrchestratorServerHandler.launchExperiment(OrchestratorServerHandler.java:231)
>        ... 9 common frames omitted
> Caused by: org.apache.airavata.orchestrator.core.exception.OrchestratorException: Error occurred while retrieving data from app catalog
>        at org.apache.airavata.orchestrator.core.utils.OrchestratorUtils.getPreferredJobSubmissionInterface(OrchestratorUtils.java:237)
>        at org.apache.airavata.orchestrator.cpi.impl.SimpleOrchestratorImpl.createAndSaveTasks(SimpleOrchestratorImpl.java:315)
>        ... 10 common frames omitted
> Caused by: org.apache.airavata.orchestrator.core.exception.OrchestratorException: Compute resource should have at least one job submission interface defined...
>        at org.apache.airavata.orchestrator.core.utils.OrchestratorUtils.getPreferredJobSubmissionInterface(OrchestratorUtils.java:233)
>        ... 11 common frames omitted



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
