airavata-issues mailing list archives

From "Marcus Christie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-2352) Orchestrator sometimes stops processing messages from experiment_launch queue
Date Wed, 29 Mar 2017 13:51:41 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947155#comment-15947155 ]

Marcus Christie commented on AIRAVATA-2352:
-------------------------------------------

As initial steps, I think we could do the following:
* [add a shutdown listener that logs the details of what caused the shutdown|https://www.rabbitmq.com/api-guide.html#shutdown]
** we actually do [add a shutdown listener|https://github.com/apache/airavata/blob/3f29cfdbd71de18777557713dce58007a3cbc2f5/modules/messaging/core/src/main/java/org/apache/airavata/messaging/core/impl/RabbitMQSubscriber.java#L152-L152], but it currently doesn't do anything
* [add an exception handler that logs the exception|https://www.rabbitmq.com/api-guide.html#unhandled-exceptions]
** by default RabbitMQ has an exception handler, but it only logs the exception to standard out
*** on that note, it might be a good idea to redirect standard out to a log file; currently airavata-server-start.sh redirects stdout to /dev/null

With these in place, I think the next time this happens we should get more information
about what is causing the problem.
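
As a rough sketch of the first two steps (not the actual Airavata change), something like the following could work with the RabbitMQ Java client. The class and method names (LoggingHooks, describe, installExceptionHandler, installShutdownListener) are made up for illustration, java.util.logging stands in for whatever logger the messaging module uses, and the code assumes a client version where DefaultExceptionHandler lives in com.rabbitmq.client.impl.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.Consumer;
import com.rabbitmq.client.ShutdownSignalException;
import com.rabbitmq.client.impl.DefaultExceptionHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class; not part of Airavata or of the RabbitMQ client.
public class LoggingHooks {
    private static final Logger log = Logger.getLogger(LoggingHooks.class.getName());

    /** Builds a one-line summary of why a connection or channel shut down. */
    static String describe(ShutdownSignalException cause) {
        return (cause.isHardError() ? "connection" : "channel")
                + " shutdown, initiatedByApplication=" + cause.isInitiatedByApplication()
                + ", reason=" + cause.getReason();
    }

    /** Logs consumer failures through the logger, then defers to the default handler. */
    static class LoggingExceptionHandler extends DefaultExceptionHandler {
        @Override
        public void handleConsumerException(Channel channel, Throwable exception,
                Consumer consumer, String consumerTag, String methodName) {
            log.log(Level.SEVERE,
                    "Consumer " + consumerTag + " threw in " + methodName, exception);
            super.handleConsumerException(channel, exception, consumer, consumerTag, methodName);
        }
    }

    /** Has to be called on the factory before newConnection(). */
    static void installExceptionHandler(ConnectionFactory factory) {
        factory.setExceptionHandler(new LoggingExceptionHandler());
    }

    /** Logs the shutdown cause (with stack trace) when the connection goes away. */
    static void installShutdownListener(Connection connection) {
        connection.addShutdownListener(cause -> log.log(Level.SEVERE, describe(cause), cause));
    }
}
```

The same kind of shutdown listener can also be added on the Channel, which may be the more interesting place if only the channel consuming from experiment_launch is being closed.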

It might also be worth [adding a recovery listener|https://www.rabbitmq.com/api-guide.html#recovery].
We have auto recovery enabled, but perhaps it is failing.
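
A recovery listener could look roughly like the sketch below. RecoveryLogging and install are hypothetical names, and java.util.logging is again only a stand-in for the real logger. As far as I understand the API guide, the listener is only notified around recovery of the connection, so attempts that fail would still have to be caught via the exception handler.

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.Recoverable;
import com.rabbitmq.client.RecoveryListener;
import java.util.logging.Logger;

// Hypothetical helper class; not part of Airavata or of the RabbitMQ client.
public class RecoveryLogging {
    private static final Logger log = Logger.getLogger(RecoveryLogging.class.getName());

    /**
     * Registers a logging recovery listener.
     * Returns false if the connection does not support recovery listeners,
     * which is expected when automatic recovery is not enabled.
     */
    static boolean install(Connection connection) {
        if (!(connection instanceof Recoverable)) {
            log.warning("Connection is not Recoverable; is automatic recovery enabled?");
            return false;
        }
        ((Recoverable) connection).addRecoveryListener(new RecoveryListener() {
            public void handleRecovery(Recoverable recoverable) {
                log.info("Recovery completed for " + recoverable);
            }

            // No @Override here: older client versions do not declare this method,
            // so leaving the annotation off lets the sketch compile against both.
            public void handleRecoveryStarted(Recoverable recoverable) {
                log.info("Recovery started for " + recoverable);
            }
        });
        return true;
    }
}
```

If neither message ever shows up in the logs after a shutdown is logged, that would suggest recovery is not kicking in at all.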

> Orchestrator sometimes stops processing messages from experiment_launch queue
> -----------------------------------------------------------------------------
>
>                 Key: AIRAVATA-2352
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2352
>             Project: Airavata
>          Issue Type: Bug
>    Affects Versions: 0.17
>            Reporter: Marcus Christie
>            Priority: Critical
>
> This was observed on 3/29 9am for gw153.iu.xsede.org.
> *Workaround*: restart the api-orch server. Once that is done the orchestrator starts picking up events from experiment_launch again.
> I'm not seeing anything in the logs to indicate why the orchestrator stopped processing experiment_launch events.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
