airavata-issues mailing list archives

From "Dimuthu Upeksha (JIRA)" <>
Subject [jira] [Commented] (AIRAVATA-2742) Helix Controller throws an Exception when the participant is killed
Date Mon, 09 Apr 2018 14:53:00 GMT


Dimuthu Upeksha commented on AIRAVATA-2742:

Tested this locally with both SIGKILL and SIGTERM but couldn't reproduce it. As a
safety step, I'm updating the Helix core version from 0.6.7 to 0.8.0. I would still suggest
extensively inspecting participant restarts and the consistency of workflow executions in
future testing iterations. In particular, observe the Helix Controller log.

> Helix Controller throws an Exception when the participant is killed
> -------------------------------------------------------------------
>                 Key: AIRAVATA-2742
>                 URL:
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>            Reporter: Dimuthu Upeksha
>            Assignee: Dimuthu Upeksha
>            Priority: Major
> This was a sporadic issue that occurred only once in the test setup. There were 5 - 10
> tasks running in the Participant when it was killed externally with SIGTERM
> (kill <process-id>). Once the Participant was started again, it did not pick up the tasks
> that it had been running when it was killed. Surprisingly, the respective workflows
> remained in IN_PROGRESS status. The Helix Controller log showed the following error for each workflow.
> This seems like a bug in Helix, and I posted the issue to the Helix mailing list (Subject: Sporadic
> issue when restarting a Participant).
> 2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117.
> java.lang.NullPointerException: Name is null
>         at java.lang.Enum.valueOf(
>         at org.apache.helix.task.TaskPartitionState.valueOf(
>         at org.apache.helix.task.JobRebalancer.computeResourceMapping(
>         at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(
>         at org.apache.helix.controller.pipeline.Pipeline.handle(
>         at org.apache.helix.controller.GenericHelixController.handleEvent(
>         at org.apache.helix.controller.GenericHelixController$
> 2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025.
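
The "Name is null" message in the trace above is what java.lang.Enum.valueOf raises when
it is handed a null name. The following is a minimal Java sketch of that failure mode,
together with a null-safe guard around the lookup. PartitionState, parse and
messageForNullName are hypothetical names used only for illustration; this is not Helix
code, and whether a missing partition state is the actual root cause here is an assumption.

```java
import java.util.Optional;

public class NullSafeEnumParse {
    // Hypothetical stand-in for a task-state enum; NOT the Helix TaskPartitionState class.
    enum PartitionState { INIT, RUNNING, COMPLETED }

    // Returns an empty Optional instead of throwing when the stored
    // state name is missing (null) or is not a known constant.
    static Optional<PartitionState> parse(String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(PartitionState.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    // Calls valueOf with a null name and returns the message of the
    // NullPointerException it raises, mirroring the error in the log.
    static String messageForNullName() {
        try {
            PartitionState.valueOf(null);
            return "no exception";
        } catch (NullPointerException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(messageForNullName());
        System.out.println(parse(null).isPresent());
        System.out.println(parse("RUNNING").isPresent());
    }
}
```

A guard of this kind only keeps the lookup from throwing; it does not explain why the
state name was missing after the Participant restart, which is the part that still needs
to be observed in the Controller log.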
