airavata-issues mailing list archives

From "Dimuthu Upeksha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRAVATA-2742) Helix Controller throws an Exception when the participant is killed
Date Wed, 11 Apr 2018 16:05:00 GMT

    [ https://issues.apache.org/jira/browse/AIRAVATA-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434115#comment-16434115 ]

Dimuthu Upeksha commented on AIRAVATA-2742:
-------------------------------------------

The Helix team identified this as a bug and will fix it in a future release.

https://issues.apache.org/jira/browse/HELIX-693

Helix Dev discussion - Subject: Sporadic issue when restarting a Participant

> Helix Controller throws an Exception when the participant is killed
> -------------------------------------------------------------------
>
>                 Key: AIRAVATA-2742
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2742
>             Project: Airavata
>          Issue Type: Bug
>          Components: helix implementation
>    Affects Versions: 0.18
>            Reporter: Dimuthu Upeksha
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>
> This was a sporadic issue and occurred only once in the test setup. There were 5 - 10
> tasks running in the Participant when it was killed externally with SIGTERM
> (kill <process-id>). Once the Participant was started again, it did not pick up the tasks
> that it had been running at the time it was killed. Surprisingly, the respective
> workflows remained in IN_PROGRESS status. The Helix Controller log showed the following
> error for each workflow. This seems like a bug in Helix, and I posted the issue to the
> Helix mailing list (Subject: Sporadic issue when restarting a Participant).
>  
> 2018-04-06 15:10:57,766 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_7f6c8a54-b50f-4bdb-aafd-59ce87276527-POST-b5e39e07-2d8e-4309-be5a-f5b6067f9a24_TASK_cc8039e5-f054-4dea-8c7f-07c98077b117. Skipping.
> java.lang.NullPointerException: Name is null
>         at java.lang.Enum.valueOf(Enum.java:236)
>         at org.apache.helix.task.TaskPartitionState.valueOf(TaskPartitionState.java:25)
>         at org.apache.helix.task.JobRebalancer.computeResourceMapping(JobRebalancer.java:272)
>         at org.apache.helix.task.JobRebalancer.computeBestPossiblePartitionState(JobRebalancer.java:140)
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:171)
>         at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:66)
>         at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:48)
>         at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:295)
>         at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:595)
> 2018-04-06 15:11:00,385 [Thread-3] ERROR o.a.h.c.s.BestPossibleStateCalcStage  - Error computing assignment for resource Workflow_of_process_PROCESS_2b69b499-c527-4c9d-8b2b-db17366f5f81-POST-c67607ae-9177-4a02-af8a-8b3751eea4ff_TASK_1ea6876d-f2ec-4139-a15d-0e64a80a3025. Skipping.
>  
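
The top frames of the trace above show the exception being raised inside Enum.valueOf, called through TaskPartitionState.valueOf, which suggests the state name being looked up was null. The following is only a minimal sketch of that failure mode and of one defensive lookup. It is not the Helix code or the fix tracked in HELIX-693; DemoState, NullSafeEnumDemo, parseState and messageFromNullLookup are hypothetical names introduced for illustration.

```java
import java.util.Optional;

// Hypothetical stand-in for an enum such as Helix's TaskPartitionState;
// the constants are illustrative only.
enum DemoState { INIT, RUNNING, COMPLETED }

class NullSafeEnumDemo {

    // Defensive lookup: returns an empty Optional when the stored name is
    // missing (null) or does not match any constant, instead of throwing.
    static Optional<DemoState> parseState(String name) {
        if (name == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(DemoState.valueOf(name));
        } catch (IllegalArgumentException e) {
            // Name was present but is not a known constant.
            return Optional.empty();
        }
    }

    // Reproduces the failure mode: valueOf with a null name throws a
    // NullPointerException; this returns the exception's message.
    static String messageFromNullLookup() {
        try {
            DemoState.valueOf((String) null);
            return "no exception";
        } catch (NullPointerException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("valueOf(null) failed with: " + messageFromNullLookup());
        System.out.println("parseState(null) present: " + parseState(null).isPresent());
        System.out.println("parseState(\"RUNNING\"): " + parseState("RUNNING").orElse(null));
    }
}
```

Whether a missing state should be skipped, treated as an initial state, or reported is a design decision for the rebalancer. The sketch only shows that the null case can be separated from the lookup itself.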



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
