helix-commits mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-778) TASK: Fix a race condition in updatePreviousAssignedTasksStatus
Date Thu, 01 Nov 2018 00:36:00 GMT

    [ https://issues.apache.org/jira/browse/HELIX-778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670955#comment-16670955 ]

Hudson commented on HELIX-778:

FAILURE: Integrated in Jenkins build helix #1561 (See [https://builds.apache.org/job/helix/1561/])
[HELIX-778] TASK: Fix a race condition in (hulee: rev ceba1a55ae351090144c001324f908f2364212a4)
* (edit) helix-core/src/test/java/org/apache/helix/integration/task/TestUnregisteredCommand.java
* (edit) helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java

> TASK: Fix a race condition in updatePreviousAssignedTasksStatus
> ---------------------------------------------------------------
>                 Key: HELIX-778
>                 URL: https://issues.apache.org/jira/browse/HELIX-778
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: Hunter L
>            Assignee: Hunter L
>            Priority: Major
> It was observed that TestUnregisteredCommand is very unstable. The cause was identified
> as a race condition: when a task fails, a pending message for that task (from INIT to
> RUNNING) is sometimes not cleaned up in time, so AbstractTaskDispatcher's
> updatePreviousAssignedTasksStatus processes that message and skips the status update for
> the task (such as updating its status and its NUM_ATTEMPTS field in JobContext).
> A short-term fix is to call markPartitionError() before checking for the pending
> message, but in the long run the design of the task status update should be revisited
> to avoid this type of race condition.
> Changelist:
> 1. Move markPartitionError() up before checking for a pending message on the task
> 2. Fix TestUnregisteredCommand's instability
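
The reordering in item 1 can be illustrated with a minimal sketch. This is a simplified, hypothetical model, not the actual AbstractTaskDispatcher code: the class PendingMessageRaceSketch, its Context type, the State enum, and both update methods are invented for illustration, and the markPartitionError() here is only a stand-in for the real method. It assumes that a pending message causes an early return from the update, as the description above suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the ordering problem described in HELIX-778.
// None of these types or signatures are taken from the Helix code base.
final class PendingMessageRaceSketch {
    enum State { INIT, RUNNING, TASK_ERROR }

    // Stand-in for the per-task bookkeeping that JobContext holds.
    static final class Context {
        final Map<Integer, State> states = new HashMap<>();
        final Map<Integer, Integer> numAttempts = new HashMap<>();
    }

    // Stand-in for markPartitionError(): record the failure and count the attempt.
    static void markPartitionError(Context ctx, int partition) {
        ctx.states.put(partition, State.TASK_ERROR);
        ctx.numAttempts.merge(partition, 1, Integer::sum);
    }

    // Ordering before the fix (assumed): the pending-message check runs first,
    // so a stale INIT -> RUNNING message hides the failure from the context.
    static void updateCheckingPendingFirst(Context ctx, int partition,
                                           boolean taskFailed, boolean hasPendingMessage) {
        if (hasPendingMessage) {
            return; // status and attempt count are never updated
        }
        if (taskFailed) {
            markPartitionError(ctx, partition);
        }
    }

    // Ordering after the fix (assumed): the error is recorded before the
    // pending-message check, so the update survives a stale message.
    static void updateMarkingErrorFirst(Context ctx, int partition,
                                        boolean taskFailed, boolean hasPendingMessage) {
        if (taskFailed) {
            markPartitionError(ctx, partition);
        }
        if (hasPendingMessage) {
            return; // the remaining processing is still skipped
        }
    }

    public static void main(String[] args) {
        Context before = new Context();
        updateCheckingPendingFirst(before, 0, true, true);
        System.out.println("pending check first: state=" + before.states.get(0)
                + ", attempts=" + before.numAttempts.getOrDefault(0, 0));

        Context after = new Context();
        updateMarkingErrorFirst(after, 0, true, true);
        System.out.println("mark error first: state=" + after.states.get(0)
                + ", attempts=" + after.numAttempts.getOrDefault(0, 0));
    }
}
```

In this model the two orderings differ only when a failed task still has a pending message, which is the window the issue describes. The sketch does not remove that window; it only makes the failure bookkeeping independent of it, which matches the "short-term fix" framing above.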

This message was sent by Atlassian JIRA
