chukwa-dev mailing list archives

From "Bill Graham (JIRA)" <>
Subject [jira] Updated: (CHUKWA-533) Improve fault-tolerance of collectors.
Date Mon, 22 Nov 2010 23:30:13 GMT


Bill Graham updated CHUKWA-533:

    Attachment: CHUKWA-533-2.patch

Thanks Eric.

Here's patch #2. It adds logic to handle the case where the previous output stream can't be
closed before the move during {{rotate}}. This happens when HDFS goes down and comes back up:
the file handle may no longer be closable, but the file can still be moved. The patch is
deployed on our system and seems to be working well.
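
For readers without the patch at hand, the close-then-move idea can be sketched roughly as below. This is not the code from CHUKWA-533-2.patch: the class name, method name, and parameters are hypothetical, and it uses java.nio.file on a local file system as a stand-in for the HDFS calls the collector would actually make.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not taken from the Chukwa source or the patch.
final class RotateSketch {

    /**
     * Tries to close the previous output stream, then moves the file whether
     * or not the close succeeded.
     *
     * @return true if the stream closed cleanly (or was null), false if the
     *         close failed and the move went ahead anyway
     */
    static boolean rotate(Closeable previousStream, Path current, Path finalPath)
            throws IOException {
        boolean closedCleanly = true;
        if (previousStream != null) {
            try {
                previousStream.close();
            } catch (IOException e) {
                // Assumption: a failed close (for example a stale handle after
                // a file system restart) should not block the move.
                closedCleanly = false;
                System.err.println("close failed before move, continuing: " + e.getMessage());
            }
        }
        // The move itself can still fail; that error is left to the caller.
        Files.move(current, finalPath, StandardCopyOption.REPLACE_EXISTING);
        return closedCleanly;
    }
}
```

The design choice shown here is to treat the close failure as something to log rather than something fatal, while a failed move still propagates to the caller.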

> Improve fault-tolerance of collectors.
> --------------------------------------
>                 Key: CHUKWA-533
>                 URL:
>             Project: Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: CHUKWA-533-1.patch, CHUKWA-533-2.patch
> There are currently a number of ways that a collector can die, typically due to errors
> on a DN or a NN that's being restarted. A collector should have some combination of retry
> logic followed by failing back to the agent, but the collector process should not die.
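
One way to read "retry logic followed by failing back to the agent" is a bounded retry loop that reports failure to the caller instead of letting the exception kill the process. The sketch below is an assumption about that shape, not Chukwa's implementation: the `Write` interface, `writeWithRetry`, and its parameters are all hypothetical names.

```java
import java.io.IOException;

// Hypothetical illustration only; not taken from the Chukwa source.
final class RetrySketch {

    /** A write attempt that may fail with an I/O error. */
    @FunctionalInterface
    interface Write {
        void run() throws IOException;
    }

    /**
     * Runs the write up to maxAttempts times, sleeping between attempts.
     *
     * @return true if an attempt succeeded; false if every attempt failed or
     *         the thread was interrupted, in which case the caller would
     *         report the failure back to the agent rather than exit
     */
    static boolean writeWithRetry(Write write, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                write.run();
                return true;
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
            }
        }
        return false;
    }
}
```

A false return leaves the process alive, so whatever sent the data can be told to retry later.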

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
