chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <>
Subject [jira] Commented: (CHUKWA-369) proposed reliability mechanism
Date Wed, 02 Sep 2009 21:27:32 GMT


Ari Rabkin commented on CHUKWA-369:

- You do not need to get an acknowledgment from the same collector you sent to.  The "ack"
is really just a confirmation that the file in question rotated OK and was of sufficient length
when it rotated.

- Collectors don't need to do anything special on rotation.

- There's no long-running TCP connection between agent and collector.  However, my current implementation
does assume that an agent will keep using a single collector until it gets an IOException.
For now, I'm not using timeouts; instead, the agent relies on getting an IOException from a down
collector.  This is simpler, but it would need modification if we started doing dynamic load-balancing
across collectors.
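
To make the last point concrete, here is a minimal Java sketch of "stick with one collector until an IOException". It is an illustration only, not the code in delayedAcks.patch: the class StickyCollectorSender, the Collector interface, and the method names are all hypothetical, and the real agent's sender may be structured quite differently.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch: none of these names come from Chukwa itself.
class StickyCollectorSender {

    /** Stand-in for whatever actually posts a chunk to one collector. */
    interface Collector {
        void send(String chunk) throws IOException;
    }

    private final List<Collector> collectors;
    private int current = 0; // index of the collector currently in use

    StickyCollectorSender(List<Collector> collectors) {
        if (collectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one collector");
        }
        this.collectors = collectors;
    }

    int currentIndex() {
        return current;
    }

    /**
     * Sends a chunk to the current collector. No timeouts are used: the only
     * failure signal is an IOException, at which point we move to the next
     * collector in the list and retry. Gives up after one pass over the list.
     */
    void send(String chunk) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < collectors.size(); attempt++) {
            try {
                collectors.get(current).send(chunk);
                return; // success: keep using this collector next time
            } catch (IOException e) {
                last = e;
                current = (current + 1) % collectors.size();
            }
        }
        throw new IOException("all collectors failed", last);
    }
}
```

Because the current index only changes inside the catch block, a healthy collector keeps receiving every chunk from this agent. That is the assumption that dynamic load-balancing would break.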

> proposed reliability mechanism
> ------------------------------
>                 Key: CHUKWA-369
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>         Attachments: delayedAcks.patch
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
