chukwa-dev mailing list archives

From "Ari Rabkin (JIRA)" <>
Subject [jira] Updated: (CHUKWA-369) proposed reliability mechanism
Date Fri, 04 Sep 2009 19:28:57 GMT


Ari Rabkin updated CHUKWA-369:

    Status: Patch Available  (was: In Progress)

I've now tested this fairly extensively, at data rates up to 200 MB/sec, with up to 256 agents
and 20 collectors.  It's looking very good and I want to commit it.

- Substantial tests are included.
- The asynchronous ack mechanism is controlled by a conf option and defaults to off. So if you're
hesitant about it, you don't need to use it, and everything should remain the way it was.
- Even if it's turned on, collectors can still respond with an immediate ack if they happen
to write synchronously (e.g., if the collector is writing to HBase or the local filesystem).
- I tried pretty hard to code this in such a way that we can easily evolve and adapt the code
to support other reliability strategies in the future.
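
As a rough illustration of the behaviour described in the points above, here is a minimal
sketch in Java. It is not taken from the patch: the option name, the class, and the method
names are all hypothetical, and it assumes that a collector whose writer is asynchronous
would defer its ack when the option is on.

```java
import java.util.Properties;

public class AckSketch {
    // Hypothetical option name; the message does not give the real key.
    public static final String ASYNC_ACKS_KEY = "example.asyncAcks.enabled";

    public enum Response { IMMEDIATE_ACK, DELAYED_ACK }

    // The option defaults to off, so an unset key means "false".
    public static boolean asyncAcksEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(ASYNC_ACKS_KEY, "false"));
    }

    // Decide how a collector answers an agent.
    // With the option off, behaviour is unchanged: always ack immediately.
    // With the option on, a synchronous writer can still ack immediately;
    // only an asynchronous writer is assumed to defer its ack.
    public static Response respond(Properties conf, boolean writerIsSynchronous) {
        if (!asyncAcksEnabled(conf) || writerIsSynchronous) {
            return Response.IMMEDIATE_ACK;
        }
        return Response.DELAYED_ACK;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(respond(conf, false)); // option unset

        conf.setProperty(ASYNC_ACKS_KEY, "true");
        System.out.println(respond(conf, true));  // option on, synchronous writer
        System.out.println(respond(conf, false)); // option on, asynchronous writer
    }
}
```

Only the last call in main takes the deferred path; the first two correspond to the
"defaults to off" and "immediate ack" cases.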

> proposed reliability mechanism
> ------------------------------
>                 Key: CHUKWA-369
>                 URL:
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>         Attachments: CHUKWA-369.patch, delayedAcks.patch
> We like to say that Chukwa is a system for reliable log collection. It isn't, quite,
> since we don't handle collector crashes.  Here's a proposed reliability mechanism.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
