Thanks for your insight - it appears to have thrust us over the latest barrier!
After lowering httpConnector.minPostInterval to 500 we have achieved much higher sustained throughput - data is flowing to the agents, through the collectors, and landing in HDFS within seconds. We observed 3.2GB collected in a peak 5-minute period today (well beyond our prior 2GB wall!).
I can confirm the defaults are 5000 ms for httpConnector.minPostInterval and 2 MB for httpConnector.maxPostSize - good to know that I can decrease minPostInterval even further should the need arise.
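For reference, the change amounts to overriding a single property in the agent configuration. A minimal sketch in Hadoop-style property XML - the file name and surrounding layout are assumptions on my part, only the property name and value come from this thread:

```
<!-- chukwa-agent-conf.xml (assumed file name/layout): lower the minimum
     interval between HTTP posts from the 5000 ms default to 500 ms -->
<property>
  <name>httpConnector.minPostInterval</name>
  <value>500</value>
</property>
```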

Once again, many thanks.

On 13 August 2010 11:26, Ariel Rabkin <> wrote:
There are two knobs that, together, throttle the agent processes.

These are httpConnector.maxPostSize and httpConnector.minPostInterval

The maximum configured agent bandwidth is the ratio between those.  I
would try reducing the min post interval.  The defaults are, if I
remember right, something like 2 MB / 5 seconds = 400 KB/sec.  You can
crank that down a long way.  Nothing should explode even if you set
it to 1 ms.
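To make the ratio concrete, here is a sketch of the two knobs at their (approximate) defaults in Hadoop-style property XML - the file name, layout, and exact byte value are assumptions, not something confirmed in this thread:

```
<!-- Maximum bytes per HTTP post; 2 MB assumed default -->
<property>
  <name>httpConnector.maxPostSize</name>
  <value>2097152</value>
</property>

<!-- Minimum ms between posts; 5000 ms assumed default.
     Effective per-agent bandwidth cap = maxPostSize / minPostInterval,
     i.e. 2 MB / 5 s = about 400 KB/sec. Lowering this interval raises
     the cap proportionally. -->
<property>
  <name>httpConnector.minPostInterval</name>
  <value>5000</value>
</property>
```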


On Fri, Aug 13, 2010 at 9:11 AM, Eric Fiala <> wrote:
> Hello all,
> We would like to bring our production Chukwa (0.3.0) infrastructure to the
> next level.
> Currently, we have 5 machines generating 400GB per day (80GB in single log,
> per machine).
> These are using chukwa-agent CharFileTailingAdaptorUTF8.  Of
> note, chukwaAgent.fileTailingAdaptor.maxReadSize has been upped to 4000000.
>  We've left httpConnector.maxPostSize to default.
> The agents are sending to 3 chukwa-collectors which are simply gateways into
> HDFS (one also handles demux/processing - but this doesn't appear to be the
> wall... yet).  The agents have all three collectors listed in their conf.
> We are hitting a wall somewhere: the whole 400GB is worked all the way into
> our repos over the course of the day, but during peaks we fall
> upwards of 1-2 hours behind between a record being written to the tailed log
> and hitting hdfs://chukwa/logs as a .chukwa file.
> Further, we have observed that hdfs://chukwa/logs in our setup does not fill
> faster than 2GB per 5-minute period, whether we use 2 chukwa
> collectors or 3.  This is further discouraging given that foreseeable growth
> will take us to over ~575GB per day.
> The machines are definitely not load bound.  We have noticed that Chukwa was
> built with low resource utilization in mind - one thought is that if this
> could be tweaked we could probably get more data through quicker.
> We have toyed with changing the default Xmx or similar values but don't want
> to start turning too many knobs before consulting the experts; considering
> all the pieces involved, that's probably wise.  Scaling out is also an
> option, but I'm determined to squeeze 10x or more than current out of these
> multicore machines.
> Any suggestions are welcome,
> Thanks.
> EF

Ari Rabkin
UC Berkeley Computer Science Department