cassandra-commits mailing list archives

From "Yang Yang (JIRA)" <>
Subject [jira] Created: (CASSANDRA-2157) Hector concurrentHClient pool gives out more connections than its quota
Date Sat, 12 Feb 2011 06:11:57 GMT

                 Key: CASSANDRA-2157
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Yang Yang

Hector can give up on acquiring a connection from the pool at line 85 (all line references
below are to the latest 0.7.0 head):

     } else {

        try {
          cassandraClient = availableClientQueue.poll(maxWaitTimeWhenExhausted, TimeUnit.MILLISECONDS);
          if ( cassandraClient == null ) {
            throw new PoolExhaustedException(String.format("maxWaitTimeWhenExhausted exceeded for thread %s on host %s",
                new Object[]{
        } catch (InterruptedException ie) {

So if we specify a maxWaitTimeWhenExhausted, the pool can give up waiting, and when it gives up it does call numActive.decrementAndGet().
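To make the counting concrete, here is a minimal sketch of a timed borrow under stated assumptions: `TimedPool`, its fields, and the use of plain `String` clients are hypothetical simplifications, not Hector's actual `ConcurrentHClientPool`. The borrower is counted before waiting, `BlockingQueue.poll(long, TimeUnit)` returns `null` on timeout, and that path rolls the count back exactly once inside `borrow()`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hector's PoolExhaustedException.
class PoolExhaustedException extends RuntimeException {
    PoolExhaustedException(String msg) { super(msg); }
}

// Hypothetical simplified pool (not Hector's ConcurrentHClientPool); clients are plain Strings.
public class TimedPool {
    final BlockingQueue<String> available = new ArrayBlockingQueue<>(16);
    final AtomicInteger numActive = new AtomicInteger();
    final long maxWaitMillis;

    TimedPool(long maxWaitMillis) { this.maxWaitMillis = maxWaitMillis; }

    String borrow() {
        numActive.incrementAndGet(); // count this borrower before waiting
        String client;
        try {
            // Returns null if nothing becomes available within maxWaitMillis.
            client = available.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            numActive.decrementAndGet(); // roll back on interrupt as well
            Thread.currentThread().interrupt();
            throw new PoolExhaustedException("interrupted while waiting for a client");
        }
        if (client == null) {
            // Timed out: the increment is undone HERE, so the caller must not release.
            numActive.decrementAndGet();
            throw new PoolExhaustedException(String.format(
                "maxWaitTimeWhenExhausted exceeded for thread %s", Thread.currentThread().getName()));
        }
        return client;
    }

    public static void main(String[] args) {
        TimedPool pool = new TimedPool(10);
        try {
            pool.borrow(); // the queue is empty, so this waits 10 ms and times out
        } catch (PoolExhaustedException e) {
            System.out.println("timed out, numActive=" + pool.numActive.get()); // prints "timed out, numActive=0"
        }
    }
}
```

Because the timeout path settles its own bookkeeping, a caller that also releases after a failed borrow would decrement the count a second time.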

But consider the main loop of this method:

  public void operateWithFailover(Operation<?> op) throws HectorException {

In that loop, the call

        client = getClientFromLBPolicy(excludeHosts);

can throw an exception, and the catch block has a clause for PoolExhaustedException:

        } else if ( he instanceof PoolExhaustedException ) {
          retryable = true;
          if ( hostPools.size() == 1 ) {
            throw he;

Presumably this clause was written for the timeout scenario above and is meant to catch it.
However, getClientFromLBPolicy() replaces the PoolExhaustedException thrown by borrowClient()
with a newly constructed generic HectorException, so the instanceof check above never matches.
As a result, every pool-acquisition timeout propagates immediately to the client instead of
being handled by that clause, which I suspect is not the original intention.

So I think getClientFromLBPolicy() should rethrow the original exception directly, so that it
triggers the logic in the catch block.

However, after making those changes, I found that the pool often reports a negative ActiveNum()
and a TillExhausted value higher than the quota, which should be impossible.
This happens because every code path goes through releaseClient() in the finally {} block.
So when pool acquisition times out, numActive.decrementAndGet() has already been executed once
in borrowClient(), and it is executed a second time via the finally block.
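The double decrement can be reproduced with a counter-only model. This is a hypothetical sketch (`CountingPool` and its `borrow`, `release`, and `operate*` methods are illustrative, not Hector code, and no real connections are opened): `operateBuggy` releases in `finally` unconditionally, so a timed-out borrow is un-counted twice, while `operateFixed` releases only when a client was actually obtained.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter-only model of the pool; no real connections are opened.
public class CountingPool {
    final int maxActive;
    final AtomicInteger numActive = new AtomicInteger();

    CountingPool(int maxActive) { this.maxActive = maxActive; }

    // Returns a client id, or throws after undoing its own increment (the timeout path).
    int borrow(boolean timeout) {
        numActive.incrementAndGet();
        if (timeout) {
            numActive.decrementAndGet(); // first decrement, inside borrow
            throw new IllegalStateException("pool exhausted");
        }
        return 1;
    }

    void release() { numActive.decrementAndGet(); }

    int tillExhausted() { return maxActive - numActive.get(); }

    // Buggy: every path reaches release(), even when borrow() already rolled back.
    void operateBuggy(boolean timeout) {
        try {
            borrow(timeout);
        } catch (IllegalStateException e) {
            // swallowed to keep the model short; a real caller would retry or rethrow
        } finally {
            release(); // second decrement on the timeout path
        }
    }

    // Fixed: release only a client that was actually handed out.
    void operateFixed(boolean timeout) {
        Integer client = null;
        try {
            client = borrow(timeout);
        } catch (IllegalStateException e) {
            // swallowed to keep the model short
        } finally {
            if (client != null) {
                release();
            }
        }
    }

    public static void main(String[] args) {
        CountingPool buggy = new CountingPool(2);
        buggy.operateBuggy(true);
        System.out.println("buggy: numActive=" + buggy.numActive.get()
            + ", tillExhausted=" + buggy.tillExhausted()); // buggy: numActive=-1, tillExhausted=3

        CountingPool fixed = new CountingPool(2);
        fixed.operateFixed(true);
        System.out.println("fixed: numActive=" + fixed.numActive.get()
            + ", tillExhausted=" + fixed.tillExhausted()); // fixed: numActive=0, tillExhausted=2
    }
}
```

With a quota of 2, a single timeout in the buggy variant leaves tillExhausted at 3, above the quota, which is the kind of reading described in this report.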

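Because numActive then undercounts the connections actually in use, the pool believes it has
headroom it does not have and ends up opening more connections to the server than its quota.
This bogs down the server; we have seen it cause very high CPU load.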
