directory-dev mailing list archives

From "Moyer, Steven William" <sw...@psu.edu>
Subject LdapConnectionPool problems
Date Mon, 15 Jul 2019 17:08:51 GMT
We've got a few interesting problems occurring at a network level in our systems and I was
hoping for some pointers on how to troubleshoot them.  We're connecting to both OpenLDAP and
Microsoft AD using the LDAP API client.  For testing, I've built a quick fixture using 2.0.0-AM4.

1)  We have a variety of devices (security and load-balancers) between our LDAP client and
the LDAP servers.  Using a connection pool with testWhileIdle = true and a timeBetweenEvictionRunsMillis
= 3700000 (over an hour) we're seeing the load-balancer disconnect unused connections (this
is the expected behavior at 1 hour).  When this happens, we'll typically see between one and
four of the eight connections receive a RST/ACK in Wireshark which seems to trigger the pool
to fill back up.  The odd thing is that regardless of how many RST/ACK we see, there are always
three new connections established and (presumably) added to the pool.  We've also got MinIdle
= 8, so you would think that this would result in the pool being larger or smaller depending
on the number of RST/ACKs we receive but the pool always reports 8 idle connections.  We're
seeing symptoms of there being broken connections in the pool and, as expected, if we set
testOnBorrow = true, this behavior disappears.  My concern is that this allows the LDAP operation
to proceed but our effective idle connections may be far lower than we expect.  Any idea how
we might best troubleshoot the pool's behavior?  My guess is that since numTestsPerEvictionRun
defaults to 3, the incoming RST/ACKs start a single test cycle, which checks at most three
connections, regardless of how many RST/ACKs we receive.
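
To make the guess above concrete, here is a toy model of a single eviction pass. It is a
sketch under stated assumptions, not the pool's actual code: the class and method names
(EvictionRunSketch, runOnce, poolWithBroken) are hypothetical, connections are modelled as
booleans (true = healthy), and the model always tests from the front of the idle list, which
a real pool may not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: one eviction pass over a list of idle connections.
final class EvictionRunSketch {

    // Tests at most testsPerRun idle entries, destroys the broken ones,
    // then refills the list with healthy entries up to minIdle.
    // Returns {destroyed, created, brokenStillIdle}.
    static int[] runOnce(List<Boolean> idle, int testsPerRun, int minIdle) {
        int destroyed = 0;
        int tested = 0;
        int i = 0;
        while (tested < testsPerRun && i < idle.size()) {
            tested++;
            if (idle.get(i)) {
                i++;                 // healthy: leave it in place
            } else {
                idle.remove(i);      // broken: destroy it (removes by index)
                destroyed++;
            }
        }
        int created = 0;
        while (idle.size() < minIdle) {
            idle.add(Boolean.TRUE);  // replacement connection
            created++;
        }
        int brokenStillIdle = 0;
        for (boolean healthy : idle) {
            if (!healthy) {
                brokenStillIdle++;
            }
        }
        return new int[] {destroyed, created, brokenStillIdle};
    }

    // Builds an idle list where the first brokenCount entries are broken.
    static List<Boolean> poolWithBroken(int size, int brokenCount) {
        List<Boolean> idle = new ArrayList<>();
        for (int k = 0; k < size; k++) {
            idle.add(k >= brokenCount);
        }
        return idle;
    }

    public static void main(String[] args) {
        // 8 idle connections, all broken, 3 tests per run, minIdle = 8.
        System.out.println(Arrays.toString(runOnce(poolWithBroken(8, 8), 3, 8)));
    }
}
```

In this model, a pool of eight dead connections ends one pass with three destroyed, three
created, and five broken connections still idle, while the idle count stays at 8.  That would
be consistent with the symptoms described above, but it is only a model and does not confirm
what the real pool is doing.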

We've got a couple of processes that scale running threads up and down based on load and I'm
concerned that if the pool thinks it has idle connections, it's going to lend a broken connection
to a thread that spins up.

2)  We have an AD server that periodically locks up.  Its connections look fine but it will
never answer queries.  We're looking into why only one of the six servers has this pathology,
but I'm wondering whether using the LookupLdapConnectionValidator might help detect when this
problem occurs.  I know the validators are for detecting when a connection's binding has changed
but performing the lookup has the side-effect (for us) of showing that the connection isn't
actually functional.  This problem is not frequent enough to really troubleshoot and has been
solved by rebooting the server so I don't think the long-term answer is a change to how we're
using the LDAP API.

3)  Our legacy load-balancer also drops connections that are unused, but doesn't send a signal
(RST/ACK or FIN/ACK) to the client at all.  Are we even going to be able to detect these?
 It seems like using the LookupLdapConnectionValidator might help with this as well.
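
For both the hung server in (2) and the silent drops in (3), a validation request only helps
if it is bounded by a timeout, since a dead or unresponsive peer may never answer.  A generic
sketch of that idea using only the JDK follows.  The names ProbeWithTimeoutSketch and
isResponsive are hypothetical, and the probe is a stand-in for whatever request is sent on the
connection; this says nothing about how LookupLdapConnectionValidator itself handles timeouts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: treat a connection as broken if a probe does not answer in time.
final class ProbeWithTimeoutSketch {

    // Runs the probe on a daemon thread and waits at most timeoutMillis.
    // Returns true only if the probe completed in time and returned true.
    static boolean isResponsive(Callable<Boolean> probe, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "connection-probe");
            thread.setDaemon(true);  // do not keep the JVM alive for a stuck probe
            return thread;
        });
        Future<Boolean> result = executor.submit(probe);
        try {
            return Boolean.TRUE.equals(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return false;            // no answer in time, or the probe failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            result.cancel(true);     // interrupt a probe that is still running
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A probe that answers immediately versus one that never answers in time.
        System.out.println(isResponsive(() -> true, 500));
        System.out.println(isResponsive(() -> { Thread.sleep(5_000); return true; }, 100));
    }
}
```

The design choice here is to fail closed: a timeout, an exception, and an interrupted wait all
count as "not responsive", so the caller would discard the connection rather than lend it out.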

One final observation - I'm trying to understand why the default lifo configuration is true.
 If a stack is effectively being used to manage connections, won't the connections "on the
bottom" generally be very stale?  If lifo = false results in the use of a fifo, won't that
tend to balance the use of the connections in the pool?
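
The lifo question can be illustrated with a small single-threaded model: borrow one connection,
return it, and repeat.  This is a sketch under simplifying assumptions (one borrower at a time,
no eviction), and IdleOrderSketch and useCounts are hypothetical names, not part of any pool
library.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model only: how often each connection is used under LIFO vs FIFO idle ordering.
final class IdleOrderSketch {

    // Borrows and immediately returns a connection `borrows` times.
    // Returns the number of times each connection id was handed out.
    static int[] useCounts(int poolSize, int borrows, boolean lifo) {
        Deque<Integer> idle = new ArrayDeque<>();
        for (int id = 0; id < poolSize; id++) {
            idle.addLast(id);
        }
        int[] counts = new int[poolSize];
        for (int n = 0; n < borrows; n++) {
            // LIFO takes the most recently returned entry, FIFO the oldest one.
            int id = lifo ? idle.pollLast() : idle.pollFirst();
            counts[id]++;
            idle.addLast(id);        // return the connection to the idle collection
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("lifo: " + Arrays.toString(useCounts(8, 16, true)));
        System.out.println("fifo: " + Arrays.toString(useCounts(8, 16, false)));
    }
}
```

With 8 connections and 16 sequential borrows, the LIFO run uses a single connection 16 times
and never touches the other seven, while the FIFO run uses each connection twice.  Under
concurrent load the LIFO picture is less extreme, since several connections are out at once.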

Hopefully this all makes sense ... I should note that in general the LDAP API is working well
against a very old version of AD, a new version of AD and several versions of OpenLDAP.  Right
now, we're working towards finding a connection and pool configuration that best handles the
active network devices used to provide resiliency and security to our organization.

Thanks for any insights you might provide!

Steve
