One suggestion (not directly related to the httpd config): capture the TLS handshakes on the wire with Wireshark or tcpdump and confirm that the server is actually sending a CertificateRequest to the clients in each of the scenarios you've laid out. I've seen browser caching interfere with expected results before, perhaps when the browser was reusing or renegotiating a session rather than establishing a new TLS session.
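
As a rough sketch of that check, the Python below walks the raw server-to-client bytes of one TCP connection (for example, as saved from Wireshark's "Follow TCP Stream" raw view) and lists the plaintext handshake messages. It assumes TLS 1.0-1.2 record framing and stops at ChangeCipherSpec, because everything after that, including any renegotiation, is encrypted and can't be inspected this way. `handshake_types` and `requests_client_cert` are hypothetical helper names, not part of any tool.

```python
from typing import List

# Handshake message type codes from the TLS 1.0-1.2 specifications.
HANDSHAKE_NAMES = {
    0: "HelloRequest",
    1: "ClientHello",
    2: "ServerHello",
    11: "Certificate",
    12: "ServerKeyExchange",
    13: "CertificateRequest",
    14: "ServerHelloDone",
    15: "CertificateVerify",
    16: "ClientKeyExchange",
    20: "Finished",
}

CHANGE_CIPHER_SPEC = 20  # record-layer content type
HANDSHAKE = 22           # record-layer content type


def handshake_types(stream: bytes) -> List[str]:
    """Return the names of plaintext handshake messages in a raw TLS stream."""
    names: List[str] = []
    pending = b""  # a handshake message may span several records
    pos = 0
    # Each record: 1-byte content type, 2-byte version, 2-byte length.
    while pos + 5 <= len(stream):
        content_type = stream[pos]
        length = int.from_bytes(stream[pos + 3:pos + 5], "big")
        body = stream[pos + 5:pos + 5 + length]
        pos += 5 + length
        if content_type == CHANGE_CIPHER_SPEC:
            break  # later handshake records are encrypted
        if content_type != HANDSHAKE:
            continue  # skip alerts and other record types
        pending += body
        # Each handshake message: 1-byte type, 3-byte length, payload.
        while len(pending) >= 4:
            msg_len = int.from_bytes(pending[1:4], "big")
            if len(pending) < 4 + msg_len:
                break  # rest of the message is in the next record
            names.append(HANDSHAKE_NAMES.get(pending[0], "type %d" % pending[0]))
            pending = pending[4 + msg_len:]
    return names


def requests_client_cert(stream: bytes) -> bool:
    """True if the server sent a CertificateRequest before encryption began."""
    return "CertificateRequest" in handshake_types(stream)
```

A connection that shows only ServerHello before ChangeCipherSpec is a resumed session; an abbreviated handshake like that never carries a CertificateRequest, which is one way cached sessions can hide the prompt.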

Also, is the F5 simply redirecting users to the backends with HTTP 301/302, or is it acting as an SSL-terminating reverse proxy? Based on your description of your setup I assume it isn't terminating SSL, but it might be worth checking.
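
One quick way to check from a client is to request the public hostname and look at the raw response without following redirects. The sketch below uses Python's `http.client`, which does not follow redirects on its own; `probe` and `is_redirect` are hypothetical helper names. A 301/302 whose Location points at a backend suggests plain redirection; a direct 200 only tells you that whatever answered served the page itself, so the handshake capture is still needed to tell a proxy from the backend.

```python
import http.client
import ssl
from typing import Optional, Tuple

# Status codes that send the client somewhere else.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_redirect(status: int) -> bool:
    return status in REDIRECT_CODES


def probe(host: str, path: str = "/", port: int = 443,
          context: Optional[ssl.SSLContext] = None,
          timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301/302 is reported as-is.
    """
    # If the server certificate comes from an internal CA, pass something
    # like ssl.create_default_context(cafile=...) pointing at that CA.
    conn = http.client.HTTPSConnection(host, port, timeout=timeout,
                                       context=context)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `probe("portal.example.com")` (a placeholder hostname) returns something like `(302, "https://...")` when the front end redirects.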


On Mon, Dec 31, 2012 at 1:31 PM, Steven Siebert <smsiebe@gmail.com> wrote:
Hello,

I'm running httpd 2.2.3 on RHEL (kernel 2.6.18) in a distributed setup behind an F5 GTM, and I'm trying to get client certificate authentication working on the last node. I'm running into issues where that last httpd server requests client certificates (browsers prompt for a certificate) for about 10 minutes, then suddenly stops prompting. It gets even weirder (details below). Every other httpd instance works great.

I have a server certificate issued by our CA; I set it up on httpd A, exported it, and am running it on httpd B. Traffic is directed by the F5 GTM based on routing times. This has worked great for several years. Now we're adding client certificate authentication, where the client certificates are issued by the same CA. We want the authentication to be optional for now while we're transitioning, so that clients who haven't yet received their certificate can gracefully fall back to form-based authentication. This works great on our dev server, our canary server, and one of the production servers (httpd A).

When I apply the same configuration to httpd B, though, something really strange happens. At first, clients were not being prompted to choose a certificate in their browser by httpd B. While troubleshooting, I changed the SSLVerifyClient option from 'optional' to 'require', hoping to verify my client certificate chain file and the like. Doing this causes httpd B to request the client certificate (as seen in the browser), and with the setting on 'require', httpd B keeps working as expected. I then changed the setting back to 'optional', stopped the httpd service (checking that all httpd processes had been killed), and started it again. I checked again with my browser: it works! To summarize, the only thing I did was change the SSLVerifyClient option from 'optional' to 'require' and then back again, with no other changes. Confused, but triumphant, I re-enabled the site on the F5. Minutes later, users were no longer prompted for their CA-issued certificates on that node; they went directly to the form. I failed the node and started troubleshooting again.
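
For illustration, the optional-with-fallback behaviour described above could be decided per request from mod_ssl's SSL_CLIENT_VERIFY variable, which mod_ssl documents as NONE (no certificate sent), SUCCESS, GENEROUS (certificate tolerated under optional_no_ca), or FAILED:reason. The sketch assumes the application can see mod_ssl's variables (in CGI-style environments this typically requires SSLOptions +StdEnvVars); `choose_login` is a hypothetical name. Logging the same variable per request on each node, for instance with mod_ssl's `%{SSL_CLIENT_VERIFY}x` log format, would show exactly when clients on httpd B start arriving as NONE.

```python
from typing import Mapping


def choose_login(environ: Mapping[str, str]) -> str:
    """Return 'certificate' or 'form' based on mod_ssl's variables."""
    status = environ.get("SSL_CLIENT_VERIFY", "NONE")
    # Only a verified certificate with a subject DN counts as a cert login.
    if status == "SUCCESS" and environ.get("SSL_CLIENT_S_DN"):
        return "certificate"
    # NONE, GENEROUS, FAILED:<reason>, or a missing variable fall back.
    return "form"
```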

After hours of reading logs (SSL logging set to 'info', with nothing significant there) and verifying SSL settings, I'm starting to pull my hair out and hoping to get some "fresh eyes" on the situation. Reloading or restarting the httpd service does not fix it (I tried a dozen or so times, without any real test reasoning other than to prove that the 'require'/'optional' flip wasn't a fluke). I then did the flip, 'optional' to 'require', and verified it worked again. Then I flipped it back from 'require' to 'optional'... and again, it worked for about 10 minutes. The only thing I could think of is that httpd B makes AJAX calls to the DNS name, which may be directed back to httpd A, somehow causing issues on the client (we load-balance at OSI layer 3, so both servers have the same key pair). But I switched between different clients (RHEL/Windows) going only to httpd B, and no luck. Only when I restart the httpd service after changing the setting to 'require', then restart it again after changing it back to 'optional', does it work for a while!
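
Because a GTM balances by answering DNS queries, one way to test the AJAX theory is to see which addresses the shared hostname actually resolves to from a client over several lookups. The sketch below is hypothetical (`sample_resolution` is an illustrative name, and the hostname you pass is your own); the resolver is injectable so it can be tested offline. Local resolver caching and the record's TTL mean you may need a delay between rounds, or runs from several clients, to see any spread.

```python
import socket
import time
from collections import Counter
from typing import Any, Callable


def sample_resolution(host: str, rounds: int = 10, delay: float = 0.0,
                      resolver: Callable[..., Any] = socket.getaddrinfo) -> Counter:
    """Resolve host repeatedly and count how often each address appears."""
    seen: Counter = Counter()
    for i in range(rounds):
        infos = resolver(host, 443, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        for addr in {info[4][0] for info in infos}:
            seen[addr] += 1
        if delay and i + 1 < rounds:
            time.sleep(delay)  # let cached answers expire between rounds
    return seen
```

If both node addresses show up, a page loaded from httpd B can send follow-up requests to httpd A.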

All servers are running the same RHEL and Apache versions (they are VM templates of each other). httpd is configured identically, as confirmed by a 'diff'. They are running the same certificates, and httpd B works when set to 'require'.
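
Since a file-level diff can't show what each running server presents on the wire, a complementary check is to compare the certificate each node actually serves. The sketch below is hypothetical (`fingerprint` and `served_fingerprints` are illustrative names, and the addresses are whatever your nodes use); it connects to each node by IP so the GTM is bypassed, and `ssl.get_server_certificate` fetches the certificate without validating it, since only its bytes are hashed. Matching fingerprints confirm the server certificate only, not the client CA list sent in the CertificateRequest.

```python
import hashlib
import ssl
from typing import Dict, Iterable


def fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of a PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_fingerprints(addresses: Iterable[str], port: int = 443) -> Dict[str, str]:
    """Map each node address to the fingerprint of the certificate it serves."""
    return {addr: fingerprint(ssl.get_server_certificate((addr, port)))
            for addr in addresses}
```

For example, `served_fingerprints(["10.0.0.1", "10.0.0.2"])` (placeholder addresses) should return the same value for both nodes if they really serve the same certificate.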

I was very happy with how easily this development work and administrative transition were going, until the last server =).

Other than Murphy's law, any ideas on what could be going on?

Thanks,

S