On Sat, Oct 22, 2011 at 12:52:05PM +0200, MegaBrutal wrote:
> What I tried:
>
> # Include the virtual host configurations:
> Include sites-enabled/
>
> # Deny all unknown virtual host names
>
> ServerName *
> DocumentRoot /var/www
>
> Order allow,deny
> # Allow from googlebot.com
> Allow from 127.0.0.1
>
> SetEnvIf Remote_Addr "127\.0\.0\.1" localhostlog
> CustomLog "/var/log/apache2/access.log" combined env=localhostlog
> CustomLog "/var/log/apache2/reject.log" vhost_combined env=!localhostlog
> ErrorLog "/var/log/apache2/reject_error.log"
>
>
[snip]
> > If no matching virtual host is found, then *the first listed virtual
> > host* that matches the IP address will be used.
> > (Apache website on Name-based Virtual Host Support)
> >
>
> OK, make it the first virtual host config! Naive! If I put my all-rejecting
> virtual host before the include for my specific virtual hosts, then all
> requests will be served by the rejecting virtual host - even requests for my
> legit virtual host names. But that is also the expected behaviour:
>
> > Now when a request arrives, the server will first check if it is using an IP
> > address that matches the NameVirtualHost.
> > If it is, then it will look at each <VirtualHost> section with a matching
> > IP address and try to find one where the ServerName or
> > ServerAlias matches the requested hostname. If it finds one, then it uses
> > the configuration for that server.
> >
>
> Apache tries to find a suitable virtual host config by searching from top
> to bottom. Of course, "*" matches everything, so the all-rejecting virtual
> host config will catch all requests, and the other virtual hosts will never
> be checked.

All of this is correct as stated. So, if I've understood your problem
correctly, the solution is to put your bot-catcher first in the config,
but remove the wildcard in the ServerName, e.g.:

    # Deny all unknown virtual host names
    ServerName my-fantastic-bot-catcher
    ... Directives to handle bots here ...

    # Include the virtual host configurations:
    Include sites-enabled/

So, if a request comes in that does match one of your real ServerNames,
the catch-all will be bypassed. But if a request comes in that doesn't
match a real ServerName, the bot-catcher will serve it, because it is the
first virtual host listed.

This is the approach which I generally use, although I tend not to use
an asterisk in the VirtualHost directive, so that part is untested.
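
For illustration, a minimal sketch of that layout with the container tags
written out. Several details here are assumptions rather than anything taken
from the original config: the *:80 address, the <Directory> wrapper around
the access rules, and the Apache 2.2-style Order/Allow directives (on 2.4 the
equivalent would use Require, and NameVirtualHost is no longer needed). The
real virtual hosts under sites-enabled/ would need to use the same
<VirtualHost> address for name matching to consider them.

```apache
# Assumption: Apache 2.2-style name-based virtual hosting on port 80.
NameVirtualHost *:80

# Deny all unknown virtual host names.
# Listed first, so it becomes the default when no ServerName or
# ServerAlias matches the requested hostname.
<VirtualHost *:80>
    # Placeholder name; it only has to differ from every real ServerName.
    ServerName my-fantastic-bot-catcher
    DocumentRoot /var/www

    # Assumed wrapper: with "Order allow,deny", anything not explicitly
    # allowed is denied, so only localhost gets through.
    <Directory /var/www>
        Order allow,deny
        Allow from 127.0.0.1
    </Directory>

    # Logging lines carried over from the quoted config; vhost_combined
    # must be defined by a LogFormat elsewhere in the server config.
    CustomLog "/var/log/apache2/reject.log" vhost_combined
    ErrorLog "/var/log/apache2/reject_error.log"
</VirtualHost>

# Include the virtual host configurations:
Include sites-enabled/
```

The point of the ordering is that the named lookup runs across all matching
virtual hosts before the first one is used as a fallback, so the real sites
still win for their own hostnames.
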
HTH,
Pete
--
Openstrike - improving business through open source
http://www.openstrike.co.uk/ or call 01722 770036 / 07092 020107