httpd-users mailing list archives

From: "Boyle Owen" <Owen.Bo...@swx.com>
Subject: RE: [users@httpd] Block wget attempts from my site
Date: Tue, 03 Oct 2006 14:43:36 GMT
> -----Original Message-----
> From: Norman Khine [mailto:norman@khine.net] 
> Sent: Tuesday, October 03, 2006 3:17 PM
> To: users@httpd.apache.org
> Subject: [users@httpd] Block wget attempts from my site
> 
> Hello,
> 
> What is the best way to block someone from ripping/mirroring 
> stuff from my site
> with wget? Is there an Apache way to do this, have seen it done with
> .htaccess but perhaps there is a way to do this from Apache.
> 
> mod-security, snort perhaps? How does this fit with 
> VirtualHosts and can these be specific per host?
> 
> Any comments and advice much appreciated.

As Nick points out, it would be nice if people didn't need these things,
but sometimes you get some idiot who downloads a 10MB page of reference
data every minute in order to screen-scrape one number that he thinks
might change some time in the future. So you need to protect yourself...


Start with the User-agent header (see
http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html#browsermatch
etc.)

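e.g.,

# Tag requests whose User-Agent starts with "wget" (any case)
BrowserMatchNoCase ^wget restrictRobot
# Inside <Directory>, <Location> or .htaccess: refuse tagged requests
Deny from env=restrictRobot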

(You can do pretty much the same thing in mod_rewrite)
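
As a rough sketch of the mod_rewrite version (untested here, and assuming
mod_rewrite is loaded and the rules sit in the server or VirtualHost
config), something along these lines should behave the same way:

RewriteEngine On
# Match a User-Agent that starts with "wget", ignoring case
RewriteCond %{HTTP_USER_AGENT} ^wget [NC]
# Leave the URL untouched and answer 403 Forbidden
RewriteRule .* - [F]

The [NC] flag plays the role of the NoCase in BrowserMatchNoCase, and
[F] sends the same 403 that Deny would.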

Of course, this can be easily spoofed, so then you're into trapping
client IPs and blocking based on that. But then they're on dial-up or ADSL
and keep changing the IP, so you need to use heuristics...

A good trap is a hidden URL (nothing visible to click on, but the href
is in the HTML) that only a robot sees and hits. It calls a server-side
program that writes the client IP to a file. Then, for each request, you
check this file (RewriteCond and RewriteMap) and drop the request if it
comes from a bad IP (RewriteRule ^/(.*) - [F]).
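
The checking side of that trap could look roughly like the sketch below.
The map name "badrobots", the file path and the value "blocked" are all
made up for illustration; I'm assuming the trap program appends one line
per offender in the form "<client-ip> blocked".

# Server or VirtualHost config only: RewriteMap is not allowed in .htaccess
RewriteEngine On
# Hypothetical plain-text map file written by the trap program
RewriteMap badrobots txt:/var/www/maps/badrobots.txt
# Look up the client address; addresses not in the file yield "ok"
RewriteCond ${badrobots:%{REMOTE_ADDR}|ok} =blocked
# Leave the URL untouched and answer 403 Forbidden
RewriteRule ^/(.*) - [F]

A txt: map is re-read when the file's modification time changes, so
newly trapped IPs should be picked up without a restart.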

This can become quite a sport...

Rgds,
Owen Boyle
Disclaimer: Any disclaimer attached to this message may be ignored. 
> 
> Cheers
> 
> Norma
> 
> 
> 
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP 
> Server Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
>    "   from the digest: users-digest-unsubscribe@httpd.apache.org
> For additional commands, e-mail: users-help@httpd.apache.org
>
 
 
