nutch-dev mailing list archives

From dhoulker <davehoul...@gmail.com>
Subject Skipping certain URLs
Date Fri, 04 Feb 2011 15:02:03 GMT

Hi,

I'm trying to skip certain URLs on an intranet site.

I'd like to skip the following URL (it actually serves default.aspx, since we
have a default document set up):

http://10.47.23.110:85/firm-info/bios/

However, when I try to block that page, it also blocks the entire section of
the site.

So URLs like this one also get blocked:

http://10.47.23.110:85/firm-info/bios/2904/some-page.aspx

My regex skills aren't great, so I suspect it's just that.

I've tried the rules below, but to no avail:

-http://10.47.23.110:85/firm-info/bios/
-http://10.47.23.110:85/firm-info/bios/[^0-9]
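
For anyone wanting to reproduce this outside the crawler, here is a minimal sketch of how the two patterns behave against the two URLs above. It assumes the filter applies each pattern as an unanchored search (matching anywhere in the URL); Python's re.search stands in for that here. The is_excluded helper and the anchored pattern are illustrative assumptions of mine, not something taken from a Nutch config.

```python
import re

BIOS_INDEX = "http://10.47.23.110:85/firm-info/bios/"
BIOS_CHILD = "http://10.47.23.110:85/firm-info/bios/2904/some-page.aspx"


def is_excluded(pattern, url):
    """Hypothetical helper: True if the pattern is found anywhere in the URL.

    Assumption: an exclusion rule fires on an unanchored search, not a
    full-string match.
    """
    return re.search(pattern, url) is not None


# The two patterns from the message, without the leading "-" rule marker.
unanchored = r"http://10.47.23.110:85/firm-info/bios/"
digit_guard = r"http://10.47.23.110:85/firm-info/bios/[^0-9]"

# Illustrative alternative (an assumption): anchor both ends and escape the
# dots so that only the exact index URL matches.
anchored = r"^http://10\.47\.23\.110:85/firm-info/bios/$"

for name, pattern in [
    ("unanchored", unanchored),
    ("digit_guard", digit_guard),
    ("anchored", anchored),
]:
    print(name, is_excluded(pattern, BIOS_INDEX), is_excluded(pattern, BIOS_CHILD))
```

Under that assumption, the first pattern matches every URL that contains the bios/ prefix, which would explain why the whole section disappears. The second requires one more non-digit character after bios/, so it matches neither the bare index URL nor the numeric child page.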

Can anyone help, please?

Thanks

Dave
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Skipping-certain-URLs-tp2424735p2424735.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
