nutch-dev mailing list archives

From "Edward Quick"
Subject crawling protected pages
Date Mon, 12 Sep 2005 17:45:16 GMT

I posted to the user list but didn't get a reply. I want to crawl a 
protected site, but there doesn't seem to be an option for that in Nutch at 
the moment.

However, it doesn't sound like something that would be too hard to add, 
assuming the Java HTTP client library can handle that. As I'm not familiar 
with the code, could someone please point me to the file (or files) in the 
source that do the crawling? I'm not professing to be a top Java programmer 
(Perl's my speciality) but I'll give it a shot, unless anyone else wants 

Many thanks,

