nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject Re: crawling protected pages
Date Mon, 12 Sep 2005 19:54:25 GMT
Hi Andrzej

There is an HttpAuthenticationFactory class in the protocol-httpclient
plugin, but I doubt whether RFC 2617 basic authentication actually works.
I cannot find any reference to the HttpAuthenticationFactory class.
Did I miss something?
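
For reference, here is a minimal sketch of what RFC 2617 Basic
authentication puts on the wire. It is independent of Nutch and does not
use HttpAuthenticationFactory, whose API is not shown in this thread. The
class and method names (BasicAuthSketch, basicAuthHeader) are made up for
illustration, and java.util.Base64 assumes Java 8 or later. The
credentials are the example pair from RFC 2617 itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Nutch or protocol-httpclient.
public class BasicAuthSketch {

    // Builds the value of the "Authorization" request header for the
    // Basic scheme: "Basic " + base64(user ":" password).
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Example credentials taken from RFC 2617, section 2.
        System.out.println("Authorization: "
                + basicAuthHeader("Aladdin", "open sesame"));
        // prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

A fetcher would normally send this header only after the server answers
401 with a "WWW-Authenticate: Basic realm=..." challenge, so whatever
supplies the credentials would need to be called from the code path that
handles that response.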

Regards
/Jack

On 9/13/05, Andrzej Bialecki <ab@getopt.org> wrote:
> Edward Quick wrote:
> > Hi,
> >
> > I posted to the user list but didn't get a reply. I want to crawl a
> > protected site, but there doesn't seem to be an option for that in Nutch
> > at the moment.
> >
> > However, it doesn't sound like something that would be too hard to add,
> > assuming the java http client library can handle that. As I'm not
> > familiar with the code, could someone point me at the file (or files) in
> > the source which do the crawling please? I'm not professing to be a top
> > Java programmer (perl's my speciality) but I'll give it a shot, unless
> > anyone else wants to?!
> 
> The quick hack would be to add necessary code somewhere in
> protocol-httpclient. Eventually though, I think Nutch should grow an
> authentication factory, which would supply needed credentials to other
> plugins.
> 
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 


-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars
