nutch-dev mailing list archives

From crawl party <crawlpa...@gmail.com>
Subject Re: Unable to fetch content after integrating selenium
Date Sun, 04 Oct 2015 00:29:33 GMT
I think it's because Nutch calls your custom handler the first time, and the
second time it calls the default handler, which doesn't perform the login.
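
To make the guess concrete, here is a toy model of what I suspect is happening.
It is only a sketch: the Handler interface, the handler names, and the status
codes below are hypothetical stand-ins, not the actual Nutch plugin API, and I
am assuming the second call does not reuse the logged-in browser session.

```java
import java.util.Arrays;
import java.util.List;

class HandlerDispatchSketch {

    /** Hypothetical handler abstraction; NOT the real Nutch plugin interface. */
    interface Handler {
        String name();
        boolean performsLogin();
    }

    /** Builds a hypothetical handler with the given name and login behaviour. */
    static Handler handler(final String name, final boolean performsLogin) {
        return new Handler() {
            public String name() { return name; }
            public boolean performsLogin() { return performsLogin; }
        };
    }

    /**
     * Simulated status for a members-only page: 200 if the handler logged in
     * first, 403 otherwise (assumes no session is carried over between calls).
     */
    static int fetchMembersPage(Handler h) {
        return h.performsLogin() ? 200 : 403;
    }

    public static void main(String[] args) {
        // Suspected order of calls for the single seed URL.
        List<Handler> calls = Arrays.asList(
            handler("custom", true),
            handler("default", false));
        for (Handler h : calls) {
            System.out.println(h.name() + " handler -> HTTP " + fetchMembersPage(h));
        }
    }
}
```

If that is what is going on, logging which handler runs on each call should
confirm it.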

On Sat, Oct 3, 2015 at 11:25 AM, Charan Shampur <charanshampur@gmail.com>
wrote:

> Hello developers,
>
> I extended the interactive selenium interface to write a custom handler,
> which automatically fills the basic login information and enters the page.
> This gives Nutch access to crawl the members area. After starting
> the crawl I could see the web browser launch and fill in the
> login page, after which control goes back to Nutch (as
> expected). However, to my surprise, the Firefox driver is called again and
> the same page is loaded, but this time it does not log in; instead it fails
> with HTTP code 403. I have just one URL in the seed list.
>
> I am unable to figure out what is going wrong; any guidance would
> be of great help to us.
>
> Thanks
> Charan
