nutch-dev mailing list archives

From Tizy Ninan <tizy1...@gmail.com>
Subject Re: HTTP Post Authentication
Date Tue, 07 Apr 2015 12:11:36 GMT
Hi,

I am still not able to crawl websites requiring authentication.
The version of Nutch used is 1.10.

While crawling, I get the following warnings and still cannot identify
what is going wrong.
The httpclient-auth.xml file is available at the following link:
 https://gist.github.com/tizyninan/4412936795b02bbe9cee

INFO conf.Configuration: found resource httpclient-auth.xml at
jar:file:../target/NutchCrawler-1.0-SNAPSHOT.jar!/httpclient-auth.xml
INFO fetcher.Fetcher: -activeThreads=1, spinWaiting=0,
fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not
recognized in httpclient-auth.xml - expected <authscope>
WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders>
not recognized in httpclient-auth.xml - expected <authscope>
WARN httpclient.Http: Bad auth conf file: Element <removedFormFields> not
recognized in httpclient-auth.xml - expected <authscope>
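
To see which elements the warnings above refer to, the structure of the file can be listed with a short script. This is only a minimal sketch: the function name describe_auth_conf is hypothetical, the element names are copied from the WARN messages in this thread, and the rules the httpclient plugin really applies are an assumption, not something confirmed here.

```python
import xml.etree.ElementTree as ET

# Names taken from the WARN messages in this thread. Treat them as
# assumptions about what the parser expects, not as its actual rules.
EXPECTED_ROOT = "auth-configuration"
FORM_AUTH_CHILDREN = {"loginPostData", "additionalPostHeaders", "removedFormFields"}


def describe_auth_conf(xml_text):
    """Return human-readable notes about the structure of an auth conf file."""
    root = ET.fromstring(xml_text)
    notes = []
    if root.tag != EXPECTED_ROOT:
        notes.append(f"root element is <{root.tag}>, not <{EXPECTED_ROOT}>")
    # iter() also yields the root itself, so a <credentials> root is covered.
    for cred in root.iter("credentials"):
        for child in cred:
            if child.tag in FORM_AUTH_CHILDREN:
                notes.append(f"<credentials> contains form-auth element <{child.tag}>")
            elif child.tag != "authscope":
                notes.append(f"<credentials> contains unexpected element <{child.tag}>")
    return notes


if __name__ == "__main__":
    # Hypothetical sample, not the file from the gist.
    sample = """<auth-configuration>
      <credentials>
        <loginPostData/>
        <authscope/>
      </credentials>
    </auth-configuration>"""
    for note in describe_auth_conf(sample):
        print(note)
```

If the script reports form-auth elements under <credentials> while the crawler still warns that it expected <authscope>, one possibility is that the plugin build being run does not handle those elements. That reading is a guess from the log text alone.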

Looking forward to your help.

Thanks,
Tizy


On Thu, Mar 19, 2015 at 6:47 AM, Mohammed Omer <beancinematics@gmail.com>
wrote:

> Edit: The first link should be
> https://www.mikeash.com/getting_answers.html
>
> Thank you,
>
> Mo
>
> On Wed, Mar 18, 2015 at 8:16 PM, Mohammed Omer <beancinematics@gmail.com>
> wrote:
>
> > Tizy, in order to help debug your error, you'll need to provide additional
> > information. Check out this link for what's generally needed when trying to
> > debug over chat/email: http://www.mikeash.com/getting_answers
> >
> > The error seems to say that httpclient.Http doesn't like the auth conf
> > file you provided. Can you put it, and any other relevant changes you've
> > made, in a gist at http://gist.github.com and post the link here?
> >
> > Thank you,
> >
> > Mo
> >
> > On Fri, Mar 13, 2015 at 12:43 AM, Tizy Ninan <tizy1307@gmail.com> wrote:
> >
> >> Hi Lewis,
> >>
> >> Thank you for the reply.
> >>
> >> I tried providing the parameters specified in the httpclient-auth.xml
> >> template file, but while crawling I get the following warnings.
> >>
> >> WARN httpclient.Http: Bad auth conf file: root element <credentials> found
> >> in httpclient-auth.xml - must be <auth-configuration>
> >> WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not
> >> recognized in httpclient-auth.xml - expected <credentials>
> >> WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders>
> >> not recognized in httpclient-auth.xml - expected <credentials>
> >>
> >> The httpclient-auth.xml file is placed in the conf folder. The version of
> >> Nutch used is 1.10 (trunk).
> >>
> >> Could you please explain what could be wrong?
> >>
> >> Thanks,
> >> Tizy
> >>
> >>
> >> On Fri, Mar 13, 2015 at 1:26 AM, Lewis John Mcgibbney <
> >> lewis.mcgibbney@gmail.com> wrote:
> >>
> >> > Hi Tizy,
> >> >
> >> > On Thu, Mar 12, 2015 at 12:20 AM, <user-digest-help@nutch.apache.org>
> >> > wrote:
> >> >
> >> > >
> >> > > Is there any detailed step-by-step explanation of how to implement
> >> > > HTTPPostAuthentication on Nutch 1.10?
> >> > >
> >> > >
> >> >
> >> >
> >> > https://github.com/apache/nutch/blob/trunk/conf/httpclient-auth.xml.template#L61-L105
> >> > https://wiki.apache.org/nutch/HttpPostAuthentication
> >> > HTH
> >> > Lewis
> >> >
> >>
> >>
> >>
> >> --
> >> Thanks and Regards,
> >> Tizy
> >>
> >
> >
>



-- 
Thanks and Regards,
Tizy
