nutch-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: HTTP Post Authentication
Date Tue, 07 Apr 2015 14:27:15 GMT
Thanks Tizy - adding Tyler to this in case he didn’t see it.
Tyler, is this what you were running into? Thoughts?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Tizy Ninan <tizy1307@gmail.com>
Reply-To: "dev@nutch.apache.org" <dev@nutch.apache.org>
Date: Tuesday, April 7, 2015 at 5:11 AM
To: "user@nutch.apache.org" <user@nutch.apache.org>
Cc: "dev@nutch.apache.org" <dev@nutch.apache.org>
Subject: Re: HTTP Post Authentication

>Hi, 
>
>
>I am still not able to crawl websites requiring authentication.
>The version of Nutch used is 1.10.
>
>
>While crawling I get the following warnings, and I still cannot
>identify what is going wrong.
>Please find the httpclient-auth.xml file at the following link:
> https://gist.github.com/tizyninan/4412936795b02bbe9cee
>
>
>
>INFO conf.Configuration: found resource httpclient-auth.xml at
>jar:file:../target/NutchCrawler-1.0-SNAPSHOT.jar!/httpclient-auth.xml
>INFO fetcher.Fetcher: -activeThreads=1, spinWaiting=0,
>fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not
>recognized in httpclient-auth.xml - expected <authscope>
>WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders>
>not recognized in httpclient-auth.xml - expected <authscope>
>WARN httpclient.Http: Bad auth conf file: Element <removedFormFields> not
>recognized in httpclient-auth.xml - expected <authscope>
>
>
>
>Looking forward to your help.
>
>
>Thanks,
>Tizy
>
>
>
>
>On Thu, Mar 19, 2015 at 6:47 AM, Mohammed Omer
><beancinematics@gmail.com> wrote:
>
>Edit: The first link should be
>https://www.mikeash.com/getting_answers.html
>
>Thank you,
>
>Mo
>
>On Wed, Mar 18, 2015 at 8:16 PM, Mohammed Omer <beancinematics@gmail.com>
>wrote:
>
>> Tizy, in order to help debug your error, you'll need to provide additional
>> information. Check out this link for what's generally needed when trying to
>> debug over chat/email: http://www.mikeash.com/getting_answers
>>
>> The error seems to say that httpclient.Http doesn't like the auth conf
>> file you provided. Can you put it, along with any other relevant changes
>> you've made, in a gist at http://gist.github.com and post the link here?
>>
>> Thank you,
>>
>> Mo
>>
>> On Fri, Mar 13, 2015 at 12:43 AM, Tizy Ninan <tizy1307@gmail.com> wrote:
>>
>>> Hi Lewis,
>>>
>>> Thank you for the reply.
>>>
>>> I tried providing the parameters specified in the httpclient-auth.xml
>>> template file, but while crawling I get the following warnings.
>>>
>>> WARN httpclient.Http: Bad auth conf file: root element <credentials> found
>>> in httpclient-auth.xml - must be <auth-configuration>
>>> WARN httpclient.Http: Bad auth conf file: Element <loginPostData> not
>>> recognized in httpclient-auth.xml - expected <credentials>
>>> WARN httpclient.Http: Bad auth conf file: Element <additionalPostHeaders>
>>> not recognized in httpclient-auth.xml - expected <credentials>
>>>
>>> The httpclient-auth.xml file is placed in the conf folder. The version of
>>> Nutch used is 1.10 (trunk).
>>>
>>> Could you please explain what could be wrong?
>>>
>>> Thanks,
>>> Tizy
>>>
>>>
>>> On Fri, Mar 13, 2015 at 1:26 AM, Lewis John Mcgibbney <
>>> lewis.mcgibbney@gmail.com> wrote:
>>>
>>> > Hi Tizy,
>>> >
>>> > On Thu, Mar 12, 2015 at 12:20 AM, <user-digest-help@nutch.apache.org>
>>> > wrote:
>>> >
>>> > >
>>> > > Is there a detailed step-by-step explanation of how to implement
>>> > > HTTPPostAuthentication on Nutch 1.10?
>>> > >
>>> > >
>>> >
>>> >
>>> > https://github.com/apache/nutch/blob/trunk/conf/httpclient-auth.xml.template#L61-L105
>>> > https://wiki.apache.org/nutch/HttpPostAuthentication
>>> > HTH
>>> > Lewis
>>> >
>>>
>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Tizy
>>>
>>
>>
>
>
>
>
>
>
>
>
>
>-- 
>Thanks and Regards,
>Tizy
>
>
>
>
>
>
>
