manifoldcf-user mailing list archives

From Phil Riethmuller <priethmul...@funnelback.com>
Subject Re: HTTP 302 error causing job to abort
Date Tue, 16 Feb 2016 22:22:15 GMT
Thanks Karl,

The majority of content is not going to the redirect; it's probably just a
handful of documents that are behaving this way.

I'd agree that it's of lesser concern whether or not the document itself is
indexing; however, I wouldn't expect the 302 to be treated as a fatal error
that causes the job to come to a halt. I'd expect the document to be passed
over, and the crawl to continue.

Is the only solution at this point to remove the documents that return a
302, so that the crawl can run in full?
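
To make the behaviour I'm expecting concrete, here is a minimal sketch. It is
not ManifoldCF code: the class, enum and method names are hypothetical, and the
status codes are supplied as plain data rather than fetched. It only shows the
idea of recording a redirected document as skipped and carrying on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in ManifoldCF.
public class RedirectTolerantCrawl {

    /** What the crawl does with one document, based on its HTTP status. */
    enum Outcome { INDEXED, SKIPPED, FAILED }

    /** 2xx is indexed, 3xx is passed over, anything else is a failure. */
    static Outcome classify(int status) {
        if (status >= 200 && status < 300) return Outcome.INDEXED;
        if (status >= 300 && status < 400) return Outcome.SKIPPED;
        return Outcome.FAILED;
    }

    /** Visit every document; collect the skipped URLs instead of aborting. */
    static List<String> crawl(Map<String, Integer> statusByUrl) {
        List<String> skipped = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : statusByUrl.entrySet()) {
            if (classify(entry.getValue()) == Outcome.SKIPPED) {
                skipped.add(entry.getKey()); // note it and keep going
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Placeholder URLs and statuses, assumed for the example.
        Map<String, Integer> docs = new LinkedHashMap<>();
        docs.put("http://example.invalid/sites/a", 200);
        docs.put("http://example.invalid/sites/b", 302);
        docs.put("http://example.invalid/sites/c", 200);
        System.out.println("Skipped: " + crawl(docs));
    }
}
```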

Regards,

Phil Riethmuller
Technical Consultant
 
Funnelback | 437 Kent Street, Sydney, NSW 2000
T +61 2 9045 2882 | funnelback.com <http://www.funnelback.com/>

AUSTRALIA | UNITED KINGDOM | NEW ZEALAND | POLAND | UNITED STATES


Connect with us: LinkedIn <http://www.linkedin.com/company/funnelback>  -
Twitter


From:  Karl Wright <daddywri@gmail.com>
Reply-To:  <user@manifoldcf.apache.org>
Date:  Wednesday, 17 February 2016 8:58 am
To:  "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
Subject:  Re: HTTP 302 error causing job to abort

Hi Phil,

You probably want to point your SharePoint repository connection to the
proper server and site, and not rely on redirections.  It's also possible
that you are missing the site entirely and the redirection you are seeing is
taking you to some error page somewhere.

I will be raising the question of redirections with the
HttpComponents/HttpClient team, since I see no obvious problems with the
SharePoint connector code.  However, if your connection is properly set up,
redirections should be unneeded.

I would read the documentation on the Wiki page for debugging SharePoint
connections at the bottom of this page:
https://cwiki.apache.org/confluence/display/CONNECTORS/Debugging+Connections
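
One quick way to see where a redirection leads is to request the URL once
without following redirects and print the Location header. The sketch below
uses only the JDK's HttpURLConnection. The class name is made up for the
example, it sends an unauthenticated GET rather than the authenticated SOAP
POST the connector issues, and so the server may answer it differently. Treat
it as a rough diagnostic, not a reproduction of the connector's request.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of ManifoldCF.
public class RedirectProbe {

    /** True for the status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    /** Turn a status code and Location header into a one-line summary. */
    static String describe(int status, String location) {
        if (isRedirect(status)) {
            return status + " redirect to "
                + (location == null ? "(no Location header)" : location);
        }
        return status + " (no redirect)";
    }

    /** Request the URL once, without following redirects, and summarize. */
    static String probe(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // we want to see the 302 itself
        conn.setRequestMethod("GET");
        try {
            return describe(conn.getResponseCode(), conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RedirectProbe <url>");
            return;
        }
        System.out.println(probe(args[0]));
    }
}
```

Run it against the server and site path entered in the repository connection.
If the Location header points at a different host, a login page, or an error
page, that suggests the connection settings need to change.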

Thanks,
Karl


On Tue, Feb 16, 2016 at 4:55 PM, Phil Riethmuller
<priethmuller@funnelback.com> wrote:
> Do you mean in the job status in the ManifoldCF interface?
> 
> The job status also shows the same:
> Error: Unexpected http error code 302 accessing SharePoint at <url>:
> (302)HTTP/1.0 302 Found
> 
> I agree, I wouldn't have thought that the crawler would follow any links or
> redirections.
> 
> Which settings might be misconfigured, so that I can look at revising them?
> 
> Phil
> 
> 
> From:  Karl Wright <daddywri@gmail.com>
> Reply-To:  <user@manifoldcf.apache.org>
> Date:  Wednesday, 17 February 2016 8:45 am
> 
> To:  "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
> Subject:  Re: HTTP 302 error causing job to abort
> 
> Thanks.
> 
> When you view the repository connection in the UI, do you get a 302 error
> also?
> 
> I have looked at the code; HttpClient is supposedly configured to honor
> redirections.  Obviously it is not doing that, so I'll have to dig deeper into
> why that is.  On the other hand, I would not expect you to be getting any
> redirections, unless you have configured your connection incorrectly.
> 
> Karl
> 
> 
> On Tue, Feb 16, 2016 at 4:31 PM, Phil Riethmuller
> <priethmuller@funnelback.com> wrote:
>> Thanks Karl -
>> 
>> I've replaced the actual URL with <URL> below, but here is the stack trace:
>> 
>> ERROR 2016-02-16 12:10:55,251 (Worker thread '16') - Exception tossed: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected http error code 302 accessing SharePoint at <URL>: (302)HTTP/1.0 302 Found
>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2246)
>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SharePointRepository.processDocuments(SharePointRepository.java:1549)
>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> Caused by: (302)HTTP/1.0 302 Found
>>         at org.apache.manifoldcf.connectorcommon.common.CommonsHTTPSender.invoke(CommonsHTTPSender.java:201)
>>         at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
>>         at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
>>         at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
>>         at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
>>         at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
>>         at org.apache.axis.client.Call.invoke(Call.java:2767)
>>         at org.apache.axis.client.Call.invoke(Call.java:2443)
>>         at org.apache.axis.client.Call.invoke(Call.java:2366)
>>         at org.apache.axis.client.Call.invoke(Call.java:1812)
>>         at com.microsoft.schemas.sharepoint.soap.WebsSoapStub.getWebCollection(WebsSoapStub.java:854)
>>         at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getSites(SPSProxyHelper.java:2161)
>> 
>> 
>> 
>> 
>> Regards,
>> 
>> Phil Riethmuller
>> Technical Consultant
>>  
>> 
>> 
>> From:  Karl Wright <daddywri@gmail.com>
>> Reply-To:  <user@manifoldcf.apache.org>
>> Date:  Tuesday, 16 February 2016 6:54 pm
>> To:  "user@manifoldcf.apache.org" <user@manifoldcf.apache.org>
>> Subject:  Re: HTTP 302 error causing job to abort
>> 
>> Hi Phil,
>> 
>> An HTTP 302 response is simply a redirection.  It should not, by itself, cause
>> a job to abort.  I would expect that to go by in wire/http logging, but you
>> should not see it anywhere else.  So it is not clear to me what you are
>> really seeing here.
>> 
>> Can you include an example stack trace from the manifoldcf log?
>> 
>> Karl
>>  
>> 
>> On Tue, Feb 16, 2016 at 12:22 AM, Phil Riethmuller
>> <priethmuller@funnelback.com> wrote:
>>> Hi -
>>> 
>>> When crawling a SharePoint repository, I'm receiving an HTTP 302 error which
>>> is causing the ManifoldCF job to abort. How do I prevent the crawler from
>>> aborting the job?
>>> 
>>> I'm using v2.3 of ManifoldCF with a PostgreSQL database.
>>> 
>>> Regards,
>>> Phil
>> 
> 



