manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Timeout problems with web crawling
Date Tue, 23 Apr 2013 10:40:32 GMT
One thing that is really obvious from the production server log is the
disastrously bad PostgreSQL query plan.  It's completely and tragically wrong.
I've created CONNECTORS-678 to track that issue.

Have you or the sys admins modified the schema/indexes in any way?

Karl


On Tue, Apr 23, 2013 at 6:22 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:

>
> I'm still having problems with web crawling using trunk with updated Http
> client. It seems that the problems occur when Solr is password protected
> even though the error messages in my logs indicate a timeout problem. I'm
> not 100 % sure, but it seems that the problem starts as soon as I'm
> enabling password protection.
>
> We have struggled a lot with the web crawler in production mode recently,
> but I thought that we managed to get around these problems when "Expect:
> 100-continue" was added to the header (now added in trunk). Then we discovered
> a Resin bug which sent a wrong http status code back when this header was
> enabled, but this has been solved by moving the authentication
> configuration to Apache HTTP server instead (using .htaccess). So
> everything *should* work, but it doesn't. Now I have managed to reproduce
> the problems on our test server as well when I added full password
> protection for the Solr test server. As I wrote above, the logs do not
> seem to report problems with the Solr server, but with the crawled
> resources instead.
>
> I have added two logs. One from the production server, and another from
> the test server. Log level is set to DEBUG for HttpClient. The prod job
> just stops and hangs, maybe due to a db lock. The test stops with the
> message "Error: Repeated service interruptions - failure processing
> document: null" ("read timed out" in simple history).
>
> The logs are available here:
> http://folk.uio.no/erlendfg/manifoldcf/
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP:
> 31050
>
