manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Timeout problems with web crawling
Date Tue, 23 Apr 2013 10:40:32 GMT
One thing that is really obvious from the production server log is the
disastrously bad PostgreSQL query plan.  It's completely and tragically wrong.
I've created CONNECTORS-678 to track that issue.

Have you or the sys admins modified the schema/indexes in any way?


On Tue, Apr 23, 2013 at 6:22 AM, Erlend Garåsen <> wrote:

> I'm still having problems with web crawling using trunk with the updated
> HttpClient. It seems that the problems occur when Solr is password
> protected, even though the error messages in my logs indicate a timeout
> problem. I'm not 100% sure, but it seems that the problem starts as soon as
> I enable password protection.
> We have struggled a lot with the web crawler in production recently, but I
> thought we had managed to get around these problems when the "Expect:
> 100-continue" header was added (it is now in trunk). Then we discovered a
> Resin bug that sent back the wrong HTTP status code when this header was
> enabled, but we have solved that by moving the authentication
> configuration to the Apache HTTP server instead (using .htaccess). So
> everything *should* work, but it doesn't. I have now managed to reproduce
> the problems on our test server as well, after adding full password
> protection to the Solr test server. As I wrote above, the logs do not
> seem to report problems with the Solr server, but rather with the crawled
> resources.
> I have attached two logs: one from the production server and one from
> the test server. The log level is set to DEBUG for HttpClient. The
> production job just stops and hangs, possibly due to a database lock. The
> test job stops with the message "Error: Repeated service interruptions -
> failure processing document: null" ("read timed out" in the simple history).
> The logs are available here:
> Erlend
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP:
> 31050
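
The client-side setup described in the quoted message (an "Expect: 100-continue" header on requests to a Solr server behind HTTP Basic authentication, with a bounded wait for the response) can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's own java.net.http client (Java 11+), not the Apache HttpClient version that ManifoldCF trunk used, and it is not ManifoldCF's Solr connector code. The class name, update URL, credentials and 60-second timeout are hypothetical placeholders. The request is built but never sent, so the sketch runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;

// Hypothetical sketch: builds (but does not send) a POST to a
// password-protected Solr update handler. Not ManifoldCF code.
public class SolrPostSketch {

    static HttpRequest buildUpdateRequest(String url, String user, String password,
                                          String body, Duration timeout) {
        // HTTP Basic credentials: base64("user:password").
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                // Send "Expect: 100-continue" so the server can answer
                // (e.g. with 100 Continue or 401) before the body is sent.
                .expectContinue(true)
                // Send credentials preemptively, so an authenticating proxy
                // such as Apache httpd does not have to reject the first attempt.
                .header("Authorization", "Basic " + token)
                .header("Content-Type", "application/xml; charset=UTF-8")
                // Upper bound on how long to wait for the response.
                .timeout(timeout)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildUpdateRequest(
                "http://localhost:8983/solr/update", "user", "secret",
                "<commit/>", Duration.ofSeconds(60));
        System.out.println(req.expectContinue());
        System.out.println(req.headers().firstValue("Authorization").orElse(""));
    }
}
```

Sending the credentials preemptively is a design choice here: with "Expect: 100-continue" alone, a server that requires authentication answers 401 before the body goes out, and the client must retry, which is exactly the stage at which a server returning the wrong status code (as with the Resin bug mentioned above) can leave the client waiting.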
