The Solr indexing seems to be working fine on the test host.  I haven't verified that the same is true on the production host.  The cause of the production host hanging, though, may be the really awful stuffer query plan.  It looks like a hang, but in fact the query just gets very, very slow.

Can you dump the PostgreSQL schema that is in place on the production machine?  Specifically, I want to see the jobqueue table's indexes.
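
In case it helps, here is a rough sketch of one way to collect that. It only builds a pg_dump command line and a query against the standard pg_indexes view; it does not connect to anything. The database name "mcf_db" and user "manifoldcf" are placeholders I made up, so substitute whatever your properties.xml actually uses.

```python
import shlex


def schema_dump_command(dbname, table="jobqueue", user=None, host=None):
    """Build a pg_dump invocation that dumps only the DDL for one table."""
    cmd = ["pg_dump", "--schema-only", f"--table={table}"]
    if user:
        cmd.append(f"--username={user}")
    if host:
        cmd.append(f"--host={host}")
    cmd.append(dbname)  # database name goes last as a positional argument
    return cmd


# Alternative: run this inside psql to list just the index definitions.
INDEX_QUERY = (
    "SELECT indexname, indexdef FROM pg_indexes "
    "WHERE tablename = 'jobqueue' ORDER BY indexname;"
)

if __name__ == "__main__":
    # "mcf_db" and "manifoldcf" are placeholder values, not known settings.
    print(shlex.join(schema_dump_command("mcf_db", user="manifoldcf")))
    print(INDEX_QUERY)
```

Either the output of the printed pg_dump command or the result of the query would be enough for me to check the indexes.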

I do not see any exceptions at all logged in either place.  If there's a service interruption, usually a warning log entry is dumped.  I'm not seeing that, though.




On Tue, Apr 23, 2013 at 6:22 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:

I'm still having problems with web crawling using trunk with the updated HttpClient. It seems that the problems occur when Solr is password protected, even though the error messages in my logs indicate a timeout problem. I'm not 100% sure, but it seems that the problem starts as soon as I enable password protection.

We have struggled a lot with the web crawler in production mode recently, but I thought that we had managed to get around these problems when "expect 100 continue" was added to the header (now added in trunk). Then we discovered a Resin bug which sent a wrong HTTP status code back when this header was enabled, but this has been solved by moving the authentication configuration to Apache HTTP Server instead (using .htaccess). So everything *should* work, but it doesn't. Now I have managed to reproduce the problems on our test server as well, after I added full password protection for the Solr test server. As I wrote above, the logs do not seem to report problems with the Solr server, but with the crawled resources instead.

I have added two logs. One from the production server, and another from the test server. Log level is set to DEBUG for HttpClient. The prod job just stops and hangs, maybe due to a db lock. The test stops with the message "Error: Repeated service interruptions - failure processing document: null" ("read timed out" in simple history).

The logs are available here:
http://folk.uio.no/erlendfg/manifoldcf/

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050