manifoldcf-user mailing list archives

From Karl Wright
Subject RE: Update on "Two simultaneous web crawls hang"
Date Fri, 25 Apr 2014 19:14:36 GMT
Hi Tom,

If this is ManifoldCF 1.5, then your problem is almost certainly the
bandwidth throttling.  The code has a bad bug that makes throttled fetches
run 1000 times slower than configured.  Either adjust the throttle to
compensate, apply the patch, or upgrade to ManifoldCF 1.6.


Sent from my Windows Phone
From: Tom Rees
Sent: 4/25/2014 2:23 PM
To: user; Vasant Kumar; Steven Bennett; Amy Jocefczyk-Papa
Subject: Update on "Two simultaneous web crawls hang"

In an email on April 4, 2014, I reported that whenever I run two
simultaneous web crawls in ManifoldCF, both crawls simply stop
progressing after a short period of time. When I look at the thread stack
traces, I see many fetcher threads waiting forever in wait() calls.
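
For readers who want to reproduce this kind of check, here is a minimal
sketch of listing threads that are parked in the WAITING state, using the
standard java.lang.management API. It only inspects the JVM it runs in; the
class name WaitingThreadReport and its method are hypothetical and are not
part of ManifoldCF. For a separate running process, a thread dump taken
with the JDK's jstack tool gives comparable information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ManifoldCF: reports threads of the
// current JVM that are blocked indefinitely (state WAITING).
public class WaitingThreadReport {

    /** Returns one summary line per live thread currently in the WAITING state. */
    public static List<String> waitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip lock-monitor and synchronizer details, stacks only.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack available)";
            lines.add(info.getThreadName() + " waiting at " + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : waitingThreads()) {
            System.out.println(line);
        }
    }
}
```

A thread that sits in Object.wait() with no timeout shows up in this list;
seeing the same threads there across several dumps taken some time apart is
what "waiting forever" looks like in practice.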

I am still having the same problem, even though I have tried different
configurations. First, the tests in the previous email used Postgres 9.3.2.
I tried Postgres 9.1.0, and the web crawls hang in the same way,
although they seemed to take slightly longer to stop making
progress. I have also tried crawling different web sites, and I tried
using a custom output connector that only saves the downloaded files to the
local file system. The problem persists. Another developer at our
site has the same problem whenever he runs multiple web crawls. Is this a
known issue with the web crawler?

Tom Rees
