manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Timeout values to be configurable
Date Wed, 26 Dec 2012 11:19:24 GMT
Hi Shigeki,

While the timeout values for Solr could theoretically be made
configurable as connection parameters, the timeout values for jCIFS
can currently only be set globally.  Making them configurable per
connection would therefore require changes to the jCIFS library
itself.  I've already approached the jCIFS developer about changes of
this kind, and he was unreceptive.  Part of the reason is the nature
of the CIFS protocol, which multiplexes many simultaneous requests
over the same connection.  So this cannot be solved in the manner you
suggest in any case.
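
To illustrate what "settable globally" means in practice, here is a
minimal Java sketch.  It uses no jCIFS classes; the Connection class
is a hypothetical stand-in for a per-connection object, and the
fallback value is an arbitrary placeholder.  It shows only that JVM
system properties, which is how the connector sets the jCIFS timeouts
in the code quoted below, are shared by everything in the process.

```java
class GlobalTimeoutDemo {
    // Property name taken from the quoted SharedDriveConnector snippet.
    static final String KEY = "jcifs.smb.client.responseTimeout";

    // Hypothetical stand-in for a per-connection object (not a ManifoldCF or jCIFS class).
    static final class Connection {
        final String name;

        Connection(String name) {
            this.name = name;
        }

        // Every instance reads the same JVM-wide system property;
        // 30000 is only a placeholder fallback for this sketch.
        int responseTimeoutMillis() {
            return Integer.getInteger(KEY, 30000);
        }
    }

    public static void main(String[] args) {
        Connection a = new Connection("A");
        Connection b = new Connection("B");

        // "Configure" connection A.
        System.setProperty(KEY, "120000");
        System.out.println(a.name + " sees " + a.responseTimeoutMillis());

        // "Configure" connection B: this silently changes what A sees too.
        System.setProperty(KEY, "60000");
        System.out.println(a.name + " sees " + a.responseTimeoutMillis());
        System.out.println(b.name + " sees " + b.responseTimeoutMillis());
    }
}
```

Whichever value is written last applies to every connection in the
JVM, so there is no way to give two connections different timeouts
through this mechanism alone.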

Furthermore, on a properly-set-up system, it should be unnecessary to
adjust either jCIFS timeout parameters or Solr timeout parameters.  If
you are consistently getting timeouts from jCIFS, it is a strong sign
you are overloading the Windows servers you are trying to crawl, and
you should take steps immediately to reduce the maximum number of
connections you are trying to crawl with.  Similarly, chronically
exceeding the Solr timeout parameters indicates you are pushing
documents into a Solr instance that is either underpowered or has
too few available threads.  Cutting back on the maximum number of
connections is indicated here as well.

Since ManifoldCF retries failures, occasional failures due to other
loads on either the Windows servers or on Solr are expected and will
not cause problems.  But chronic failures indicate serious
configuration problems, for which increasing the timeouts is the wrong
solution.  So I hesitate to add features of the kind you request,
unless you can convince me that there is a fundamental reason why it
should be necessary to change these parameters.

Thanks,
Karl


On Wed, Dec 26, 2012 at 2:18 AM, Shigeki Kobayashi
<shigeki.kobayashi3@g.softbank.co.jp> wrote:
>
>
> Hi.
>
> While using MCF, I have run into timeout errors many times while
> crawling and indexing files to Solr.
> I would like to propose making the following timeout values configurable in
> properties.xml.
>
> Timeout errors often occur depending on the files and environments (machines), so
> it would be nice to be able to change
> the timeout values without rebuilding the whole source.
>
>
> $MCF_HOME\connectors\solr\connector\src\main\java\org\apache\manifoldcf\agents\output\solr\HttpPoster.java
>
> int responseRetries = 9000;         // Long basic wait: 3 minutes.  This
>                                     // will also be added to by a term based on the size of the request.
>
> $MCF_HOME\connectors\jcifs\connector\src\main\java\org\apache\manifoldcf\crawler\connectors\sharedrive\SharedDriveConnector.java
>     System.setProperty("jcifs.smb.client.soTimeout","150000");
>     System.setProperty("jcifs.smb.client.responseTimeout","120000");
>
>
> Regards,
>
>
> Shigeki
