manifoldcf-user mailing list archives

From Ronny Heylen <securaqbere...@gmail.com>
Subject Re: Error: Repeated service interruptions - failure processing document: Read timed out
Date Wed, 06 Nov 2013 20:31:47 GMT
Ok Karl, thanks for the tip and the quick response, we will do this and
come back with the result.


On Wed, Nov 6, 2013 at 9:28 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Ronny,
>
> One minor thing: you should need to set throttling to 2 ONLY for the
> Windows repository connection, not for AD or Solr.
>
>
> As for how to debug this issue, first off you should be looking in the
> manifoldcf.log file (or the equivalent).  You should see WARN messages from
> the shared file connector under most conditions when there's a service
> interruption.  You would probably see "Read timed out" warnings if you
> looked there, since that is what aborted the job run, along with a stack
> trace.  However, that's not going to add much information to the analysis
> at this point.
>
> What might be valuable is to determine whether the problem is happening on
> the Windows side or on the Solr side.  At this point I can't tell.  You
> could, however, create a null output connection, and create a similar job
> that sends its output there, and see if it completes.  Can you do this and
> get back to me?
>
> Thanks,
> Karl
>
>
>
>
>
> On Wed, Nov 6, 2013 at 3:17 PM, Ronny Heylen <securaqbereusr@gmail.com> wrote:
>
>> Hi,
>> We use Manifoldcf 1.3 and Solr 4.4 to index a shared network drive with
>> several hundred thousand documents.
>> Running a single manifoldcf job to index the whole drive always gave
>> some kind of error, so to better understand where the problem might
>> be, we made one job to index all *.doc*, another one for *.xls*, another
>> one for *.pdf ...
>> Using the help from the list (thanks!) we set the size limit to 100MB and
>> all jobs succeed (great) except the one for *.pptx
>> The message is
>> Error: Repeated service interruptions - failure processing document: Read
>> timed out
>> We don't find any error in the logs we have searched: solr.log, ...
>> Based on some indications found on Internet, we have set the Throttling
>> max connections setting to 2 (instead of 10) in 3 places:
>> output connection to SOLR
>> authority connection to the Active Directory
>> repository connection to the windows file share
>> But the problem stays the same.
>> We have tried on another machine with SOLR 4.5 and Manifoldcf 1.4, same
>> problem.
>> We can run the job for all *.PDF, or all *.DOC*, or all *.XLS*
>> without problem, but the same message always comes for *.PPTX.
>> The last time the job stopped with the message, it displayed (not the same
>> numbers for each run as the windows drive is changing) 56311 documents,
>> with 17466 busy and 38847 processed.
>> As we don't find anything in the logs (but we are probably not looking in
>> the right place), we don't know what to do.
>> Thanks for your help,
>> Ronny and Frédéric
>>
>
>
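
Karl's suggestion to look for WARN messages in manifoldcf.log can be sketched as a small script. This is only an illustration, not part of ManifoldCF: the function names are hypothetical, and it assumes that each log line carries its level as the literal word WARN and that the timeout text appears on the same line.

```python
from typing import Iterable, List, Tuple


def find_timeout_warnings(
    lines: Iterable[str], needle: str = "Read timed out"
) -> List[Tuple[int, str]]:
    """Return (line number, text) for WARN lines that contain `needle`.

    Assumption: the log level appears as the literal word "WARN" on the
    same line as the message. Adjust the check if your log layout differs.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if "WARN" in line and needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path: str) -> List[Tuple[int, str]]:
    """Scan one log file; undecodable bytes are replaced rather than fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_timeout_warnings(handle)
```

Calling scan_log with the path to your manifoldcf.log (or the equivalent file) returns the matching lines with their line numbers, which makes it easier to find the stack trace that follows each warning.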
