manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Regarding skip limit
Date Wed, 30 May 2018 12:11:38 GMT
Hi Vinay,

I don't have complete information, but offhand it looks to me like the tar
is being extracted more than once because the ingestion fails and is being
retried.  The retries are happening about every 8 minutes, which is exactly
what one expects for error retries.
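
As a rough check on that interval, the timestamps in the history excerpt
quoted below can be differenced. This is only a sketch in Python: the
retry_gaps_seconds helper is a name made up for illustration, not part of
ManifoldCF, and the timestamps are copied by hand from the quoted message.

```python
from datetime import datetime

# Timestamps copied by hand from the quoted history excerpt (newest first).
STAMPS = [
    "05-30-2018 10:45:22.659",
    "05-30-2018 10:37:11.598",
    "05-30-2018 10:28:49.251",
    "05-30-2018 10:20:35.719",
    "05-30-2018 10:12:24.859",
    "05-30-2018 10:03:57.290",
]


def retry_gaps_seconds(stamps):
    """Return the gaps, in seconds, between consecutive attempts (oldest first)."""
    # The history screen shows month-day-year with millisecond precision.
    times = sorted(datetime.strptime(s, "%m-%d-%Y %H:%M:%S.%f") for s in stamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


gaps = retry_gaps_seconds(STAMPS)
for gap in gaps:
    print(f"{gap / 60:.1f} min")
```

Each gap comes out a little over eight minutes, so the attempts are evenly
spaced rather than scattered.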

Please note that the number in the right column differs from attempt to
attempt.  Nevertheless, the values are all suspiciously close to one
another.

ManifoldCF does not always immediately abort a job when indexing fails --
it depends on whether the error is likely to resolve on retries or not.
Eventually it will give up but not until many retries have taken place.

The Simple History should have errors of all kinds logged, so you should be
able to see why an individual document failed to index.

Thanks,
Karl


On Wed, May 30, 2018 at 7:01 AM VINAY Bengaluru <vinaybs.20@gmail.com>
wrote:

> Hi,
> While running a job with a file-system repository and Solr as the
> output connection, with a Tika transformation in between, we see that a
> tar.gz file is being extracted again and again without reaching the Solr
> ingestion phase. We are seeing the following in the history screen:
>
> 05-30-2018 10:45:22.659 extract [TikaTransformer] file: /file1.. ...
> Projects/ImageProcessing/Girod/public_package.tar.gz
> OK 3544906667 503767
> 05-30-2018 10:37:11.598 extract [TikaTransformer] file:/file1..
> Projects/ImageProcessing/Girod/public_package.tar.gz
> OK 3544906667 489356
> 05-30-2018 10:28:49.251 extract [TikaTransformer] file: /file1.. ..
> Projects/ImageProcessing/Girod/public_package.tar.gz
> OK 3544906667 501580
> 05-30-2018 10:20:35.719 extract [TikaTransformer] file:/ /file1.. ...
> Projects/ImageProcessing/Girod/public_package.tar.gz
> OK 3544906667 489647
> 05-30-2018 10:12:24.859 extract [TikaTransformer] file: /file1.. ...
> Projects/ImageProcessing/Girod/public_package.tar.gz
> OK 3544906667 489811
> 05-30-2018 10:03:57.290 extract [TikaTransformer] file: /file1.. ...
> Projects/ImageProcessing/Girod/public_package.tar.gz
> Any idea why ManifoldCF tries the extraction multiple times? Also, can we
> set a limit to terminate a job if it fails at a particular phase a certain
> number of times? For example, if Solr ingestion fails 5 times, the job
> should terminate by itself.
>
> Thanks and regards,
> Vinay B S
>
>
>
