manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Manifold CF job hangs
Date Thu, 24 May 2018 06:55:19 GMT
Hi Vinay,

If you know which documents these are, it would be great to get hold of one
of them.

Alternatively, it might be helpful to provide a thread dump of the
ManifoldCF agents process once it's finished all the other documents and is
stuck only on ones that are "hung" inside Tika.  This should prompt a Tika
bug report, and a document and a stack trace would be key for that.
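
As a rough sketch of how such a thread dump could be narrowed down before
attaching it: the Python snippet below assumes the dump was captured as plain
text with the JDK's jstack tool (for example, jstack <pid> > agents-dump.txt,
where the file name is only a placeholder) and keeps just the threads whose
stack contains a frame from the org.apache.tika package. The function name
threads_in_package is hypothetical and not part of ManifoldCF or Tika.

```python
import re


def threads_in_package(dump_text, package="org.apache.tika"):
    """Return (thread_name, frames) for every thread whose stack mentions `package`.

    Assumes jstack-style text: each thread entry starts with a line holding
    the thread name in double quotes, followed by indented "at ..." frames,
    and entries are separated by blank lines.
    """
    hits = []
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        # Skip headers, footers, and any block that is not a thread entry.
        if not lines or not lines[0].startswith('"'):
            continue
        frames = [ln.strip() for ln in lines[1:] if ln.strip().startswith("at ")]
        if any(package in frame for frame in frames):
            thread_name = lines[0].split('"')[1]
            hits.append((thread_name, frames))
    return hits
```

Reading the saved dump with open("agents-dump.txt").read() and passing the
text to threads_in_package() would list only the threads currently inside
Tika, which is the part most relevant to a bug report.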

If you can give us a document and a corresponding stack trace, please
create a ticket (https://issues.apache.org/jira), and attach the file to
that.

Thanks,
Karl


On Thu, May 24, 2018 at 2:11 AM VINAY Bengaluru <vinaybs.20@gmail.com>
wrote:

> Hi Karl,
> We have ManifoldCF 2.9.1 set up, with a job configured to do a
> filesystem crawl, followed by the Tika parser (the ManifoldCF one), and
> then posting to SolrCloud.
> Although crawling and indexing go smoothly for most of the files,
> certain files, including docs and PDFs, hang at the Tika
> transformation stage. There are no errors in the logs, and the History page
> shows the file that is at the Tika parsing stage.
> Any idea why the job hangs and doesn't come out of the Tika transformation
> stage? How could we handle such a scenario, given that we have set up a
> scheduled job for continuous crawling?
>
> Thanks and regards,
> Vinay B S
>
>
>
>