This sounds potentially like a problem in Tika, but in order to be sure I would need a complete stack trace, not just a piece of one.

If it is a Tika issue, it should appear reliably on the same document, again and again.

Is there any way you can crawl ONLY one of the documents that got blocked?  I suspect that when you paused and restarted, you just postponed the problem and it will happen again.
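
Since the log has grown to several gigabytes, the complete stack trace can be pulled out without opening the file in an editor. The following is only a minimal sketch, not a ManifoldCF or Tika tool. It assumes a typical Java log layout in which stack frames are indented lines starting with "at", and in which "Caused by:" and "... N more" lines belong to the same trace. The function name first_stack_trace is hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: frames are indented "at ..." lines; "Caused by:" and
# "... N more" lines continue the same trace.
FRAME = re.compile(r"^\s+at\s|^\s*Caused by:|^\s+\.\.\. \d+ more")


def first_stack_trace(lines: Iterable[str]) -> List[str]:
    """Return the first stack trace found: its header line plus every
    continuation line, stopping at the first unrelated log line."""
    previous = None  # last non-blank line seen before the trace starts
    trace: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # Skip blank lines so a double-spaced trace stays in one piece.
            continue
        if FRAME.match(line):
            if not trace and previous is not None:
                # The line just before the first frame names the exception.
                trace.append(previous)
            trace.append(line)
        elif trace:
            # First non-trace line after the trace: the trace is complete.
            break
        else:
            previous = line
    return trace
```

Passing an open file object reads the log line by line, so the whole file is never loaded into memory. For example, with a log file assumed to be named manifoldcf.log: `with open("manifoldcf.log", errors="replace") as log: print("\n".join(first_stack_trace(log)))`.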


On Mon, May 28, 2018 at 9:50 AM msaunier <> wrote:

Hello Karl,


In ManifoldCF 2.9, for every job, around twenty documents remain blocked at the end of the job.

A single error appears repeatedly and floods the logs with several gigabytes in a short time, which filled up the servers:



               at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator( ~[?:?]
               at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators( ~[?:?]
               at org.apache.pdfbox.contentstream.PDFStreamEngine.processTransparencyGroup( ~[?:?]


If I pause the job and restart it, the documents are sent and everything works. But if I am not there to do that, we have problems.


Do you know this problem, and do you have a solution? Is it a bad configuration?


Thank you.