manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Exception in the running Custom Job
Date Mon, 20 Aug 2018 12:06:00 GMT
Hi,

You are running out of memory.
Tika's memory consumption is not well defined, so you will need to limit the
size of documents that reach it.  This is not the same as limiting the size
of documents *after* Tika extracts them.

The Allowed Documents transformer therefore should be placed in the
pipeline before the Tika Extractor.
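
To make the ordering point concrete, here is a minimal sketch in plain Java.
It does not use any ManifoldCF or Tika classes: Doc, Stage, sizeFilter,
extractor and run are hypothetical names made up for this illustration, and
the "extractor" only records which documents it was handed. The assumption is
simply that a stage placed earlier in a pipeline can drop a document before a
later, memory-hungry stage ever sees it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipelineOrderSketch {
    /** Hypothetical stand-in for a document moving through a pipeline. */
    public static final class Doc {
        final String name;
        final long sizeBytes;
        public Doc(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** A stage returns the document to pass on, or null to drop it. */
    public interface Stage {
        Doc apply(Doc d);
    }

    /** Drops any document larger than maxBytes (plays the role of a size limit). */
    public static Stage sizeFilter(long maxBytes) {
        return d -> d.sizeBytes <= maxBytes ? d : null;
    }

    /** Stand-in for an expensive extractor: it only logs what reached it. */
    public static Stage extractor(List<String> reached) {
        return d -> {
            reached.add(d.name);
            return d;
        };
    }

    /** Runs the stages in order; returns false if any stage dropped the document. */
    public static boolean run(Doc d, List<Stage> stages) {
        Doc current = d;
        for (Stage s : stages) {
            current = s.apply(current);
            if (current == null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Doc big = new Doc("big.pdf", 500L * 1024 * 1024);

        // Filter first: the oversized document never reaches the extractor.
        List<String> reachedA = new ArrayList<>();
        run(big, Arrays.asList(sizeFilter(10L * 1024 * 1024), extractor(reachedA)));
        System.out.println("filter before extractor, extractor saw: " + reachedA);

        // Extractor first: the oversized document is parsed before it is rejected.
        List<String> reachedB = new ArrayList<>();
        run(big, Arrays.asList(extractor(reachedB), sizeFilter(10L * 1024 * 1024)));
        System.out.println("extractor before filter, extractor saw: " + reachedB);
    }
}
```

In both orderings the large document is rejected in the end, but only the
first ordering keeps it away from the extractor, which is where the memory is
spent.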

"Also it is not compatible with the Allowed Documents and Metadata Adjuster
Connectors."

This is a huge red flag.  Why not?

Karl


On Mon, Aug 20, 2018 at 6:47 AM Nikita Ahuja <nikita@smartshore.nl> wrote:

> Hi Karl,
>
> There is a custom job running against Aconex in our ManifoldCF environment.
> It is not able to crawl the complete set of documents; it crashes partway
> through the run.
>
> Also it is not compatible with the Allowed Documents and Metadata Adjuster
> Connectors.
>
> The custom job was built along the lines of the existing Jira connector in
> ManifoldCF.
>
> It is showing the errors below. Please suggest the steps needed to make it
> run smoothly.
>
>
>
> Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/---.---.---.---] failed: Read timed out
> agents process ran out of memory - shutting down
> agents process ran out of memory - shutting down
> agents process ran out of memory - shutting down
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.manifoldcf.core.database.Database.beginTransaction(Database.java:240)
>         at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1361)
>         at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1327)
>         at org.apache.manifoldcf.crawler.jobs.JobManager.assessMarkedJobs(JobManager.java:823)
>         at org.apache.manifoldcf.crawler.system.AssessmentThread.run(AssessmentThread.java:65)
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.clone(PDGraphicsState.java:494)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.saveGraphicsState(PDFStreamEngine.java:898)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:721)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:587)
>         at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
>         at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
>         at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
>         at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
>         at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
>         at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
>         at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
>         at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
>         at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>         at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>         at org.apache.manifoldcf.crawler.connectors.aconex.AconexSession.fetchAndIndexFile(AconexSession.java:720)
>         at org.apache.manifoldcf.crawler.connectors.aconex.AconexRepositoryConnector.processDocuments(AconexRepositoryConnector.java:1194)
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [Thread-431] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2c0b4c83{HTTP/1.1}{0.0.0.0:8345}
> [Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4c03a37{/mcf-api-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3117653580650249372.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-api-service.war}
> [Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@65ae095c{/mcf-authority-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-8288503227579256193.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-authority-service.war}
> Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/23.10.35.84] failed: Read timed out
> --
> Thanks and Regards,
> Nikita
> Email: nikita@smartshore.nl
> United Sources Service Pvt. Ltd.
> a "Smartshore" Company
> Mobile: +91 99 888 57720
> http://www.smartshore.nl
>
