Hi Karl,

The connector already calls these methods in the code path where a file is read and ingested into the output:

(!activities.checkURLIndexable(fileUrl))
(!activities.checkMimeTypeIndexable(contentType))
(!activities.checkDateIndexable(modifiedDate))


But the service still crashes after crawling approximately 2000 documents.

I think something else is affecting it and causing the problem.





On Fri, Aug 24, 2018 at 8:33 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Nikita,

Until you fix your connector, nothing can be done to address your Out Of Memory problem.

The problem is that you are not calling the following IProcessActivity method:

  /** Check whether a document of a specific length is indexable by the currently specified output connector.
  *@param length is the document length.
  *@return true if the document is indexable.
  */
  public boolean checkLengthIndexable(long length)
    throws ManifoldCFException, ServiceInterruption;

Your connector should call this and honor the response.
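
A minimal sketch of what "call this and honor the response" could look like follows. It is not the Aconex connector's code: LengthCheck is a hypothetical stand-in for the single IProcessActivity method quoted above, and Doc, selectIndexable and the 10 MB limit are assumptions made only so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthGateSketch {
    /** Hypothetical stand-in for the one IProcessActivity method quoted above. */
    interface LengthCheck {
        boolean checkLengthIndexable(long length) throws Exception;
    }

    /** Hypothetical document record; a real connector would use its own types. */
    static final class Doc {
        final String id;
        final long length;
        Doc(String id, long length) { this.id = id; this.length = length; }
    }

    /** Returns the ids of documents that pass the length check, in input order. */
    static List<String> selectIndexable(List<Doc> docs, LengthCheck activities) throws Exception {
        List<String> accepted = new ArrayList<>();
        for (Doc d : docs) {
            if (!activities.checkLengthIndexable(d.length)) {
                // Honor the response: skip the document before fetching or ingesting it.
                continue;
            }
            accepted.add(d.id);
        }
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        // Assumed limit of 10 MB, purely for illustration.
        LengthCheck limit = length -> length <= 10L * 1024 * 1024;
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("small.pdf", 200_000L));
        docs.add(new Doc("huge.pdf", 500_000_000L));
        System.out.println(selectIndexable(docs, limit));
    }
}
```

In a real connector the length check would presumably sit next to the existing checkURLIndexable, checkMimeTypeIndexable and checkDateIndexable calls, before the file content is streamed to the pipeline.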

Thanks,
Karl



On Fri, Aug 24, 2018 at 9:55 AM Nikita Ahuja <nikita@smartshore.nl> wrote:
Hi Karl,

I have checked for a coding error and found nothing of the kind; "Allowed Documents" works fine with the same code on another system.

The main issue now is that ManifoldCF shuts down, and the system shows "java.lang.OutOfMemoryError: GC overhead limit exceeded".

PostgreSQL is used for ManifoldCF and the system has plenty of memory allocated, but this issue still occurs very frequently.
I have also tried different values for throttling (currently 2) and the worker thread count (currently 45), as described in the documentation.


Please suggest the possible problem area and steps to be taken.

On Mon, Aug 20, 2018 at 7:30 PM, Karl Wright <daddywri@gmail.com> wrote:
Obviously your Allowed Documents filter is somehow causing all documents to be excluded.  Since you have a custom repository connector I would bet there is a coding error in it that is responsible.

Karl


On Mon, Aug 20, 2018 at 8:49 AM Nikita Ahuja <nikita@smartshore.nl> wrote:
Hi Karl,

Thanks for the reply.

I am using them in that same order: the Allowed Documents transformer is added first, followed by the Tika transformation.




But nothing runs in that scenario. The job simply ends without returning anything in the output.






On Mon, Aug 20, 2018 at 5:36 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi,

You are running out of memory.
Tika's memory consumption is not well defined so you will need to limit the size of documents that reach it.  This is not the same as limiting the size of documents *after* Tika extracts them.

The Allowed Documents transformer therefore should be placed in the pipeline before the Tika Extractor.
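
The difference between the two orderings can be illustrated with a small simulation. This is only a sketch and does not use ManifoldCF or Tika: extract is a hypothetical stand-in for an extractor, and the 100:1 ratio between raw size and extracted text size is an assumption chosen for the example.

```java
public class PipelineOrderSketch {
    /** Stand-in extractor: returns the size of the extracted text (assumed 1% of the input). */
    static long extract(long rawBytes) {
        return rawBytes / 100;
    }

    /** Size filter placed BEFORE extraction: returns the largest raw input the extractor sees. */
    static long largestInputFilterFirst(long[] rawSizes, long maxBytes) {
        long largest = 0;
        for (long raw : rawSizes) {
            if (raw > maxBytes) {
                continue; // rejected up front; the extractor never touches it
            }
            largest = Math.max(largest, raw);
            extract(raw);
        }
        return largest;
    }

    /** Size filter placed AFTER extraction: the extractor has already processed every input. */
    static long largestInputExtractFirst(long[] rawSizes, long maxBytes) {
        long largest = 0;
        for (long raw : rawSizes) {
            largest = Math.max(largest, raw);
            long textBytes = extract(raw);
            if (textBytes > maxBytes) {
                continue; // rejected, but only after the expensive work was done
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        long[] sizes = {1_000L, 5_000_000_000L};
        long max = 10_000_000L;
        System.out.println("filter first:  " + largestInputFilterFirst(sizes, max));
        System.out.println("extract first: " + largestInputExtractFirst(sizes, max));
    }
}
```

With the filter first, the extractor only ever handles inputs within the limit; with the filter last, the oversized input has already been parsed by the time it is rejected.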

"Also it is not compatible with the Allowed Documents and Metadata Adjuster Connectors."

This is a huge red flag.  Why not?

Karl


On Mon, Aug 20, 2018 at 6:47 AM Nikita Ahuja <nikita@smartshore.nl> wrote:
Hi Karl,

I have a custom job running against Aconex in our ManifoldCF environment, but it cannot crawl the complete set of documents; it crashes partway through execution.

Also it is not compatible with the Allowed Documents and Metadata Adjuster Connectors.

The custom job is similar to the existing Jira connector in ManifoldCF.

It shows the following errors. Please suggest the steps needed to make it run smoothly.



Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/---.---.---.---] failed: Read timed out
agents process ran out of memory - shutting down
agents process ran out of memory - shutting down
agents process ran out of memory - shutting down
agents process ran out of memory - shutting down
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
        at org.apache.manifoldcf.core.database.Database.beginTransaction(Database.java:240)
        at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1361)
        at org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.beginTransaction(DBInterfaceHSQLDB.java:1327)
        at org.apache.manifoldcf.crawler.jobs.JobManager.assessMarkedJobs(JobManager.java:823)
        at org.apache.manifoldcf.crawler.system.AssessmentThread.run(AssessmentThread.java:65)
java.lang.OutOfMemoryError: Java heap space
        at org.apache.pdfbox.pdmodel.graphics.state.PDGraphicsState.clone(PDGraphicsState.java:494)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.saveGraphicsState(PDFStreamEngine.java:898)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.showText(PDFStreamEngine.java:721)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.showTextString(PDFStreamEngine.java:587)
        at org.apache.pdfbox.contentstream.operator.text.ShowText.process(ShowText.java:55)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:495)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:469)
        at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:150)
        at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:139)
        at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
        at org.apache.tika.parser.pdf.PDF2XHTML.processPage(PDF2XHTML.java:147)
        at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
        at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:117)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:168)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
        at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
        at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
        at org.apache.manifoldcf.crawler.connectors.aconex.AconexSession.fetchAndIndexFile(AconexSession.java:720)
        at org.apache.manifoldcf.crawler.connectors.aconex.AconexRepositoryConnector.processDocuments(AconexRepositoryConnector.java:1194)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
[Thread-431] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2c0b4c83{HTTP/1.1}{0.0.0.0:8345}
[Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4c03a37{/mcf-api-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-3117653580650249372.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-api-service.war}
[Thread-431] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@65ae095c{/mcf-authority-service,file:/C:/Users/smartshore/AppData/Local/Temp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-8288503227579256193.dir/webapp/,UNAVAILABLE}{D:\Manifold\apache-manifoldcf-2.8.1\example\.\..\web\war\mcf-authority-service.war}
Connect to uk1.aconex.co.uk:443 [uk1.aconex.co.uk/23.10.35.84] failed: Read timed out
--
Thanks and Regards,
Nikita
United Sources Service Pvt. Ltd.
a "Smartshore" Company
Mobile: +91 99 888 57720

http://www.smartshore.nl


