manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Documentum job stops on error
Date Mon, 17 Jul 2017 09:53:53 GMT
I've attached a third patch to this ticket that should fix both of these
cases.  The patches must be applied in order.

Karl


On Mon, Jul 17, 2017 at 2:46 AM, Tamizh Kumaran Thamizharasan <
tthamizharasan@worldbankgroup.org> wrote:

> Thanks Karl for the patch!!!
>
>
>
> A minor correction is required in the patch
> https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch
> (file: DCTM.java):
>
> else if (dfe.getType() != DocumentumException.TYPE_CORRUPTEDDOCUMENT)
>
> needs to be changed to
>
> else if (dfe.getType() == DocumentumException.TYPE_CORRUPTEDDOCUMENT)
>
>
>
> After the change it's working fine.
>
>
>
> Also, these errors (DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and
> DM_OBJECT_E_LOAD_INVALID_STRING_LEN) are emitted from the
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification
> method call. So all the changes in
> https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch,
> together with the DocumentumException.java file change in
> https://issues.apache.org/jira/secure/attachment/12877277/CONNECTORS-1444.patch,
> should be sufficient.
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Friday, July 14, 2017 5:41 PM
>
> *To:* user@manifoldcf.apache.org
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
>
>
> Ok, I've attached and committed an additional patch.  Please let me know.
>
>
>
> Karl
>
>
>
>
>
> On Fri, Jul 14, 2017 at 7:54 AM, Tamizh Kumaran Thamizharasan <
> tthamizharasan@worldbankgroup.org> wrote:
>
> Hi Karl,
>
>
>
> The patch provided is not working, since the error is thrown from
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification:
>
>
>
> return new DocumentumObjectImpl(objIDfSession,objIDfSession.getObjectByQualification(dql));
>
>
>
> Error log as follows:
>
>
>
> DfException:: THREAD: RMI TCP Connection(1083)-127.0.0.1; MSG: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "Error loading object: invalid string length 0 found in input stream"; ERRORCODE: 100; NEXT: null
>         at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
>         at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
>         at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
>         at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
>         at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
>         at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
>         at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
>         at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
>         at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
>         at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
>         at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
>         at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
>         at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
>         at com.documentum.fc.client.impl.session.Session.getObject(Session.java:958)
>         at com.documentum.fc.client.impl.session.Session.getObjectByQualificationEx(Session.java:1139)
>         at com.documentum.fc.client.impl.session.Session.getObjectByQualification(Session.java:1117)
>         at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:755)
>         at org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification(DocumentumImpl.java:334)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
>         at sun.rmi.transport.Transport$1.run(Transport.java:200)
>         at sun.rmi.transport.Transport$1.run(Transport.java:197)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Friday, July 14, 2017 4:32 PM
>
>
> *To:* user@manifoldcf.apache.org
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
>
>
> I have created a ticket (CONNECTORS-1444) to track this issue, and
> attached a fix.  I've also committed the fix to trunk.
>
>
>
> The fix is not the code change you have done, but instead introduces a new
> kind of DocumentumException: CORRUPTEDDOCUMENT.  This will be thrown
> whenever permanent document corruption is detected, and will cause the
> document to be skipped and not indexed.
>
>
>
> The "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED " error should cause the
> connector to retry the document at a later time, so if indeed this is not a
> permanent error, no special fix should be required.
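
The skip-versus-retry distinction described above can be sketched roughly as follows. This is only an illustration, not the committed patch: the class `ErrorDisposition`, the `Action` enum, and the numeric values of the type constants are hypothetical stand-ins, and only the constant names mirror those mentioned in this thread.

```java
// Hypothetical stand-in for illustration; NOT the ManifoldCF connector code.
class ErrorDisposition {
  // Assumed values; the real DocumentumException constants may differ.
  static final int TYPE_SERVICEINTERRUPTION = 0;
  static final int TYPE_NOTALLOWED = 1;
  static final int TYPE_CORRUPTEDDOCUMENT = 2;

  enum Action { RETRY_LATER, SKIP_DOCUMENT, ABORT_JOB }

  // Map an exception type to what the crawler should do with the document.
  static Action actionFor(int exceptionType) {
    switch (exceptionType) {
      case TYPE_SERVICEINTERRUPTION:
        // Transient problem (for example parked content): try again later.
        return Action.RETRY_LATER;
      case TYPE_CORRUPTEDDOCUMENT:
      case TYPE_NOTALLOWED:
        // Permanent problem for this document: skip it, keep the job running.
        return Action.SKIP_DOCUMENT;
      default:
        // Anything unrecognized still stops the job.
        return Action.ABORT_JOB;
    }
  }

  public static void main(String[] args) {
    System.out.println(actionFor(TYPE_CORRUPTEDDOCUMENT));
    System.out.println(actionFor(TYPE_SERVICEINTERRUPTION));
  }
}
```

The point of the sketch is that only an explicitly recognized permanent error leads to a skip; unknown errors keep their existing behavior.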
>
>
>
> Please let me know if the fix I have committed works for you.
>
>
>
> Karl
>
>
>
>
>
>
>
> On Fri, Jul 14, 2017 at 5:41 AM, Tamizh Kumaran Thamizharasan <
> tthamizharasan@worldbankgroup.org> wrote:
>
> Hi Karl,
>
>
>
> Sorry for not explaining the issue in detail.
>
> (1)   Is it likely to go away or not on a retry;
>
> The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN
> errors are not likely to go away on an immediate retry.
>
> (2)   Does it substantially impact the ability of ManifoldCF to properly
> process the document;
>
> The impact is that someone needs to monitor the indexing and, if it stops
> on these errors, use restart-minimal to start the indexing again.
>
> (3) Is it generally acceptable to skip ALL documents where the error
> occurs.
>
> Yes. These errors occur for a large number of documents, and it is
> difficult for the user to restart the indexing each time. Total document
> count: 700000+
>
> DM_OBJECT_E_LOAD_INVALID_STRING_LEN - 11147
>
> DM_PLATFORM_E_INTEGER_CONVERSION_ERROR - 21708
>
> I'm not sure whether these errors are common in Documentum or are due to
> improper Documentum configuration/maintenance. We have encountered them on
> a couple of Documentum instances in lower environments (not validated on
> production).
>
>
>
> The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR
> and DM_OBJECT_E_LOAD_INVALID_STRING_LEN are DfExceptions thrown from the
> getObjectByQualification method in
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.
>
>
>
> We made a fix to print the error to the log (Documentum server process) and
> return null.
>
>     catch (DfException e)
>     {
>       e.printStackTrace();
>       return null;
>       //throw new DocumentumException("Documentum error: "+e.getMessage());
>     }
>
>
>
>
>
> In the run() method of the ProcessDocumentThread inner class in the
> org.apache.manifoldcf.crawler.connectors.DCTM.DCTM file, we added a null
> check to continue with the document processing.
>
>       try
>       {
>         IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier +
>           "' and any r_version_label='CURRENT'");
>         if (object != null) {
>           …
>         }
>       }
>       catch (Throwable e)
>       {
>         this.exception = e;
>       }
>
>
>
> The DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED error occurs very rarely.
> It happens because an uploaded document is parked in an interim BOCS and
> moved to the repository a short time later.
>
> If indexing happens during that gap, the properties are accessible but the
> document content is not yet available, which causes the error. The fix is
> not yet completed.
>
> The code snippet that causes this error is shared below.
>
> The run() method of the  ProcessDocumentThread inner class on  the
> org.apache.manifoldcf.crawler.connectors.DCTM.DCTM
>
>           try
>           {
>             strFilePath = object.getFile(objFileTemp.getCanonicalPath());
>           }
>           catch (DocumentumException dfe)
>           {
>             // Fetch failed, so log it
>             activityStatus = "NOCONTENT";
>             activityMessage = dfe.getMessage();
>             if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
>               throw dfe;
>             return;
>           }
>
>
>
> The getFile method in
> org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl
>
>
>
>     catch (DfException dfe)
>     {
>       // Can't decide what to do without looking at the exception text.
>       // This is crappy but it's the best we can manage, apparently.
>       String errorMessage = dfe.getMessage();
>       if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
>         // Treat it as transient, and retry
>         throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_SERVICEINTERRUPTION);
>       // It's probably not a transient error.  Report it as an access violation, even though it
>       // may well not be.  We don't have much info as to what's happening.
>       throw new DocumentumException(dfe.getMessage(), DocumentumException.TYPE_NOTALLOWED);
>     }
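
Since the connector can only inspect the exception text, one way to separate permanent from transient errors is to match on the bracketed Documentum error codes quoted in this thread. The sketch below is an assumption about how that could look, not the logic of the attached patches: the class `DocumentumErrorClassifier` and its `Kind` enum are hypothetical, and the grouping of codes reflects only what was reported here (two errors that do not go away on retry, one that does).

```java
// Hypothetical helper for illustration; NOT part of ManifoldCF.
final class DocumentumErrorClassifier {
  enum Kind { CORRUPTED_DOCUMENT, TRANSIENT, UNKNOWN }

  // Codes reported in this thread as not going away on retry.
  private static final String[] PERMANENT_CODES = {
    "[DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]",
    "[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]"
  };

  // Code reported in this thread as clearing up after a short time.
  private static final String[] TRANSIENT_CODES = {
    "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]"
  };

  // Classify an exception message by the error code it contains.
  static Kind classify(String message) {
    if (message == null) {
      return Kind.UNKNOWN;
    }
    for (String code : PERMANENT_CODES) {
      if (message.contains(code)) {
        return Kind.CORRUPTED_DOCUMENT;
      }
    }
    for (String code : TRANSIENT_CODES) {
      if (message.contains(code)) {
        return Kind.TRANSIENT;
      }
    }
    return Kind.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(classify(
        "[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  \"Error loading object\""));
  }
}
```

Unlike the getFile handler quoted above, which treats every unrecognized message as transient, this sketch leaves unrecognized messages as UNKNOWN so the caller decides what to do with them.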
>
>
>
> Discarding uncrawlable documents and continuing with the indexing process
> makes more sense than stalling the job. If you feel it is good to include,
> kindly add the required coding exception.
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Friday, July 14, 2017 12:36 PM
> *To:* user@manifoldcf.apache.org
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: Documentum job stops on error
>
>
>
> Hi Tamizh,
>
>
>
> For any repository  errors, ManifoldCF needs to know the following:
>
> (1) Is it likely to go away or not on a retry;
>
> (2) Does it substantially impact the ability of ManifoldCF to properly
> process the document;
>
> (3) Is it generally acceptable to skip ALL documents where the error
> occurs.
>
>
>
> In this case your underlying error seems quite worrying:
>
>
>
> [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is
> temporarily parked on a BOCS server host. It will be available when it is
> moved to a permanent storage area."
>
> I could imagine that many or most documents are in fact in that state, in
> which case nothing can really be crawled?
>
>
>
> I'm happy to make coding exceptions in the Documentum connector for
> discarding uncrawlable documents, but only if it makes sense to do that.
> Here it is not clear at all that we'd want to change MCF to throw away all
> documents with this problem.  It sounds instead like there's some
> significant Documentum configuration issue to me.
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan <
> tthamizharasan@worldbankgroup.org> wrote:
>
> Hi Team,
>
>
>
> The behavior below is observed when using the ManifoldCF Documentum connector.
>
>
>
> ·         On any Documentum-specific error, the application throws the
> error and the job stops abruptly. Is there any specific reason for this
> approach?
>
> Can we handle these errors by logging them, skipping the document,
> and continuing the indexing?
>
>
>
> Please find the sample error causing the job to fail.
>
>
>
> Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error:  "The
> server was unable to convert the following string (String Unavailable) to
> an integer or long."
>
>
>
> Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException:
> Documentum error: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "Error
> loading object: invalid string length 0 found in input stream"
>
>
>
> Error: Repeated service interruptions - failure processing document:
> [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is
> temporarily parked on a BOCS server host. It will be available when it is
> moved to a permanent storage area."
>
>
>
> Kindly provide your suggestion on this.
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
>
>
>
>
>
>
