manifoldcf-user mailing list archives

From Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org>
Subject RE: Documentum job stops on error
Date Mon, 17 Jul 2017 06:46:24 GMT
Thanks Karl for the patch!!!

A minor correction is required in the patch https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch (file: DCTM.java):
else if (dfe.getType() != DocumentumException.TYPE_CORRUPTEDDOCUMENT)
needs to be changed to
else if (dfe.getType() == DocumentumException.TYPE_CORRUPTEDDOCUMENT)

After this change it works fine.
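
To illustrate why the comparison has to be `==`, here is a minimal, self-contained sketch of that branch. The class name, the numeric constant values, and the `shouldSkipDocument` helper are assumptions made for illustration only; they are not copied from DCTM.java or from the patch.

```java
public class CorruptedDocumentBranch {
    // Hypothetical stand-ins for the DocumentumException type constants;
    // the numeric values are illustrative, not taken from the connector.
    public static final int TYPE_SERVICEINTERRUPTION = 1;
    public static final int TYPE_NOTALLOWED = 2;
    public static final int TYPE_CORRUPTEDDOCUMENT = 3;

    // Returns true when the document should be skipped and the crawl continued.
    // Written with "!=", this would return true for every error type EXCEPT
    // corrupted documents, which is the inverse of the intended behavior.
    public static boolean shouldSkipDocument(int exceptionType) {
        return exceptionType == TYPE_CORRUPTEDDOCUMENT;
    }

    public static void main(String[] args) {
        System.out.println("corrupted -> skip: "
            + shouldSkipDocument(TYPE_CORRUPTEDDOCUMENT));
        System.out.println("service interruption -> skip: "
            + shouldSkipDocument(TYPE_SERVICEINTERRUPTION));
    }
}
```

In this sketch only the corrupted-document type leads to a skip; every other type falls through to whatever handling follows the branch.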

We also observed that these errors (DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN)
are thrown from the org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification
method call. So all the changes in https://issues.apache.org/jira/secure/attachment/12877287/CONNECTORS-1444-2.patch,
together with the DocumentumException.java change in https://issues.apache.org/jira/secure/attachment/12877277/CONNECTORS-1444.patch,
should be sufficient.

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, July 14, 2017 5:41 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

Ok, I've attached and committed an additional patch.  Please let me know.

Karl


On Fri, Jul 14, 2017 at 7:54 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org>
wrote:
Hi Karl,

The patch provided does not work, because the error is thrown from org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification:

return new DocumentumObjectImpl(objIDfSession,objIDfSession.getObjectByQualification(dql));

Error log as follows:

DfException:: THREAD: RMI TCP Connection(1083)-127.0.0.1; MSG: [DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error: "Error loading object: invalid string length 0 found in input stream"; ERRORCODE: 100; NEXT: null
        at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
        at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
        at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
        at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
        at com.documentum.fc.client.impl.session.Session.getObject(Session.java:958)
        at com.documentum.fc.client.impl.session.Session.getObjectByQualificationEx(Session.java:1139)
        at com.documentum.fc.client.impl.session.Session.getObjectByQualification(Session.java:1117)
        at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:755)
        at org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.getObjectByQualification(DocumentumImpl.java:334)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
        at sun.rmi.transport.Transport$1.run(Transport.java:200)
        at sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, July 14, 2017 4:32 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

I have created a ticket (CONNECTORS-1444) to track this issue, and attached a fix.  I've also
committed the fix to trunk.

The fix is not the code change you have done, but instead introduces a new kind of DocumentumException:
CORRUPTEDDOCUMENT.  This will be thrown whenever permanent document corruption is detected,
and will cause the document to be skipped and not indexed.

The "DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED" error should cause the connector to retry
the document at a later time, so if this is indeed not a permanent error, no special fix should
be required.
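
The distinction drawn above (skip permanently corrupted documents, retry transient failures) can be sketched roughly as follows. This is a minimal illustration, not the committed fix: the class, the `Action` enum, and the idea of classifying by message text are assumptions of this sketch, and treating unrecognized errors as job-stopping is likewise assumed. The error codes are the ones quoted in this thread.

```java
public class DocumentumErrorClassifier {
    // Hypothetical outcomes; the connector itself uses DocumentumException types.
    public enum Action { SKIP_DOCUMENT, RETRY_LATER, ABORT_JOB }

    // Codes reported in this thread as not going away on retry.
    private static final String[] CORRUPTION_CODES = {
        "[DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]",
        "[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]"
    };

    // Code reported in this thread as temporary (content parked on BOCS).
    private static final String PARKED_CODE =
        "[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]";

    // Classifies a Documentum error message by the bracketed code it contains.
    public static Action classify(String errorMessage) {
        if (errorMessage == null) {
            return Action.ABORT_JOB;
        }
        for (String code : CORRUPTION_CODES) {
            if (errorMessage.contains(code)) {
                return Action.SKIP_DOCUMENT;
            }
        }
        if (errorMessage.contains(PARKED_CODE)) {
            return Action.RETRY_LATER;
        }
        // Assumed default: anything unrecognized still stops the job.
        return Action.ABORT_JOB;
    }

    public static void main(String[] args) {
        String msg = "[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "
            + "\"Error loading object: invalid string length 0 found in input stream\"";
        System.out.println(classify(msg));
    }
}
```

Matching on message text is brittle, so a real implementation would more likely carry the decision in the exception type, as the new CORRUPTEDDOCUMENT type does.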

Please let me know if the fix I have committed works for you.

Karl



On Fri, Jul 14, 2017 at 5:41 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org>
wrote:
Hi Karl,

Sorry for not explaining the issue in detail.

(1)   Is it likely to go away or not on a retry;

The DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN errors are
not likely to go away on an immediate retry.

(2)   Does it substantially impact the ability of ManifoldCF to properly process the document;

The impact is that someone needs to monitor the indexing and, if it stops on these errors,
use restart-minimal to start the indexing again.

(3)   Is it generally acceptable to skip ALL documents where the error occurs.

Yes. These errors occur for a large number of documents, and it is burdensome for the
user to restart the indexing each time.

Total document count: 700000+
DM_OBJECT_E_LOAD_INVALID_STRING_LEN: 11147
DM_PLATFORM_E_INTEGER_CONVERSION_ERROR: 21708

I'm not sure whether these errors are common in Documentum or are due to
improper Documentum configuration/maintenance. We have encountered them on a couple
of Documentum instances in lower environments (not validated on production).

The Documentum repository errors DM_PLATFORM_E_INTEGER_CONVERSION_ERROR and DM_OBJECT_E_LOAD_INVALID_STRING_LEN
are of type DfException and are thrown from the getObjectByQualification method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumImpl.

We made a fix to print the error to the log (Documentum server process) and return null:
    catch (DfException e)
    {

      e.printStackTrace();
      return null;
      //throw new DocumentumException("Documentum error: "+e.getMessage());
    }


In the run() method of the ProcessDocumentThread inner class in the org.apache.manifoldcf.crawler.connectors.DCTM.DCTM
file, we added a null check before continuing with document processing:
      try
      {
        IDocumentumObject object = session.getObjectByQualification("dm_document where i_chronicle_id='" + documentIdentifier +
          "' and any r_version_label='CURRENT'");
        if (object != null)
        {
          …
        }
      }
      catch (Throwable e)
      {
        this.exception = e;
      }

The [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED] error occurs very rarely. It happens because an
uploaded document is parked on an interim BOCS server and moved to the repository a short time later.
If indexing happens during that gap, the properties are accessible but the document content
is not yet available, which causes the error. Our fix for this is not yet complete.
The code that raises this error is shown below, from the run() method of the
ProcessDocumentThread inner class in org.apache.manifoldcf.crawler.connectors.DCTM.DCTM:
          try
          {
            strFilePath = object.getFile(objFileTemp.getCanonicalPath());
          }
          catch (DocumentumException dfe)
          {
            // Fetch failed, so log it
            activityStatus = "NOCONTENT";
            activityMessage = dfe.getMessage();
            if (dfe.getType() != DocumentumException.TYPE_NOTALLOWED)
              throw dfe;
            return;
          }

The getFile method in org.apache.manifoldcf.crawler.common.DCTM.DocumentumObjectImpl:

    catch (DfException dfe)
    {
      // Can't decide what to do without looking at the exception text.
      // This is crappy but it's the best we can manage, apparently.
      String errorMessage = dfe.getMessage();
      if (errorMessage.indexOf("[DM_CONTENT_E_CANT_START_PULL]") == -1)
        // Treat it as transient, and retry
        throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
      // It's probably not a transient error.  Report it as an access violation, even though
      // it may well not be.  We don't have much info as to what's happening.
      throw new DocumentumException(dfe.getMessage(),DocumentumException.TYPE_NOTALLOWED);
    }

Discarding uncrawlable documents and continuing with the indexing process makes more sense
than stalling it. If you feel this is worth including, kindly add the required coding
exception.

Regards,
Tamizh Kumaran Thamizharasan

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, July 14, 2017 12:36 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: Documentum job stops on error

Hi Tamizh,

For any repository error, ManifoldCF needs to know the following:
(1) Is it likely to go away or not on a retry;
(2) Does it substantially impact the ability of ManifoldCF to properly process the document;
(3) Is it generally acceptable to skip ALL documents where the error occurs.

In this case your underlying error seems quite worrying:

[DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error: "The content is temporarily parked on a
BOCS server host. It will be available when it is moved to a permanent storage area."

I could imagine that many or most documents are in fact in that state, in which case nothing
can really be crawled?

I'm happy to make coding exceptions in the Documentum connector for discarding uncrawlable
documents, but only if it makes sense to do that.  Here it is not clear at all that we'd want
to change MCF to throw away all documents with this problem.  It sounds instead like there's
some significant Documentum configuration issue to me.

Thanks,
Karl


On Fri, Jul 14, 2017 at 2:39 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org>
wrote:
Hi Team,

The behavior below is observed when using the ManifoldCF Documentum connector.


•         On any Documentum-specific error, the application throws the error and the job
stops abruptly. Is there any specific reason for this approach?

Can we handle these errors by logging them, ignoring the document, and continuing the indexing?


Please find below sample errors that cause the job to fail.


Documentum error: [DM_PLATFORM_E_INTEGER_CONVERSION_ERROR]error:  "The server was unable to
convert the following string (String Unavailable) to an integer or long."

Caused by: org.apache.manifoldcf.crawler.common.DCTM.DocumentumException: Documentum error:
[DM_OBJECT_E_LOAD_INVALID_STRING_LEN]error:  "Error loading object: invalid string length
0 found in input stream"

Error: Repeated service interruptions - failure processing document: [DM_SYSOBJECT_E_CONTENT_UNAVAILABLE_PARKED]error:
"The content is temporarily parked on a BOCS server host. It will be available when it is
moved to a permanent storage area."


Kindly provide your suggestion on this.

Regards,
Tamizh Kumaran Thamizharasan



