manifoldcf-user mailing list archives

From Cheng Zeng <ze...@hotmail.co.uk>
Subject Re: Job stuck - WorkerThread functions return null
Date Wed, 14 Nov 2018 07:16:45 GMT

Hi Karl,

Thanks a lot for your reply. I haven't changed any framework code; the only code I have modified is my own repository connector.

I found that there are five methods available for injecting document identifiers. Could you please tell me how to choose the right one?
 activities.addDocumentReference(documentIdentifier);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime);
 activities.addDocumentReference(documentIdentifier, parentIdentifier, relationshipType, dataNames, dataValues, originationTime, prereqEventNames);

This is how I currently inject document identifiers:

activities.addDocumentReference(docUri, documentIdentifier, RELATIONSHIP_CHILD);

docUri is the URL of the document to be fetched, e.g. http://domino_server:80/path/dep1/database_name.nsf/api/data/documents
documentIdentifier is the URL of the parent, e.g. http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1
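Karl's reply (quoted below) suspects that incorrect document identifiers are being injected. One way that can happen, offered here as an assumption rather than a diagnosis, is that the same Domino document ends up referenced under two spellings of its URL (with and without the explicit :80 port, with a trailing slash, or with different host case), so the identifier strings no longer match. Below is a minimal sketch of a hypothetical canonicalize() helper, not part of ManifoldCF, that a connector could apply to every identifier before using it. It assumes absolute http(s) URLs.

```java
import java.net.URI;
import java.util.Locale;

final class DominoIdentifiers {

  private DominoIdentifiers() {}

  // Hypothetical helper, not part of ManifoldCF: reduce a Domino REST URL to a
  // single canonical string so one document always yields one identifier.
  static String canonicalize(String url) {
    // URI.create throws IllegalArgumentException for malformed input;
    // normalize() removes "." and ".." path segments.
    URI u = URI.create(url.trim()).normalize();
    String scheme = u.getScheme().toLowerCase(Locale.ROOT);
    // getRawAuthority() is used instead of getHost(): java.net.URI reports a
    // null host for names containing '_', such as "domino_server".
    String authority = u.getRawAuthority().toLowerCase(Locale.ROOT);
    // Drop the port when it is the default for the scheme.
    if (scheme.equals("http") && authority.endsWith(":80")) {
      authority = authority.substring(0, authority.length() - ":80".length());
    } else if (scheme.equals("https") && authority.endsWith(":443")) {
      authority = authority.substring(0, authority.length() - ":443".length());
    }
    // Drop a trailing slash, but keep a bare "/" path. Path case is untouched.
    String path = u.getRawPath();
    if (path.length() > 1 && path.endsWith("/")) {
      path = path.substring(0, path.length() - 1);
    }
    String query = u.getRawQuery();
    return scheme + "://" + authority + path + (query == null ? "" : "?" + query);
  }

  public static void main(String[] args) {
    String a = "http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1";
    String b = "HTTP://Domino_Server/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1/";
    System.out.println(canonicalize(a));
    System.out.println(canonicalize(a).equals(canonicalize(b)));  // true
  }
}
```

The helper lowercases the whole authority (including any user info) but leaves the path alone, so the UNID is never altered. Whatever form is chosen, the important part is using the same form everywhere the identifier appears.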

Unfortunately, no full stack trace is produced. All I get is

new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");

which is thrown by the code below in WorkerThread.java (org.apache.manifoldcf.crawler.system). I have found the document identifier in the "jobqueue" table, and the dochash stored there matches the hash generated by the hash method.
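To repeat the dochash check for a specific identifier outside the framework, a small sketch follows. It assumes the identifier hash is a hex-encoded SHA-1 digest of the identifier's UTF-8 bytes; that is an unverified assumption about ManifoldCF's hash method, and the hex case may differ, so compare case-insensitively. The class name IdentifierHash is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class IdentifierHash {

  private IdentifierHash() {}

  // Assumption: the identifier hash is a hex-encoded SHA-1 digest of the
  // identifier's UTF-8 bytes. Check this against your ManifoldCF version.
  static String sha1Hex(String identifier) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(identifier.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02X", b));  // Formatter prints a negative byte as unsigned
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-1.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String id = "http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1";
    // Compare case-insensitively with the dochash column in jobqueue.
    System.out.println(sha1Hex(id));
    // A trailing slash is enough to change the hash completely.
    System.out.println(sha1Hex(id).equals(sha1Hex(id + "/")));  // false
  }
}
```

Whatever the exact hash function, the point holds: any difference in the identifier string produces an unrelated hash, so a lookup keyed on it will miss.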

For some document identifiers, previousDocuments.get(documentIdentifierHash) returns the queued document, but for several others it returns null.

Could you please give me some pointers?

    protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
      String documentIdentifierHash,
      String componentIdentifierHash,
      String documentIdentifier)
    {
      // Returns null for some documents; this is where the problem is.
      QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
      if (qd == null)
        throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
      return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
    }
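One way to reach this branch, assumed here rather than confirmed: previousDocuments appears to hold only the documents handed to the worker thread in the current batch, not every row in jobqueue. If the connector then makes a per-document activity call with an identifier that was only injected as a reference (such as docUri above) instead of one it was asked to process, the lookup returns null even though that identifier and its dochash are present in jobqueue. The hypothetical class below reproduces that shape, using a simple String.hashCode-based key in place of the framework's hash and plain strings in place of QueuedDocument.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the worker thread's per-batch lookup; the real
// framework stores QueuedDocument objects and uses its own hash method.
final class BatchLookupDemo {

  // Identifier hash -> identifier, filled only from the current batch.
  private final Map<String, String> previousDocuments = new HashMap<>();

  BatchLookupDemo(List<String> batchIdentifiers) {
    for (String id : batchIdentifiers) {
      previousDocuments.put(hash(id), id);
    }
  }

  // Simplified stand-in hash; any deterministic function illustrates the point.
  static String hash(String identifier) {
    return Integer.toHexString(identifier.hashCode());
  }

  // Mirrors the null check in computePipelineSpecificationWithVersions.
  String lookup(String documentIdentifier) {
    String qd = previousDocuments.get(hash(documentIdentifier));
    if (qd == null) {
      throw new IllegalArgumentException("Unrecognized document identifier: '" + documentIdentifier + "'");
    }
    return qd;
  }

  public static void main(String[] args) {
    String parent = "http://domino_server:80/path/dep1/database_name.nsf/api/data/documents/unid/B0F9484E94DEA3204825813E001034E1";
    String child = "http://domino_server:80/path/dep1/database_name.nsf/api/data/documents";
    // The batch contains only the document the thread was asked to process.
    BatchLookupDemo batch = new BatchLookupDemo(List.of(parent));
    System.out.println(batch.lookup(parent));  // found
    try {
      // The child was only injected via addDocumentReference, so it is not in this batch.
      batch.lookup(child);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

If this is the cause, the connector-side fix would be to make per-document activity calls only for identifiers passed into the current batch, and to hand every newly discovered URL to addDocumentReference so the framework queues it for a later batch.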

Best wishes,

Cheng




________________________________
From: Karl Wright <daddywri@gmail.com>
Sent: 12 November 2018 18:46
To: user@manifoldcf.apache.org
Subject: Re: Job stuck - WorkerThread functions return null

Hi,
Have you been modifying the framework code?  If so, I really cannot help you.

If you haven't -- it looks like you've got code that is injecting document identifiers that
are incorrect.  But I will need to see a full stack trace to be sure of that.

Thanks,
Karl


On Mon, Nov 12, 2018 at 4:06 AM Cheng Zeng <zengc@hotmail.co.uk> wrote:
Hi Karl,

I am developing my own repository connector, reusing some code from the file repository connector. I use it to crawl documents from an IBM Domino system. I managed to retrieve all the files in Domino; however, when I restart the job to recrawl the Domino database, previousDocuments.get(documentIdentifierHash) in the following code in WorkerThread.java (org.apache.manifoldcf.crawler.system) returns null for some of the document IDs. As a result, the job gets stuck on the affected document ID.

Could you please tell me how I could fix the problem?

    protected IPipelineSpecificationWithVersions computePipelineSpecificationWithVersions(
      String documentIdentifierHash,
      String componentIdentifierHash,
      String documentIdentifier)
    {
      // Returns null for some documents; this is where the problem is.
      QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
      if (qd == null)
        throw new IllegalArgumentException("Unrecognized document identifier: '"+documentIdentifier+"'");
      return new PipelineSpecificationWithVersions(pipelineSpecification,qd,componentIdentifierHash);
    }


Thanks a lot.

Cheng
