manifoldcf-user mailing list archives

From Andrew Clegg
Subject Re: Diagnosing "REJECTED" documents in job history
Date Wed, 30 Jan 2013 13:55:12 GMT
On 30 January 2013 13:33, Karl Wright wrote:

> So you saw events in the history which correspond to these documents
> and which are of type "Indexation" that say "success"?  If that is the
> case, then the ElasticSearch connector thinks it handed the documents
> successfully to the ElasticSearch server.

Ah, no, the activity is fetch rather than indexation, e.g.

01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361

I don't see any history entries relating to indexing as a specific
activity in its own right. Sorry, that was probably a red herring; I
don't think it's getting that far.

I just noticed that above all the "service interruption reported"
warnings are some errors like this:

ERROR 2013-01-30 13:44:15,356 (Worker thread '45') - Exception tossed:
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex.<init>(
        at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.addOrReplaceDocument(
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(
        at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(
        at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(
        at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(
        at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(

Sadly there's no description, just a stacktrace.
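
For anyone chasing a similar description-less trace: the outermost
exception can carry a null message while a nested cause holds the real
reason, so walking the cause chain is one generic way to get more out of
it. Below is a minimal sketch in plain Java. CauseChain and describe are
hypothetical names used for illustration, not part of ManifoldCF, and
the log above does not show whether this particular exception has a
cause at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not ManifoldCF code.
public class CauseChain {
    // Returns one line per throwable in the cause chain, outermost first.
    public static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        // The depth cap protects against a cyclic cause chain.
        while (t != null && depth < 32) {
            String msg = t.getMessage();
            lines.add(t.getClass().getName() + ": "
                    + (msg == null ? "(no message)" : msg));
            t = t.getCause();
            depth++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Illustrative exception only: a message-less wrapper around a cause.
        Throwable inner = new java.io.IOException("connection refused");
        Throwable outer = new RuntimeException((String) null, inner);
        for (String line : describe(outer)) {
            System.out.println(line);
        }
    }
}
```

Calling describe(e) at the point where the exception is caught, then
logging each returned line, would show whether anything more specific is
hiding beneath the top-level exception.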

I know the ES server is visible from the MCF server -- actually
they're the same machine, and it's configured to use as the server URL. And I can go to the command
line on that server and curl that URL successfully.
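
A curl from the shell does not go through the same code path as the
connector, so a check from inside a JVM can be a useful extra data
point. The following is a rough sketch using only java.net from the
JDK. EsPing and statusOf are hypothetical names, and the URL is taken as
a command-line argument because the actual server URL is not shown in
this message.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not ManifoldCF code. Assumes an http:// or https:// URL.
public class EsPing {
    // Returns the HTTP status code, or -1 if the URL is malformed or the
    // request fails at the network level (refused, timed out, and so on).
    public static int statusOf(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } catch (IOException e) {
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java EsPing <server-url>");
            return;
        }
        System.out.println(statusOf(args[0], 5000));
    }
}
```

Running it as the same user and on the same host as the MCF process
gives the closest comparison with what the connector would see.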
