manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ElastiSearch missing doc
Date Fri, 12 Dec 2014 14:14:40 GMT
Hi Kamil,

You are getting a 404 error when ManifoldCF tries to delete a document from
the ElasticSearch index:

>>>>>>
    else if (code == 404)
    {
      setResult(IOutputHistoryActivity.HTTP_ERROR,Result.ERROR,
          "Page not found: " + response);
      throw new ManifoldCFException("Server/page not found");
    }
<<<<<<

The URL it is using is constructed as follows:

>>>>>>
      String idField = URLEncoder.encode(documentURI);
      HttpDelete method = new HttpDelete(config.getServerLocation() +
          "/" + config.getIndexName() + "/" + config.getIndexType()
          + "/" + idField);
      call(method);
<<<<<<
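
To make the resulting request concrete, here is a minimal standalone sketch
that builds the same kind of URL outside the connector.  The class and method
names are hypothetical, the server location, index name, index type and
document URI are made-up sample values, and the explicit "UTF-8" argument is
my assumption (the connector code above uses the one-argument encode call):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch, not ManifoldCF code: mirrors the URL construction above.
class DeleteUrlSketch {

  // Builds <server>/<index>/<type>/<url-encoded document id>.
  static String buildDeleteUrl(String serverLocation, String indexName,
      String indexType, String documentURI) {
    String idField;
    try {
      // Assumption: encode the document URI as UTF-8.
      idField = URLEncoder.encode(documentURI, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every Java platform is required to support UTF-8.
      throw new IllegalStateException("UTF-8 not supported", e);
    }
    return serverLocation + "/" + indexName + "/" + indexType + "/" + idField;
  }

  public static void main(String[] args) {
    // Sample values only; substitute the ones from your output connection.
    System.out.println(buildDeleteUrl("http://localhost:9200", "index",
        "generic", "http://example.com/a b.pdf"));
  }
}
```

With those sample values it prints
http://localhost:9200/index/generic/http%3A%2F%2Fexample.com%2Fa+b.pdf, which
is the sort of URL you could try by hand against your ES instance to see what
status code comes back.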

So there are a number of possibilities.  The first is that ES was down
entirely when this job ended, so the document removal requests failed for a
legitimate reason.  The second is that the document in question had already
been deleted; it may be that this formerly returned a 200 status code in the
version of ES the connector was written for, and that it now returns a 404.
Finally, maybe the REST API has changed so much that it is no longer possible
to delete a document from the index this way.  What version of ElasticSearch
are you using, and can you point me at the REST API documentation for that
version?  Can you do enough research to find out what should work here?
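
If the second possibility turns out to be the case, one option would be to
treat a 404 on a delete as "already gone" rather than as a failure.  A rough
sketch of that idea follows.  This is not what the connector does today, the
enum and method names are hypothetical, and whether a 404 really means
"already deleted" still needs to be confirmed against the documentation for
your ES version:

```java
// Hypothetical sketch, not ManifoldCF code: classify the HTTP status
// returned by a document DELETE request.
class DeleteResultSketch {

  enum Outcome { DELETED, ALREADY_GONE, FAILED }

  static Outcome classifyDeleteStatus(int code) {
    if (code == 200) {
      // The server reports that the delete request succeeded.
      return Outcome.DELETED;
    }
    if (code == 404) {
      // Assumption: 404 means the document was not in the index, so
      // there is nothing left to remove and cleanup can proceed.
      return Outcome.ALREADY_GONE;
    }
    // Anything else is still treated as a real error in this sketch.
    return Outcome.FAILED;
  }

  public static void main(String[] args) {
    System.out.println(classifyDeleteStatus(404));
  }
}
```

Note that a 404 can also mean the index itself is missing or the URL is
wrong, so a change like this would hide those cases too.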

Thanks,
Karl



On Fri, Dec 12, 2014 at 8:56 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
>
> Hi,
> When I test ES as an indexer, some jobs end with 'Error: Server/page not
> found'. In the ES log I have some exceptions about documents being too
> big. How does this affect the job? Full MCF logs:
>
> ERROR 2014-12-12 14:45:24,915 (Document cleanup thread '2') - Exception tossed: Server/page not found
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Server/page not found
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.handleResultCode(ElasticSearchConnection.java:234)
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:203)
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchDelete.execute(ElasticSearchDelete.java:45)
>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.removeDocument(ElasticSearchConnector.java:578)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2350)
>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1059)
>         at org.apache.manifoldcf.crawler.system.DocumentCleanupThread.run(DocumentCleanupThread.java:189)
>
> Thanks,
> Kamil
>
