manifoldcf-user mailing list archives

From Andrew Clegg <andrew.cl...@gmail.com>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Wed, 30 Jan 2013 14:06:18 GMT
Thanks for all your help, Karl!

It's 1.0.1 from the binary distro.

And yes, it says "Connection working" when I view it.

On 30 January 2013 14:03, Karl Wright <daddywri@gmail.com> wrote:
> Ok, so let's back up a bit.
>
> First, which version of ManifoldCF is this?  I need to know that
> before I can interpret the stack trace.
>
> Second, what do you see when you view the connection in the crawler
> UI?  Does it say "Connection working", or something else, and if so,
> what?
>
> I've created a ticket for better error reporting in this connector -
> it was a contribution and AFAIK the error handling is not very robust
> at this point, but I can fix that quickly with your help. ;-)
>
> Karl
>
> On Wed, Jan 30, 2013 at 8:55 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>> On 30 January 2013 13:33, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> So you saw events in the history which correspond to these documents
>>> and which are of type "Indexation" that say "success"?  If that is the
>>> case, then the ElasticSearch connector thinks it handed the documents
>>> successfully to the ElasticSearch server.
>>
>> Ah, no, the activity is fetch rather than indexation. e.g.
>>
>> 01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361
>>
>> I don't see any history entries relating to indexing as a specific
>> activity in its own right. Sorry, that was probably a red herring, I
>> don't think it's getting that far.
>>
>> I just noticed that above all the "service interruption reported"
>> warnings are some errors like this:
>>
>> ERROR 2013-01-30 13:44:15,356 (Worker thread '45') - Exception tossed:
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:97)
>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex.<init>(ElasticSearchIndex.java:138)
>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.addOrReplaceDocument(ElasticSearchConnector.java:322)
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>>         at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(DCTM.java:1820)
>>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>>
>> Sadly there's no description, just a stacktrace.
>>
>> I know the ES server is visible from the MCF server -- actually
>> they're the same machine, and it's configured to use
>> http://127.0.0.1:9200/ as the server URL. And I can go to the command
>> line on that server and curl that URL successfully.
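
For reference, the curl check described above can be approximated with a minimal Python sketch. The function name `es_reachable` and the 5-second timeout are assumptions made for illustration; only the URL comes from the message. A True result shows only that something answers HTTP on that port with a 2xx status. It does not exercise the request the ElasticSearch connector itself sends when indexing a document.

```python
import urllib.error
import urllib.request


def es_reachable(url="http://127.0.0.1:9200/", timeout=5.0):
    """Return True if an HTTP GET on `url` yields a 2xx response.

    Hypothetical helper: roughly what `curl <url>` checks, nothing more.
    """
    # Ignore any proxy environment variables, since the target is loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and non-2xx HTTP errors.
        return False
```

Calling `es_reachable()` on the MCF server would be expected to return True if the curl test succeeds there, since both issue a plain GET against the same URL.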



-- 

http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
