manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Wed, 30 Jan 2013 14:39:22 GMT
I agree that the Elastic Search connector needs far better logging and
error handling.  CONNECTORS-629.

Karl
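
The two points raised in this thread (the connector should report the status code and body of a non-successful HTTP response, and ES index names must be all lowercase) can be sketched in Java as below. This is only an illustrative sketch, not the connector's actual code or the CONNECTORS-629 fix: the class IndexingDiagnostics and its methods are hypothetical names, treating any 2xx code as success is an assumption, and the response body in main is a placeholder rather than what ES actually returned.

```java
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF or Elasticsearch.
public class IndexingDiagnostics {

  // Assumption: any 2xx status counts as success.
  public static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }

  // Builds a message that carries both the status code and the response
  // body, so the log entry is more than a bare stack trace.
  public static String describeFailure(int statusCode, String responseBody) {
    String body = (responseBody == null || responseBody.isEmpty())
        ? "<empty body>"
        : responseBody;
    return "HTTP " + statusCode + ": " + body;
  }

  // Checks only the all-lowercase rule mentioned in the thread; any other
  // index naming rules ES may enforce are not covered here.
  public static boolean isLowercaseIndexName(String indexName) {
    return indexName != null
        && !indexName.isEmpty()
        && indexName.equals(indexName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    String indexName = "DocumentumRoW"; // the index name from the thread
    if (!isLowercaseIndexName(indexName)) {
      System.out.println("Index name is not all lowercase: " + indexName);
    }

    int statusCode = 400;                      // hypothetical status code
    String responseBody = "<response body>";   // placeholder, not real ES output
    if (!isSuccess(statusCode)) {
      System.out.println(describeFailure(statusCode, responseBody));
    }
  }
}
```

Checking the index name when the connection is configured would surface the problem before any document is posted, while the failure description covers rejections that cannot be predicted up front.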

On Wed, Jan 30, 2013 at 9:27 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
> Nailed it with the help of wireshark! Turns out it was my fault -- I
> had set it up to use (i.e. create) an index called DocumentumRoW but
> it turns out ES index names must be all lowercase.
>
> Never knew that before.
>
> Slightly annoyed that ES didn't log that...
>
> Thanks again for your help Karl :-)
>
> My only request on the MCF front would be that it would be nice for
> the output connector to log the actual status code and content of a
> non-successful HTTP response.
>
>
> On 30 January 2013 14:21, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>> That information isn't being recorded in manifoldcf.log unfortunately
>> -- I included all that was there. And there are no exceptions in
>> elasticsearch.log either...
>>
>> I'll try running wireshark to see if I can follow the TCP stream.
>>
>>
>>
>> On 30 January 2013 14:16, Karl Wright <daddywri@gmail.com> wrote:
>>> Ok, ElasticSearch is not happy about something when the document is
>>> being posted.  The connector is seeing a non-200 HTTP response, and
>>> throwing an exception as a result:
>>>
>>>       if (!checkResultCode(method.getStatusCode()))
>>>         throw new ManifoldCFException(getResultDescription());
>>>
>>> Presumably the exception message in the log tells us what that HTTP
>>> code is, but you did not include that key info.
>>>
>>> Karl
>>>
>>> On Wed, Jan 30, 2013 at 9:06 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>> Thanks for all your help Karl!
>>>>
>>>> It's 1.0.1 from the binary distro.
>>>>
>>>> And yes, it says "Connection working" when I view it.
>>>>
>>>> On 30 January 2013 14:03, Karl Wright <daddywri@gmail.com> wrote:
>>>>> Ok, so let's back up a bit.
>>>>>
>>>>> First, which version of ManifoldCF is this?  I need to know that
>>>>> before I can interpret the stack trace.
>>>>>
>>>>> Second, what do you see when you view the connection in the crawler
>>>>> UI?  Does it say "Connection working", or something else, and if so,
>>>>> what?
>>>>>
>>>>> I've created a ticket for better error reporting in this connector -
>>>>> it was a contribution and AFAIK the error handling is not very robust
>>>>> at this point, but I can fix that quickly with your help. ;-)
>>>>>
>>>>> Karl
>>>>>
>>>>> On Wed, Jan 30, 2013 at 8:55 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>>>> On 30 January 2013 13:33, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> So you saw events in the history which correspond to these documents
>>>>>>> and which are of type "Indexation" that say "success"?  If that is the
>>>>>>> case, then the ElasticSearch connector thinks it handed the documents
>>>>>>> successfully to the ElasticSearch server.
>>>>>>
>>>>>> Ah, no, the activity is fetch rather than indexation. e.g.
>>>>>>
>>>>>> 01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361
>>>>>>
>>>>>> I don't see any history entries relating to indexing as a specific
>>>>>> activity in its own right. Sorry, that was probably a red herring, I
>>>>>> don't think it's getting that far.
>>>>>>
>>>>>> I just noticed that above all the "service interruption reported"
>>>>>> warnings are some errors like this:
>>>>>>
>>>>>> ERROR 2013-01-30 13:44:15,356 (Worker thread '45') - Exception tossed:
>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:97)
>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex.<init>(ElasticSearchIndex.java:138)
>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.addOrReplaceDocument(ElasticSearchConnector.java:322)
>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>>>>>>         at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(DCTM.java:1820)
>>>>>>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>>>>>>
>>>>>> Sadly there's no description, just a stacktrace.
>>>>>>
>>>>>> I know the ES server is visible from the MCF server -- actually
>>>>>> they're the same machine, and it's configured to use
>>>>>> http://127.0.0.1:9200/ as the server URL. And I can go to the command
>>>>>> line on that server and curl that URL successfully.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>>
>>
>>
>> --
>>
>> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
