manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Thu, 31 Jan 2013 15:24:20 GMT
Please let me know if you see any problems.  I'll fix anything you
find as quickly as I can.

Karl

On Thu, Jan 31, 2013 at 10:19 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
> Great, thanks, I'll give it a try.
>
> On 30 January 2013 18:52, Karl Wright <daddywri@gmail.com> wrote:
>> I just checked in a refactoring to trunk that should improve Elastic
>> Search error reporting significantly.
>>
>> Karl
>>
>>
>> On Wed, Jan 30, 2013 at 9:39 AM, Karl Wright <daddywri@gmail.com> wrote:
>>> I agree that the Elastic Search connector needs far better logging and
>>> error handling.  CONNECTORS-629.
>>>
>>> Karl
>>>
>>> On Wed, Jan 30, 2013 at 9:27 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>> Nailed it with the help of wireshark! Turns out it was my fault -- I
>>>> had set it up to use (i.e. create) an index called DocumentumRoW but
>>>> it turns out ES index names must be all lowercase.
>>>>
>>>> Never knew that before.
>>>>
>>>> Slightly annoyed that ES didn't log that...
>>>>
>>>> Thanks again for your help Karl :-)
>>>>
>>>> My only request on the MCF front would be that it would be nice for
>>>> the output connector to log the actual status code and content of a
>>>> non-successful HTTP response.
>>>>
>>>>
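
A minimal sketch of the kind of pre-flight check that would have caught the problem described above, where the configured index name contained uppercase letters. It covers only the lowercase rule mentioned in the thread; the class and method names are hypothetical and are not part of ManifoldCF or Elasticsearch.

```java
import java.util.Locale;

public class IndexNameCheck {
  /** Returns true when the name contains no uppercase characters. */
  public static boolean isAllLowerCase(String indexName) {
    return indexName.equals(indexName.toLowerCase(Locale.ROOT));
  }

  /** Returns a lowercase variant that could be suggested to the user. */
  public static String suggest(String indexName) {
    return indexName.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // The index name reported in the thread.
    String configured = "DocumentumRoW";
    if (!isAllLowerCase(configured)) {
      System.out.println("Index name '" + configured
          + "' is not all lowercase; consider '" + suggest(configured) + "'");
    }
  }
}
```

Running such a check when the output connection is saved, rather than at indexing time, would surface the mistake before any documents are fetched.
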
>>>> On 30 January 2013 14:21, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>>> That information isn't being recorded in manifoldcf.log unfortunately
>>>>> -- I included all that was there. And there are no exceptions in
>>>>> elasticsearch.log either...
>>>>>
>>>>> I'll try running wireshark to see if I can follow the TCP stream.
>>>>>
>>>>>
>>>>>
>>>>> On 30 January 2013 14:16, Karl Wright <daddywri@gmail.com> wrote:
>>>>>> Ok, ElasticSearch is not happy about something when the document is
>>>>>> being posted.  The connector is seeing a non-200 HTTP response, and
>>>>>> throwing an exception as a result:
>>>>>>
>>>>>>       if (!checkResultCode(method.getStatusCode()))
>>>>>>         throw new ManifoldCFException(getResultDescription());
>>>>>>
>>>>>> Presumably the exception message in the log tells us what that HTTP
>>>>>> code is, but you did not include that key info.
>>>>>>
>>>>>> Karl
>>>>>>
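
A rough sketch of what a more informative failure message could look like, carrying the HTTP status code and part of the response body into the exception. This is not the connector's actual code: the class, the method names, the 2xx success rule, the 500-character cap, and the use of IllegalStateException as a stand-in for ManifoldCFException are all assumptions made for illustration.

```java
public class HttpFailureReport {
  // Arbitrary cap so a large response body does not flood the log (assumption).
  private static final int MAX_BODY_CHARS = 500;

  /** Treats any 2xx status as success; the real connector's check may differ. */
  public static boolean isSuccess(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }

  /** Builds a message containing the status code and a bounded slice of the body. */
  public static String describe(int statusCode, String body) {
    String text = (body == null) ? "" : body;
    if (text.length() > MAX_BODY_CHARS) {
      text = text.substring(0, MAX_BODY_CHARS) + "...";
    }
    return "HTTP " + statusCode + ": " + text;
  }

  /** Throws with a descriptive message when the response is not successful. */
  public static void check(int statusCode, String body) {
    if (!isSuccess(statusCode)) {
      throw new IllegalStateException(describe(statusCode, body));
    }
  }

  public static void main(String[] args) {
    try {
      // Placeholder status and body, not a real Elasticsearch response.
      check(400, "{\"error\":\"example body\"}");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a message of this shape in the log, the rejected request could be diagnosed without capturing the TCP stream.
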
>>>>>> On Wed, Jan 30, 2013 at 9:06 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>>>>> Thanks for all your help Karl!
>>>>>>>
>>>>>>> It's 1.0.1 from the binary distro.
>>>>>>>
>>>>>>> And yes, it says "Connection working" when I view it.
>>>>>>>
>>>>>>> On 30 January 2013 14:03, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>> Ok, so let's back up a bit.
>>>>>>>>
>>>>>>>> First, which version of ManifoldCF is this?  I need to know that
>>>>>>>> before I can interpret the stack trace.
>>>>>>>>
>>>>>>>> Second, what do you see when you view the connection in the crawler
>>>>>>>> UI?  Does it say "Connection working", or something else, and if so,
>>>>>>>> what?
>>>>>>>>
>>>>>>>> I've created a ticket for better error reporting in this connector -
>>>>>>>> it was a contribution and AFAIK the error handling is not very robust
>>>>>>>> at this point, but I can fix that quickly with your help. ;-)
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> On Wed, Jan 30, 2013 at 8:55 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>>>>>>> On 30 January 2013 13:33, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> So you saw events in the history which correspond to these documents
>>>>>>>>>> and which are of type "Indexation" that say "success"?  If that is the
>>>>>>>>>> case, then the ElasticSearch connector thinks it handed the documents
>>>>>>>>>> successfully to the ElasticSearch server.
>>>>>>>>>
>>>>>>>>> Ah, no, the activity is fetch rather than indexation. e.g.
>>>>>>>>>
>>>>>>>>> 01-30-2013 13:08:16.217 fetch 09026205800698a9 Success 549541 361
>>>>>>>>>
>>>>>>>>> I don't see any history entries relating to indexing as a specific
>>>>>>>>> activity in its own right. Sorry, that was probably a red herring, I
>>>>>>>>> don't think it's getting that far.
>>>>>>>>>
>>>>>>>>> I just noticed that above all the "service interruption reported"
>>>>>>>>> warnings are some errors like this:
>>>>>>>>>
>>>>>>>>> ERROR 2013-01-30 13:44:15,356 (Worker thread '45') - Exception tossed:
>>>>>>>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException:
>>>>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnection.call(ElasticSearchConnection.java:97)
>>>>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex.<init>(ElasticSearchIndex.java:138)
>>>>>>>>>         at org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector.addOrReplaceDocument(ElasticSearchConnector.java:322)
>>>>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.addOrReplaceDocument(IncrementalIngester.java:1579)
>>>>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.performIngestion(IncrementalIngester.java:504)
>>>>>>>>>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:370)
>>>>>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocument(WorkerThread.java:1652)
>>>>>>>>>         at org.apache.manifoldcf.crawler.connectors.DCTM.DCTM.processDocuments(DCTM.java:1820)
>>>>>>>>>         at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:423)
>>>>>>>>>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:551)
>>>>>>>>>
>>>>>>>>> Sadly there's no description, just a stacktrace.
>>>>>>>>>
>>>>>>>>> I know the ES server is visible from the MCF server -- actually
>>>>>>>>> they're the same machine, and it's configured to use
>>>>>>>>> http://127.0.0.1:9200/ as the server URL. And I can go to the command
>>>>>>>>> line on that server and curl that URL successfully.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
>
>
>
> --
>
> http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
