lucene-solr-user mailing list archives

From Terry Steichen <te...@net-frame.com>
Subject Re: Making Solr Indexing Errors Visible
Date Thu, 27 Sep 2018 02:22:19 GMT
Alex,

Please look at my embedded responses to your questions.

Terry


On 09/26/2018 04:57 PM, Alexandre Rafalovitch wrote:
> The challenge here is to figure out exactly what you are doing,
> because the original description could have been 10 different things.
>
> So:
> 1) You are using bin/post command (we just found this out)
No, I said that at the outset.  And repeated it.
> 2) You are indexing a bunch of files (what format? all same or different?)
I also said I was indexing a mixture of PDF and DOC files.
> 3) You are indexing them into a Schema supposedly ready for those
> files (which one?)
I'm using the managed-schema (the data-driven approach).
> 4) You think some of them are not in Solr (how do you know that?
> how do you know that some are? why do you not know _which_ of the
> files are not indexed?)
I thought I made it very clear (twice) that the list of indexed files
is about 10% shorter than the list of files in the directory being
indexed.  And I said that I don't know which files are not getting
indexed because I am not getting error messages.
> 5) You are asking whether the error message should have told you if
> there is a problem with indexing (normally yes, but maybe there are
> some edge cases).
That's my question: why am I not getting error messages?  That's the
whole point of my query to the list.
>
> I've put the questions in brackets. I would focus on looking at
> questions in 4) first as they roughly bisect the problem. But other
> things are important too.
>
> I hope this helps,
>     Alex.
>
>
> On 26 September 2018 at 16:39, Terry Steichen <terry@net-frame.com> wrote:
>> Shawn,
>>
>> To the best of my knowledge, I'm not using SolrJ at all.  Just
>> Solr-out-of-the-box.  In this case, if I understand you below, it
>> "should indicate an error status"
>>
>> But it doesn't.
>>
>> Let me try to clarify a bit - I'm just using bin/post to index the files
>> in a directory.  That indexing process produces a lengthy screen display
>> of files that were indexed.  (I realize this isn't production-quality,
>> but I'm not ready for production just yet, so that should be OK.)
>>
>> But no errors are shown (even though there have to be some, because the
>> total indexed is less than the directory total).
>>
>> Are you saying I can't use post (to verify correct indexing), but that I
>> have to write custom software to accomplish that?
>>
>> And that there's no solr variable I can define that will do a kind of
>> "verbose" to show that?
>>
>> And that such errors will not show up in any of solr's log files?
>>
>> Hard to believe (but what is, is, I guess).
>>
>> Terry
>>
>> On 09/26/2018 03:49 PM, Shawn Heisey wrote:
>>> On 9/26/2018 1:23 PM, Terry Steichen wrote:
>>>> I'm pretty sure this was covered earlier.  But I can't find references
>>>> to it.  The question is how to make indexing errors clear and obvious.
>>> If there's an indexing error and you're NOT using the concurrent
>>> client in SolrJ, the response that Solr returns should indicate an
>>> error status.  ConcurrentUpdateSolrClient gets those errors and
>>> swallows them so the calling program never knows they occurred.
>>>
>>>> (I find that there are maybe 10% more files in a directory than end up
>>>> in the index.  I presume they were indexing errors, but I have no idea
>>>> which ones or what might have caused the error.)  As I recall, Solr's
>>>> post tool doesn't give any errors when indexing.  I (vaguely) recall
>>>> that there's a way (through the logs?) to overcome this and show the
>>>> errors.  Or maybe it's that you have to do the indexing outside of Solr?
>>> The simple post tool is not really meant for production use.  It is a
>>> simple tool for interactive testing.
>>>
>>> I don't see anything in SimplePostTool for changing the program's exit
>>> status when an error is encountered during program operation.  If an
>>> error is encountered during the upload, a message would be logged to
>>> stderr, but you wouldn't be able to rely on the program's exit status
>>> to indicate an error.  To get that, you will need to write the
>>> indexing software.
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
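
One way to find out which files are missing, independent of any error
output, is to compare the directory listing against the ids stored in the
index. The following is only a rough sketch under stated assumptions: it
assumes each document's id field holds the absolute path of the source file
(which I believe is what bin/post assigns to files it sends for extraction,
but check a few documents in your own index first), and the core URL and
directory shown in the comments are hypothetical placeholders.

```python
import json
import os
import urllib.parse
import urllib.request


def list_files(root):
    """Return the absolute paths of all regular files under root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.abspath(os.path.join(dirpath, name)))
    return paths


def fetch_indexed_ids(solr_core_url, rows=100000):
    """Fetch the id field of up to `rows` documents from a Solr core.

    Uses the standard /select handler; for a very large index, paging
    would be needed instead of one big request.
    """
    params = urllib.parse.urlencode(
        {"q": "*:*", "fl": "id", "rows": rows, "wt": "json"}
    )
    with urllib.request.urlopen(f"{solr_core_url}/select?{params}") as resp:
        data = json.load(resp)
    return {doc["id"] for doc in data["response"]["docs"]}


def find_unindexed(files_on_disk, indexed_ids):
    """Return the sorted list of files that have no matching id in the index."""
    return sorted(set(files_on_disk) - set(indexed_ids))


# Example usage (hypothetical core name and directory, adjust to your setup):
#   on_disk = list_files("/path/to/docs")
#   indexed = fetch_indexed_ids("http://localhost:8983/solr/mycore")
#   for path in find_unindexed(on_disk, indexed):
#       print(path)
```

Once the missing files are known, re-posting one of them on its own and
watching the Solr log at the same time should narrow down why it was
rejected.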

