lucene-solr-user mailing list archives

From Joe Calderon <calderon....@gmail.com>
Subject Re: Responses getting truncated
Date Fri, 28 Aug 2009 21:14:32 GMT
I had a similar issue with text from past requests showing up. This was
on a 1.3 nightly; I switched to the Lucid build of 1.3 and the
problem went away. I'm now running a 1.4 nightly, also without
problems. Your mileage may vary, though, as I also made a bunch of schema
changes that might have had some effect. It wouldn't hurt to try.


On 08/28/2009 02:04 PM, Rupert Fiasco wrote:
> Firstly, to everyone who has been helping me, thank you very much. All
> this feedback is helping me narrow down these issues.
>
> I deleted the index and re-indexed all the data from scratch and for a
> couple of days we were OK, but now it seems to be erring again.
>
> It happens on different input documents so what was broken before now
> works (documents that were having issues before are OK now, after a
> fresh re-index).
>
> An issue we are seeing now is that an XML response from Solr will
> contain the "tail" of an earlier response, for example:
>
> http://brockwine.com/solr2.txt
>
> That is a response we are getting from Solr. When I use the web interface
> for Solr in Firefox, Firefox freaks out because it tries to parse
> that and, of course, it's invalid XML, but I can retrieve it via
> curl.
>
> Has anyone seen this before?
>
> Regarding earlier questions:
>
>    
>> i assume you are correct, but you listed several steps of transformation
>> above, are you certain they all work correctly and produce valid UTF-8?
>>      
> Yes, I have looked at the source and contacted the author of the
> conversion library we are using and have verified that if UTF-8 goes in
> then UTF-8 will come out, and UTF-8 is definitely going in.
>
> I don't think sending over an actual input document would help because
> it seems to change. Plus, this latest issue appears to be more an
> issue of the last response buffer not clearing or something.
>
> What's strange is that if I wait a few minutes and reload, then the
> buffer is cleared and I get back a valid response. It's intermittent,
> but appears to be happening frequently.
>
> If it matters, we started using LucidGaze for Solr about 10 days ago,
> approximately when these issues started happening (but it's hard to say
> if that's the cause because at the same time we switched from a PHP to
> a Java indexing client).
>
> Thanks for your patience
>
> -Rupert
>
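One way to double-check the "UTF-8 is definitely going in" claim from the message above is to decode the raw bytes strictly. This is a minimal sketch and not code from the thread: the class name `StrictUtf8` and its methods are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. The reason for the detour is that `new String(bytes, "UTF-8")` silently replaces malformed sequences with U+FFFD, so on its own it cannot reveal bad input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
final class StrictUtf8 {

    /** Decodes bytes as UTF-8, throwing on malformed input instead of substituting U+FFFD. */
    static String decode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Returns true only if the whole byte array is well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decode(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Calling `StrictUtf8.isValidUtf8(textFromDB)` before building the `String` would flag a bad record at indexing time, assuming `textFromDB` is the raw `byte[]` read from the database.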
> On Tue, Aug 25, 2009 at 8:33 PM, Chris
> Hostetter<hossman_lucene@fucit.org>  wrote:
>    
>> : We are running an instance of MediaWiki so the text goes through a
>> : couple of transformations: wiki markup ->  html ->  plain text.
>> : It's at this last step that I take a "snippet" and insert that into Solr.
>>         ...
>> : doc.addField("text_snippet_t", article.getSnippet(1000));
>>
>> ok, well first off: that's not the field where you are having problems,
>> is it?  if i remember correctly from your previous posts, wasn't the
>> response getting aborted in the middle of the Contents field?
>>
>> : and a maximum of 1K chars if it's bigger. I initialized this String
>> : from the DB by using the String constructor where I pass in the
>> : charset/collation
>> :
>> : text = new String(textFromDB, "UTF-8");
>> :
>> : So to the best of my knowledge, accessing a substring of a UTF-8
>> : encoded string should not break up the UTF-8 code point. Is that an
>>
>> i assume you are correct, but you listed several steps of transformation
>> above, are you certain they all work correctly and produce valid UTF-8?
>>
>> this leads back to my suggestion before....
>>
>> :>  Can you put the original (pre solr, pre solrj, raw untouched, etc...)
>> :>  file that this solr doc came from online somewhere?
>> :>
>> :>  What does your *indexing* code look like? ... Can you add some debugging to
>> :>  the SolrJ client when you *add* this doc to print out exactly what those
>> :>  1000 characters are?
>>
>>
>> -Hoss
>>
>>      
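
On the substring question quoted above: a Java `String` holds UTF-16 code units regardless of the charset it was decoded from, so cutting at a fixed `char` index can split a surrogate pair for characters outside the Basic Multilingual Plane. The sketch below uses a hypothetical `SnippetUtil` helper (not from the thread) to trim on a code point boundary and to dump code points for the kind of debug output Hoss asks for. Whether any of this relates to the truncated responses is not established in the thread.

```java
// Hypothetical helper, not part of Solr or SolrJ.
final class SnippetUtil {

    /**
     * Returns at most maxChars UTF-16 chars of text (maxChars assumed non-negative),
     * backing off by one char if the cut would separate a surrogate pair.
     */
    static String truncate(String text, int maxChars) {
        if (text == null || text.length() <= maxChars) {
            return text;
        }
        int end = maxChars;
        if (end > 0
                && Character.isHighSurrogate(text.charAt(end - 1))
                && Character.isLowSurrogate(text.charAt(end))) {
            end--; // do not leave a lone high surrogate at the end
        }
        return text.substring(0, end);
    }

    /** Renders each code point as U+XXXX so the exact indexed content can be logged. */
    static String dumpCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

In the indexing code this could be used as `doc.addField("text_snippet_t", SnippetUtil.truncate(text, 1000))`, with `SnippetUtil.dumpCodePoints(...)` logged just before the add. How `article.getSnippet(1000)` is actually implemented is not shown in the thread.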

