lucene-solr-user mailing list archives

From Uri Boness <ubon...@gmail.com>
Subject Re: Responses getting truncated
Date Tue, 25 Aug 2009 19:19:11 GMT
Hi,

This is very strange behavior, and the fact that it is caused by one 
specific field, again, leads me to believe it's still a data issue. Did 
you try using SolrJ to query the data as well? If the same thing happens 
when using the binary protocol, then it's probably not a data issue. On 
the other hand, if it works fine, then at least you can inspect the data 
to see where things go wrong. Sorry for insisting on that, but I cannot 
think of anything else that can cause this problem.
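
As a rough way to see where things go wrong, the rows/start approach 
described below could be automated along these lines. This is only a 
sketch in Python, not a tested tool: the base URL, the query, and the 
helper names (make_http_fetch, find_broken_docs) are assumptions made 
for illustration. It requests one document at a time with wt=json and 
records every offset whose response does not parse as complete JSON.

```python
import json
from typing import Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen


def is_complete_json(body: str) -> bool:
    """Return True if body parses as a complete JSON document."""
    try:
        json.loads(body)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def find_broken_docs(fetch: Callable[[int, int], str], total: int) -> List[int]:
    """Return the start offsets whose one-document response fails to parse.

    fetch(start, rows) is a hypothetical callable that returns the raw
    response body for the given start/rows parameters.
    """
    broken = []
    for start in range(total):
        if not is_complete_json(fetch(start, 1)):
            broken.append(start)
    return broken


def make_http_fetch(base_url: str, query: str) -> Callable[[int, int], str]:
    """Build a fetch function for a select URL (base_url is an assumption)."""
    def fetch(start: int, rows: int) -> str:
        params = urlencode({"q": query, "wt": "json", "start": start, "rows": rows})
        with urlopen(f"{base_url}?{params}") as resp:
            # Decode leniently so a response cut off mid-character still
            # reaches the JSON check instead of raising here.
            return resp.read().decode("utf-8", errors="replace")
    return fetch
```

For example, find_broken_docs(make_http_fetch(base_url, "text_t:arthritis"), n) 
would return the offsets of the first n hits that come back truncated, 
where base_url is your own select URL. The source data for those 
documents can then be compared against documents that come back intact.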

If anyone else has a better idea, I'm actually very curious to hear 
about it.

Uri

Rupert Fiasco wrote:
> The text file at:
>
> http://brockwine.com/solr.txt
>
> Represents one of these truncated responses (this one in XML). It
> starts out great, then look at the bottom, boom, game over. :)
>
> I found this document by first running our bigger search which breaks
> and then zeroing in on a specific broken document by using the rows/start
> parameters. But there is an unknown number of these "broken"
> documents - a lot I presume.
>
> -Rupert
>
> On Tue, Aug 25, 2009 at 9:40 AM, Avlesh Singh<avlesh@gmail.com> wrote:
>   
>> Can you copy-paste the source data indexed in this field which causes the
>> error?
>>
>> Cheers
>> Avlesh
>>
>> On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco <rufiasco@gmail.com> wrote:
>>
>>     
>>> Using wt=json also yields an invalid document. So after more
>>> investigation it appears that I can always "break" the response by
>>> pulling back a specific field via the "fl" parameter. If I leave off a
>>> field then the response is valid, if I include it then Solr yields an
>>> invalid document - a truncated document. This happens in any response
>>> format (xml, json, ruby).
>>>
>>> I am using the SolrJ client to add documents to my index. My field
>>> is a normal "text" field type and the text itself is the first 1000
>>> characters of an article.
>>>
>>>> It can very well be an issue with the data itself. For example, if the
>>>> data contains un-escaped characters which invalidates the response
>>>
>>> When I look at the document using wt=xml then all XML entities are
>>> escaped. When I look at it under wt=ruby then all single quotes are
>>> escaped, same for json, so it appears that all escaping is taking
>>> place. The core problem seems to be that the document is just
>>> truncated - it just plain hits end of file. Jetty's log says it's sending
>>> back an HTTP 200 so all is well.
>>>
>>> Any ideas on how I can dig deeper?
>>>
>>> Thanks
>>> -Rupert
>>>
>>>
>>> On Mon, Aug 24, 2009 at 4:31 PM, Uri Boness<uboness@gmail.com> wrote:
>>>> It can very well be an issue with the data itself. For example, if the
>>>> data contains un-escaped characters which invalidates the response. I don't
>>>> know much about ruby, but what do you get with wt=json?
>>>>
>>>> Rupert Fiasco wrote:
>>>>         
>>>>> I am seeing our responses getting truncated if and only if I search on
>>>>> our main text field.
>>>>>
>>>>> E.g. if I just do a basic query like
>>>>>
>>>>> title_t:arthritis
>>>>>
>>>>> Then I get a valid document back. But if I add in our larger text field:
>>>>>
>>>>> title_t:arthritis OR text_t:arthritis
>>>>>
>>>>> then the resultant document is NOT valid XML (if using wt=xml) or Ruby
>>>>> (using wt=ruby). If I run these through curl on the command line it's
>>>>> truncated and if I run the search through the web-based admin panel
>>>>> then I get an XML parse error.
>>>>>
>>>>> This appears to have just started recently and the only thing we have
>>>>> done is change our indexer from a PHP one to a Java one, but
>>>>> functionally they are identical.
>>>>>
>>>>> Any thoughts? Thanks in advance.
>>>>>
>>>>> - Rupert
>>>>>
>>>>>
>>>>>           
>
>   
