lucene-solr-user mailing list archives

From Rupert Fiasco <rufia...@gmail.com>
Subject Re: Responses getting truncated
Date Tue, 25 Aug 2009 21:25:09 GMT
So I whipped up a quick SolrJ client and ran it against the document
that I referenced earlier. When I retrieve the doc and just print its
field/value pairs to stdout it ends like this:

http://brockwine.com/images/output1.png

It appears to be some kind of garbage characters.
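
One way to pin down where the garbage starts might be to scan each field value for characters that XML 1.0 does not allow. Below is a minimal sketch in Python, assuming the document has already been fetched into a plain dict of field name to string. The `find_illegal_xml_chars` helper and the sample `doc` are hypothetical and are not part of Solr or SolrJ, and it is only an assumption that the garbage consists of such characters.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(doc):
    """Return {field: [(offset, code point), ...]} for fields with illegal characters."""
    report = {}
    for field, value in doc.items():
        hits = [
            (m.start(), "U+%04X" % ord(m.group()))
            for m in _ILLEGAL_XML.finditer(str(value))
        ]
        if hits:
            report[field] = hits
    return report


# Hypothetical document: a NUL character embedded in the large text field.
doc = {"title_t": "arthritis", "text_t": "first part of article\x00trailing"}
print(find_illegal_xml_chars(doc))
```

If a scan like this reports nothing for the suspect field, the garbage is probably not a character-level problem and the cause lies elsewhere.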

-Rupert

On Tue, Aug 25, 2009 at 12:19 PM, Uri Boness<uboness@gmail.com> wrote:
> Hi,
>
> This is a very strange behavior and the fact that it is caused by one
> specific field, again, leads me to believe it's still a data issue. Did you
> try using SolrJ to query the data as well? If the same thing happens when
> using the binary protocol, then it's probably not a data issue. On the other
> hand, if it works fine, then at least you can inspect the data to see where
> things go wrong. Sorry for insisting on that, but I cannot think of anything
> else that can cause this problem.
>
> If anyone else has a better idea, I'm actually very curious to hear about
> it.
>
> Uri
>
> Rupert Fiasco wrote:
>>
>> The text file at:
>>
>> http://brockwine.com/solr.txt
>>
>> Represents one of these truncated responses (this one in XML). It
>> starts out great, then look at the bottom, boom, game over. :)
>>
>> I found this document by first running our bigger search which breaks
>> and then zeroing in on a specific broken document by using the rows/start
>> parameters. But there is an unknown number of these "broken"
>> documents - a lot I presume.
>>
>> -Rupert
>>
>> On Tue, Aug 25, 2009 at 9:40 AM, Avlesh Singh<avlesh@gmail.com> wrote:
>>
>>>
>>> Can you copy-paste the source data indexed in this field which causes the
>>> error?
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco <rufiasco@gmail.com>
>>> wrote:
>>>
>>>
>>>>
>>>> Using wt=json also yields an invalid document. So after more
>>>> investigation it appears that I can always "break" the response by
>>>> pulling back a specific field via the "fl" parameter. If I leave off a
>>>> field then the response is valid, if I include it then Solr yields an
>>>> invalid document - a truncated document. This happens in any response
>>>> format (xml, json, ruby).
>>>>
>>>> I am using the SolrJ client to add documents to my index. My field
>>>> is a normal "text" field type and the text itself is the first 1000
>>>> characters of an article.
>>>>
>>>>
>>>>> It can very well be an issue with the data itself. For example, if the
>>>>> data contains un-escaped characters which invalidates the response
>>>>
>>>> When I look at the document using wt=xml then all XML entities are
>>>> escaped. When I look at it under wt=ruby then all single quotes are
>>>> escaped, same for json, so it appears that all escaping is taking
>>>> place. The core problem seems to be that the document is just
>>>> truncated - it just plain ends mid-stream. Jetty's log says it's sending
>>>> back an HTTP 200 so all is well.
>>>>
>>>> Any ideas on how I can dig deeper?
>>>>
>>>> Thanks
>>>> -Rupert
>>>>
>>>>
>>>> On Mon, Aug 24, 2009 at 4:31 PM, Uri Boness<uboness@gmail.com> wrote:
>>>>
>>>>> It can very well be an issue with the data itself. For example, if the
>>>>> data contains un-escaped characters which invalidates the response. I don't
>>>>> know much about ruby, but what do you get with wt=json?
>>>>>
>>>>> Rupert Fiasco wrote:
>>>>>
>>>>>>
>>>>>> I am seeing our responses getting truncated if and only if I search on
>>>>>> our main text field.
>>>>>>
>>>>>> E.g. if I just do something basic like
>>>>>>
>>>>>> title_t:arthritis
>>>>>>
>>>>>> Then I get a valid document back. But if I add in our larger text
>>>>>> field:
>>>>>>
>>>>>> title_t:arthritis OR text_t:arthritis
>>>>>>
>>>>>> then the resultant document is NOT valid XML (if using wt=xml) or Ruby
>>>>>> (using wt=ruby). If I run these through curl on the command line the
>>>>>> response is truncated, and if I run the search through the web-based
>>>>>> admin panel then I get an XML parse error.
>>>>>>
>>>>>> This appears to have just started recently and the only thing we have
>>>>>> done is change our indexer from a PHP one to a Java one, but
>>>>>> functionally they are identical.
>>>>>>
>>>>>> Any thoughts? Thanks in advance.
>>>>>>
>>>>>> - Rupert
>>>>>>
>>>>>>
>>>>>>
>>
>>
>
