lucene-solr-user mailing list archives

From Rupert Fiasco <rufia...@gmail.com>
Subject Re: Responses getting truncated
Date Tue, 25 Aug 2009 16:49:17 GMT
The text file at:

http://brockwine.com/solr.txt

Represents one of these truncated responses (this one in XML). It
starts out great, then look at the bottom, boom, game over. :)
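For anyone who wants to confirm the truncation mechanically rather than by eye, a quick well-formedness check on a saved response might look like the sketch below (my own helper, nothing Solr-specific; "solr.txt" just stands in for a response body saved with curl, like the one above):

```python
# Sketch: check whether a saved Solr XML response is a complete document.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

A truncated response like the one linked above fails the parse, while a complete one passes, so this gives a yes/no answer per response without eyeballing the tail of the file.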

I found this document by first running our bigger search, which breaks,
and then zeroing in on a specific broken document using the rows/start
parameters. But there is an unknown number of these "broken"
documents - a lot, I presume.
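That zeroing-in step can be automated. Here is a rough sketch (my own naming, not anything from Solr or SolrJ) that walks the result set one row at a time via start/rows and reports the first offset whose response no longer parses; `fetch` is whatever function issues the query and returns the raw response body for a given start offset:

```python
# Sketch: locate the first document whose single-row response is truncated.
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def first_broken_offset(fetch, num_found):
    """Return the start offset of the first document whose response is
    not well-formed XML, or None if every response parses.

    fetch(start) should issue something like
      /select?q=...&rows=1&start=<start>&wt=xml
    against your Solr instance and return the raw body as a string."""
    for start in range(num_found):
        if not parses(fetch(start)):
            return start
    return None
```

With rows=1 each response carries exactly one document, so a failing offset pins down one broken document; repeating the run with different fl lists would then narrow it to the offending field.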

-Rupert

On Tue, Aug 25, 2009 at 9:40 AM, Avlesh Singh<avlesh@gmail.com> wrote:
> Can you copy-paste the source data indexed in this field which causes the
> error?
>
> Cheers
> Avlesh
>
> On Tue, Aug 25, 2009 at 10:01 PM, Rupert Fiasco <rufiasco@gmail.com> wrote:
>
>> Using wt=json also yields an invalid document. So after more
>> investigation, it appears that I can always "break" the response by
>> pulling back a specific field via the "fl" parameter. If I leave that
>> field off, the response is valid; if I include it, Solr yields an
>> invalid document - a truncated one. This happens in any response
>> format (xml, json, ruby).
>>
>> I am using the SolrJ client to add documents to my index. My field
>> is a normal "text" field type and the text itself is the first 1000
>> characters of an article.
>>
>> > It can very well be an issue with the data itself. For example, if the
>> > data contains un-escaped characters which invalidate the response
>>
>> When I look at the document using wt=xml, all XML entities are
>> escaped. When I look at it under wt=ruby, all single quotes are
>> escaped, and the same goes for json, so it appears that all escaping
>> is taking place. The core problem seems to be that the document is
>> just truncated - it simply hits end-of-file. Jetty's log says it's
>> sending back an HTTP 200, so as far as it is concerned all is well.
>>
>> Any ideas on how I can dig deeper?
>>
>> Thanks
>> -Rupert
>>
>>
>> On Mon, Aug 24, 2009 at 4:31 PM, Uri Boness<uboness@gmail.com> wrote:
>> > It can very well be an issue with the data itself. For example, if the
>> > data contains un-escaped characters which invalidate the response. I
>> > don't know much about ruby, but what do you get with wt=json?
>> >
>> > Rupert Fiasco wrote:
>> >>
>> >> I am seeing our responses getting truncated if and only if I search on
>> >> our main text field.
>> >>
>> >> E.g. I just do something basic like
>> >>
>> >> title_t:arthritis
>> >>
>> >> Then I get a valid document back. But if I add in our larger text field:
>> >>
>> >> title_t:arthritis OR text_t:arthritis
>> >>
>> >> then the resultant document is NOT valid XML (if using wt=xml) or Ruby
>> >> (if using wt=ruby). If I run these queries through curl on the command
>> >> line, the output is truncated, and if I run the search through the
>> >> web-based admin panel I get an XML parse error.
>> >>
>> >> This appears to have just started recently and the only thing we have
>> >> done is change our indexer from a PHP one to a Java one, but
>> >> functionally they are identical.
>> >>
>> >> Any thoughts? Thanks in advance.
>> >>
>> >> - Rupert
>> >>
>> >>
>> >
>>
>
