lucene-solr-user mailing list archives

From Andrew Clegg <andrew.cl...@gmail.com>
Subject Re: Solr ignoring maxFieldLength?
Date Mon, 26 Oct 2009 14:30:04 GMT


Yep, I just re-indexed it again to make double sure -- same problem
unfortunately.

My solrconfig.xml and schema.xml are attached.

In case you want to see it in action on the same data I've got, I've tarred
up my data and conf directories here:

http://biotext.org.uk/static/solr-issue-example.tar.gz

That should be enough to reproduce it with.

Thanks!

Andrew.


Yonik Seeley-2 wrote:
> 
> Yes, please show us your solrconfig.xml, and verify that you reindexed
> the document after changing maxFieldLength and restarting solr.
> 
> I'll also see if I can reproduce a problem with maxFieldLength being
> ignored.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Mon, Oct 26, 2009 at 7:11 AM, Andrew Clegg <andrew.clegg@gmail.com>
> wrote:
>>
>> Morning,
>>
>> Last week I was having a problem with terms visible in my search results in
>> large documents not causing query hits:
>>
>> http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351
>>
>> Erick suggested it might be related to maxFieldLength, so I set this to
>> 2147483647 in my solrconfig.xml and reindexed over the weekend.
>>
>> Unfortunately I'm having the same problem now, even though Erick appears to
>> be right! I've narrowed it down to a single document for testing purposes,
>> and I can get it returned by querying for a term near the beginning, but
>> terms near the end cause no hit. I can even find the point part way
>> through the document after which none of the remaining terms seem to cause
>> a hit.
>>
>> The document is about 32000 terms long, most of which is in a single field
>> called related_ids of about 31000 terms. My first thought was that the text
>> was being chopped up into so many tokens that it was going over the
>> maxFieldLength anyway, but 2147483647/32000=67109, and it seems very
>> unlikely that 67109 tokens would be generated per term!
>>
>> I've tried undeploying and redeploying the whole web app from Tomcat in case
>> the new maxFieldLength hadn't been read, but no difference. If I go to
>>
>> http://localhost:8080/solr/admin/file/?file=solrconfig.xml
>>
>> I can see
>>
>> <maxFieldLength>2147483647</maxFieldLength>
>>
>> as expected.
>>
>> Does anyone have any more ideas? This could potentially be a showstopper for
>> us as we have quite a few long-ish documents to index. (32K words doesn't
>> seem that long to me, but still...)
>>
>> I've tried it with today's nightly build (2009-10-26) and it makes no
>> difference. If this sounds like a bug, I'll open a JIRA and attach tars of
>> my config and data directories. Any thoughts?
>>
>> Thanks,
>>
>> Andrew.
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-ignoring-maxFieldLength--tp26057808p26057808.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
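
For anyone trying to reproduce this: the "point part way through the document"
mentioned in the quoted message can be located with a binary search rather than
by trial and error. The following is only a sketch. has_hit is a hypothetical
callback standing in for "run a Solr query for this term and check whether the
test document comes back"; it is not a Solr API. The sketch also assumes the
terms are unique and that every term before the cutoff hits while every term
from the cutoff onward misses.

```python
from typing import Callable, Sequence


def first_unsearchable(terms: Sequence[str],
                       has_hit: Callable[[str], bool]) -> int:
    """Return the index of the first term that produces no hit.

    Returns len(terms) if every term produces a hit. Assumes the terms
    are unique and that hits stop at a single cutoff position.
    """
    lo, hi = 0, len(terms)
    while lo < hi:
        mid = (lo + hi) // 2
        if has_hit(terms[mid]):
            lo = mid + 1   # cutoff lies after mid
        else:
            hi = mid       # cutoff is at mid or earlier
    return lo


# Stand-in data: 31000 unique terms, of which only the first 10000 are
# "indexed". Both numbers are illustrative, not taken from the real index.
terms = [f"id{i}" for i in range(31000)]
indexed = set(terms[:10000])

cutoff = first_unsearchable(terms, lambda t: t in indexed)
print(cutoff)  # prints 10000
```

If the cutoff found this way lands on a round number of tokens, that would
point towards a length limit still being applied at index time.
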
http://www.nabble.com/file/p26060882/solrconfig.xml solrconfig.xml 
http://www.nabble.com/file/p26060882/schema.xml schema.xml 
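
One thing that may be worth checking in a solrconfig.xml like the one attached
is whether maxFieldLength is set in more than one place. The sketch below lists
every <maxFieldLength> element together with its parent section. The sample
config is hypothetical and is not the attached file; it assumes the Solr
1.x-style layout with <indexDefaults> and <mainIndex> sections, and the values
in it are illustrative.

```python
import xml.etree.ElementTree as ET


def find_max_field_length(config_xml: str) -> list[tuple[str, int]]:
    """Return (parent tag, value) for every <maxFieldLength> element."""
    root = ET.fromstring(config_xml.strip())
    found = []
    for parent in root.iter():          # every element, in document order
        for child in parent:            # direct children only
            if child.tag == "maxFieldLength":
                found.append((parent.tag, int(child.text.strip())))
    return found


# Hypothetical sample, NOT the attached solrconfig.xml.
SAMPLE = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""

for section, value in find_max_field_length(SAMPLE):
    print(f"{section}: maxFieldLength={value}")
```

If more than one entry is reported, changing only one of them may not be enough
for the new limit to take effect.
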
-- 
View this message in context: http://www.nabble.com/Solr-ignoring-maxFieldLength--tp26057808p26060882.html
Sent from the Solr - User mailing list archive at Nabble.com.

