lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Solr ignoring maxFieldLength?
Date Mon, 26 Oct 2009 14:47:08 GMT
Sorry Andrew, this is something that's bitten people before.
Search for maxFieldLength and you will see *2* of them in your config
- one for indexDefaults and one for mainIndex.
The one in mainIndex is set at 10000 and hence overrides the one in
indexDefaults.
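
One way to spot the duplicate setting is to list every maxFieldLength element
together with the section that contains it. The following is only a minimal
sketch and not part of the original message: SAMPLE_CONFIG is a trimmed,
hypothetical stand-in for a real solrconfig.xml (its values follow this
thread), and find_max_field_lengths is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for solrconfig.xml (assumption: a real
# file has many more elements; the values mirror the ones in this thread).
SAMPLE_CONFIG = """
<config>
  <indexDefaults>
    <maxFieldLength>2147483647</maxFieldLength>
  </indexDefaults>
  <mainIndex>
    <maxFieldLength>10000</maxFieldLength>
  </mainIndex>
</config>
"""


def find_max_field_lengths(xml_text):
    """Return (parent_tag, value) for every maxFieldLength element."""
    root = ET.fromstring(xml_text)
    found = []
    # Walk every element and inspect its direct children, so each
    # setting can be reported along with its enclosing section.
    for parent in root.iter():
        for child in parent:
            if child.tag == "maxFieldLength":
                found.append((parent.tag, int(child.text.strip())))
    return found


settings = find_max_field_lengths(SAMPLE_CONFIG)
for section, value in settings:
    print(f"{section}: maxFieldLength={value}")
```

If both sections show up in the output, the mainIndex value is the one to
check first, since it is described above as overriding indexDefaults.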

-Yonik
http://www.lucidimagination.com



On Mon, Oct 26, 2009 at 10:30 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>
>
> Yep, I just re-indexed it again to make double sure -- same problem
> unfortunately.
>
> My solrconfig.xml and schema.xml are attached.
>
> In case you want to see it in action on the same data I've got, I've tarred
> up my data and conf directories here:
>
> http://biotext.org.uk/static/solr-issue-example.tar.gz
>
> That should be enough to reproduce it with.
>
> Thanks!
>
> Andrew.
>
>
> Yonik Seeley-2 wrote:
>>
>> Yes, please show us your solrconfig.xml, and verify that you reindexed
>> the document after changing maxFieldLength and restarting solr.
>>
>> I'll also see if I can reproduce a problem with maxFieldLength being
>> ignored.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Mon, Oct 26, 2009 at 7:11 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>>>
>>> Morning,
>>>
>>> Last week I was having a problem with terms visible in my search results in
>>> large documents not causing query hits:
>>>
>>> http://www.nabble.com/Result-missing-from-query%2C-but-match-shows-in-Field-Analysis-tool-td26029040.html#a26029351
>>>
>>> Erick suggested it might be related to maxFieldLength, so I set this to
>>> 2147483647 in my solrconfig.xml and reindexed over the weekend.
>>>
>>> Unfortunately I'm having the same problem now, even though Erick appears to
>>> be right! I've narrowed it down to a single document for testing purposes,
>>> and I can get it returned by querying for a term near the beginning, but
>>> terms near the end cause no hit, and I can even find the point part way
>>> through the document after which none of the remaining terms seem to cause
>>> a hit.
>>>
>>> The document is about 32000 terms long, most of which is in a single field
>>> called related_ids of about 31000 terms. My first thought was that the text
>>> was being chopped up into so many tokens that it was going over the
>>> maxFieldLength anyway, but 2147483647/32000=67109, and it seems very
>>> unlikely that 67109 tokens would be generated per term!
>>>
>>> I've tried undeploying and redeploying the whole web app from Tomcat in case
>>> the new maxFieldLength hadn't been read, but no difference. If I go to
>>>
>>> http://localhost:8080/solr/admin/file/?file=solrconfig.xml
>>>
>>> I can see
>>>
>>> <maxFieldLength>2147483647</maxFieldLength>
>>>
>>> as expected.
>>>
>>> Does anyone have any more ideas? This could potentially be a showstopper for
>>> us as we have quite a few long-ish documents to index. (32K words doesn't
>>> seem that long to me, but still...)
>>>
>>> I've tried it with today's nightly build (2009-10-26) and it makes no
>>> difference. If this sounds like a bug, I'll open a JIRA and attach tars of
>>> my config and data directories. Any thoughts?
>>>
>>> Thanks,
>>>
>>> Andrew.
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-ignoring-maxFieldLength--tp26057808p26057808.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
> http://www.nabble.com/file/p26060882/solrconfig.xml solrconfig.xml
> http://www.nabble.com/file/p26060882/schema.xml schema.xml
> --
> View this message in context: http://www.nabble.com/Solr-ignoring-maxFieldLength--tp26057808p26060882.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
