lucene-solr-user mailing list archives

From Scott Yeadon <scott.yea...@anu.edu.au>
Subject Re: Query on multivalue field
Date Wed, 02 Mar 2011 00:40:26 GMT
Tested it out and it seems to work well as long as I set the gap to a
value much larger than the length of the text; 10000 appears to work
fine for our current data. Thanks heaps for all the help, guys!

Scott.

On 2/03/11 11:13 AM, Jonathan Rochkind wrote:
> Each token has a position set on it. So if you index the value "alpha 
> beta gamma", it winds up stored in Solr as (sort of, for the way we 
> want to look at it)
>
> document1:
>     alpha:    position 1
>     beta:    position 2
>     gamma: position 3
>
> If you set the position increment gap large, then after one value in
> a multi-valued field ends, the position increment gap will be added to
> the positions for the next value. Solr doesn't actually have much of
> an internal notion of a multi-valued field: ALL a multi-valued
> indexed field is, is a position increment gap separating tokens from
> different 'values'.
>
> So if you index, in a multi-valued field with position increment gap
> 10000, the values ["alpha beta gamma", "aleph bet"], you get something like:
>
> document1:
>     alpha: 1
>     beta: 2
>     gamma: 3
>     aleph: 10004
>     bet: 10005
>
> A large position increment gap, as far as I know and can tell (please 
> someone correct me if I'm wrong, I am not a Solr developer) has no 
> effect on the size or efficiency of your index on disk.
>
> I am not sure why positionIncrementGap doesn't just default to a very 
> large number, to provide behavior that more matches what people expect 
> from the idea of a "multi-valued field". So maybe there is some flaw 
> in my understanding, that justifies some reason for it not to be this 
> way?
>
> But I set my positionIncrementGap very large, and haven't seen any 
> issues.
>
>
> On 3/1/2011 5:46 PM, Scott Yeadon wrote:
>> The only trick with this is ensuring the searches return the right
>> results and don't go across value boundaries. If I set the gap to the
>> largest text size we expect (approx 5000 chars) what impact does such a
>> large value have (i.e. does Solr physically separate these fragments in
>> the index or just apply the figure as part of any query)?
>>
>> Scott.
>>
>> On 2/03/11 9:01 AM, Ahmet Arslan wrote:
>>>> In a multiValued field, call it field1, if I have two
>>>> values indexed to
>>>> this field, say value 1 = "some text...termA...more text"
>>>> and value 2 =
>>>> "some text...termB...more text" and do a search such as
>>>> field1:(termA termB)
>>>> (where <solrQueryParser defaultOperator="AND"/>) I'm
>>>> getting a hit
>>>> returned even though both terms don't occur within a single
>>>> value in the
>>>> multiValued field.
>>>>
>>>> What I'm wondering is if there is a way of applying the
>>>> query against
>>>> each value of the field rather than against the field in
>>>> its entirety.
>>>> The reason is that the number of values I want to store is
>>>> variable and
>>>> I'd like to avoid the use of dynamic fields or
>>>> restructuring the index
>>>> if possible.
>>> Your best bet may be to use positionIncrementGap and issue a
>>> phrase query (implicit AND) with an appropriate slop value.
>>>
>>> If you have positionIncrementGap="100", you can simulate this
>>> using:
>>> &q=field1:"termA termB"~100
>>>
>>> http://search-lucene.com/m/Hbdvz1og7D71/
>>>
>>>
>>>
>>>
>>
>
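
The position arithmetic Jonathan describes can be sketched in a few lines of Python. This is only a simplified model, not Solr or Lucene code: the function names assign_positions and sloppy_match are made up for illustration, tokenization is plain whitespace splitting, and the two-term slop check is an approximation of how a sloppy phrase query measures distance.

```python
def assign_positions(values, gap):
    """Assign a position to every token of a multi-valued field.

    Simplified model (an assumption, not Solr's actual code): each token
    advances the position by 1, and the gap is added once between values.
    """
    positions = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # jump over the boundary between two values
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def sloppy_match(positions, term_a, term_b, slop):
    """Approximate a two-term phrase query "term_a term_b"~slop.

    An exact phrase expects term_b one position after term_a; the slop is
    how far the actual layout may deviate from that.
    """
    pos_a = [p for t, p in positions if t == term_a]
    pos_b = [p for t, p in positions if t == term_b]
    return any(abs(pb - 1 - pa) <= slop for pa in pos_a for pb in pos_b)


positions = assign_positions(["alpha beta gamma", "aleph bet"], gap=10000)
print(positions)
print(sloppy_match(positions, "alpha", "gamma", slop=5000))  # same value
print(sloppy_match(positions, "gamma", "aleph", slop=5000))  # different values
```

Under this model the positions come out as in Jonathan's listing, and a slop smaller than the gap cannot bridge two values, which is why the gap needs to exceed the longest value you expect to search within.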

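For anyone wanting to script Ahmet's suggestion, here is a minimal sketch that builds the phrase-with-slop query string and URL-encodes it. The base URL and the helper name build_query are assumptions for illustration only; field1, termA and termB come from the thread. The slop is set one below the gap because, as far as I can tell, a slop equal to the gap could just bridge two tokens sitting directly on either side of a value boundary.

```python
from urllib.parse import urlencode, parse_qs

# Assumed endpoint for illustration; adjust to your own Solr install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query(field, term_a, term_b, slop):
    """Build a phrase query with slop, e.g. field1:"termA termB"~99."""
    return f'{field}:"{term_a} {term_b}"~{slop}'


gap = 100  # must match positionIncrementGap on the field type
q = build_query("field1", "termA", "termB", slop=gap - 1)
url = SOLR_SELECT_URL + "?" + urlencode({"q": q})
print(q)
print(url)
```

Note that the slop also has to be at least as large as the longest distance between the two terms within one value, so in practice the gap and slop are chosen together.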
