lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: Query on multivalue field
Date Wed, 02 Mar 2011 00:13:47 GMT
Each token has a position set on it. So if you index the value "alpha 
beta gamma", it ends up stored in Solr roughly like this (simplified, 
for the purposes of this discussion):

document1:
     alpha: position 1
     beta:  position 2
     gamma: position 3
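A minimal Python sketch of the idea, assuming plain whitespace tokenization. The hypothetical `assign_positions` helper stands in for a real analyzer, which would also lowercase, strip punctuation and so on. It uses 1-based positions to match the illustration above; Lucene itself numbers positions from 0.

```python
def assign_positions(text):
    """Hypothetical helper: whitespace-tokenize one value and give each
    token a 1-based position (a simplified stand-in for a Solr analyzer)."""
    return [(token, pos) for pos, token in enumerate(text.split(), start=1)]


for token, pos in assign_positions("alpha beta gamma"):
    print(f"{token}: position {pos}")
```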

  If you set a large position increment gap, then after one value in a 
multi-valued field ends, the gap is added to the positions of the tokens 
in the next value. Solr doesn't internally have much of a notion of a 
multi-valued field; ALL an indexed multi-valued field amounts to is a 
position increment gap separating tokens from different 'values'.

So if you index the values ["alpha beta gamma", "aleph bet"] into a 
multi-valued field with a position increment gap of 10000, you get 
something like:

document1:
     alpha: 1
     beta: 2
     gamma: 3
     aleph: 10004
     bet: 10005
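The same idea extended to several values, again as a hypothetical sketch rather than Solr's actual indexing code. The made-up `index_multivalued` helper adds the gap on top of the usual +1 step before the first token of each value after the first, which reproduces the positions listed above (1-based, as in the illustration).

```python
def index_multivalued(values, position_increment_gap):
    """Hypothetical helper: assign positions to the tokens of a
    multi-valued field, adding the gap between consecutive values."""
    postings = []
    last_pos = 0  # position of the previous token (0 = nothing indexed yet)
    for i, value in enumerate(values):
        if i > 0:
            # Skip ahead by the gap before the next value starts.
            last_pos += position_increment_gap
        for token in value.split():
            last_pos += 1
            postings.append((token, last_pos))
    return postings


for token, pos in index_multivalued(["alpha beta gamma", "aleph bet"], 10000):
    print(f"{token}: {pos}")
```

The gap only shifts position numbers; no extra tokens are stored for the skipped positions.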

A large position increment gap, as far as I know and can tell (please 
someone correct me if I'm wrong; I am not a Solr developer), has no 
effect on the size or efficiency of your index on disk.

I am not sure why positionIncrementGap doesn't just default to a very 
large number, to provide behavior that better matches what people expect 
from the idea of a "multi-valued field". So maybe there is some flaw in 
my understanding, or some reason it isn't done this way?

But I set my positionIncrementGap very large, and haven't seen any issues.


On 3/1/2011 5:46 PM, Scott Yeadon wrote:
> The only trick with this is ensuring the searches return the right
> results and don't go across value boundaries. If I set the gap to the
> largest text size we expect (approx 5000 chars) what impact does such a
> large value have (i.e. does Solr physically separate these fragments in
> the index, or just apply the figure as part of any query)?
>
> Scott.
>
> On 2/03/11 9:01 AM, Ahmet Arslan wrote:
>>> In a multiValued field, call it field1, if I have two values indexed
>>> to this field, say value 1 = "some text...termA...more text" and
>>> value 2 = "some text...termB...more text", and do a search such as
>>> field1:(termA termB) (where <solrQueryParser defaultOperator="AND"/>),
>>> I'm getting a hit returned even though the two terms don't both occur
>>> within a single value in the multiValued field.
>>>
>>> What I'm wondering is whether there is a way of applying the query
>>> against each value of the field rather than against the field in its
>>> entirety. The reason is that the number of values I want to store is
>>> variable, and I'd like to avoid using dynamic fields or restructuring
>>> the index if possible.
>> Your best bet may be to use positionIncrementGap and issue a phrase
>> query (implicit AND) with an appropriate slop value.
>>
>> If you have positionIncrementGap="100", you can simulate this using a
>> slop smaller than the gap:
>> &q=field1:"termA termB"~99
>>
>> http://search-lucene.com/m/Hbdvz1og7D71/
>
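To see why the plain AND query matches across values, and how the phrase-slop workaround avoids that, here is a small simulation in Python. The helpers are hypothetical and do not call Solr. `two_term_phrase_match` is a simplified model of Lucene's sloppy phrase matching for two terms only: the phrase "first second" expects `second` one position after `first`, and a pair of positions matches if |pos2 - pos1 - 1| is at most the slop (so a reversed pair costs 2). Under that model, the first token of the next value sits gap + 1 positions after the last token of the previous value, so the slop must be strictly smaller than the gap to rule out cross-value matches.

```python
from itertools import product


def index_multivalued(values, gap):
    """Hypothetical helper: whitespace-tokenize each value and assign
    1-based positions, adding `gap` extra positions between values."""
    postings = []
    last_pos = 0
    for i, value in enumerate(values):
        if i > 0:
            last_pos += gap
        for token in value.split():
            last_pos += 1
            postings.append((token, last_pos))
    return postings


def and_match(postings, terms):
    """field1:(termA termB) with defaultOperator="AND": each term only has
    to occur somewhere in the field, in any value."""
    present = {token for token, _ in postings}
    return all(term in present for term in terms)


def two_term_phrase_match(postings, first, second, slop):
    """Simplified model of the sloppy phrase query "first second"~slop."""
    pos1 = [p for t, p in postings if t == first]
    pos2 = [p for t, p in postings if t == second]
    return any(abs(b - a - 1) <= slop for a, b in product(pos1, pos2))


# Worst case: termA ends the first value and termB starts the second.
doc = index_multivalued(["some text termA", "termB more text"], gap=100)
print(and_match(doc, ["termA", "termB"]))                 # True: crosses values
print(two_term_phrase_match(doc, "termA", "termB", 100))  # True: slop == gap still crosses
print(two_term_phrase_match(doc, "termA", "termB", 99))   # False: slop < gap

# Both terms in the same value still match with the smaller slop.
same = index_multivalued(["termA x termB", "other text"], gap=100)
print(two_term_phrase_match(same, "termA", "termB", 99))  # True
```

The same model also shows the trade-off: the slop applies within a single value too, so two terms that sit further apart than the slop inside one value will not match either.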
