lucene-solr-user mailing list archives

From Jérôme Etévé <jerome.et...@gmail.com>
Subject Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4
Date Tue, 27 Oct 2009 15:29:02 GMT
Actually, here is the difference between the textgen analysis pipeline and ours:

For the phrase "ingenieur d'affaire senior",
Our pipeline gives right after our tokenizer:

term position 	1	2	3	4
term text 	ingenieur	d	affaire	senior

'd' and 'affaire' are separated as different tokens straight away. Our
filters have no later effect for this phrase.

* The textgen pipeline uses a whitespace tokenizer, so it gives first:
term position 	1	2	3
term text 	ingenieur	d'affaire	senior
term type 	word	word	word
source start,end 	0,9	10,19	20,26

* Then a word delimiter filter splits the token "d'affaire" (and
generates the concatenation "daffaire"):
term position 	1	2	3	4
term text 	ingenieur	d	affaire	senior
term type 	word	word	word	word
source start,end 	0,9	10,11	12,19	20,26
concatenated token 	daffaire (term type word, source start,end 10,19)
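
To make the two steps above concrete, here is a rough Python sketch that reproduces the positions and offsets listed for the textgen pipeline. It is not Solr's implementation: the function names are made up for illustration, the splitting rule (any non-alphanumeric character) is a simplification, and placing the concatenated token at the position of the last part is an assumption, since the output above does not show which position it is stacked on.

```python
import re

def whitespace_tokenize(text):
    """Split on whitespace; return (term, start, end) with end exclusive."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def split_on_delimiters(tokens):
    """Simplified stand-in for a word delimiter filter (hypothetical helper).

    Splits each token on non-alphanumeric characters and, when a token
    was split, also emits the concatenation of its parts.
    Returns (position, term, start, end) tuples.
    """
    out = []
    position = 0
    for term, start, end in tokens:
        parts = [(m.group(), start + m.start(), start + m.end())
                 for m in re.finditer(r"[^\W_]+", term)]
        for part in parts:
            position += 1
            out.append((position, *part))
        if len(parts) > 1:
            # ASSUMPTION: the concatenation is stacked on the position of
            # the last part; it spans the offsets of the original token.
            out.append((position, "".join(p[0] for p in parts), start, end))
    return out

if __name__ == "__main__":
    stream = split_on_delimiters(whitespace_tokenize("ingenieur d'affaire senior"))
    for token in stream:
        print(token)
```

Run on "ingenieur d'affaire senior", the parts come out at positions 1 to 4 with the same start,end offsets as in the listing above, plus "daffaire" covering 10,19.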


Could you see a reason why title:"d affaire" works with textgen but
not with our type?
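
For reference, an exact phrase query matches when its terms occur at consecutive positions. The small sketch below checks that rule against the position data listed above. The helper name and the (position, term) representation are only illustrative, and the position of "daffaire" is assumed. On these positions alone, "d affaire" would match under both pipelines, so the positions listed here do not by themselves explain the difference.

```python
def phrase_matches(indexed, phrase_terms):
    """Return True if phrase_terms occur at consecutive positions.

    indexed: iterable of (position, term) pairs for one field value.
    This mimics an exact (slop 0) phrase match; it is a hypothetical
    helper, not Lucene code.
    """
    if not phrase_terms:
        return False
    positions = {}
    for pos, term in indexed:
        positions.setdefault(term, set()).add(pos)
    starts = positions.get(phrase_terms[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(phrase_terms))
        for start in starts
    )

# Position data copied from the two listings in this message.
ours = [(1, "ingenieur"), (2, "d"), (3, "affaire"), (4, "senior")]
# ASSUMPTION: the concatenated token sits at position 3.
textgen = ours + [(3, "daffaire")]

if __name__ == "__main__":
    print(phrase_matches(ours, ["d", "affaire"]))
    print(phrase_matches(textgen, ["d", "affaire"]))
```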

Thanks!

Jerome.


2009/10/27 Jérôme Etévé <jerome.eteve@gmail.com>:
> Hum,
>  That's probably because of our own customized types/tokenizers/filters.
>
> I tried reindexing and querying our data using the default solr type
> 'textgen' and it works fine.
>
> I need to investigate which features of the new Lucene 2.9 API are not
> implemented in our own tokenizers, etc.
>
> Thanks.
>
> Jerome.
>
> 2009/10/27 Yonik Seeley <yonik@lucidimagination.com>:
>> On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé <jerome.eteve@gmail.com> wrote:
>>> I don't really get why these two tokens are subsequently put together
>>> in a phrase query.
>>
>> That's the way the Lucene query parser has always worked... phrase
>> queries are made if multiple tokens are produced from one field query.
>>
>>> In Solr 1.3, it didn't seem to be a problem though. title:"d affaire"
>>> matches documents where title contains "d'affaire" and all is fine.
>>
>> This should not have changed between 1.3 and 1.4...
>> What's the fieldType and its definition for your title field?
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jerome@eteve.net
>



-- 
Jerome Eteve.
http://www.eteve.net
jerome@eteve.net
