lucene-solr-user mailing list archives

From Drew Farris <d...@apache.org>
Subject WordDelimiterFilter and phrase queries?
Date Thu, 22 Jul 2010 21:40:04 GMT
Hi All,

A question about the WordDelimiterFilter and position increments /
phrase queries:

I have a string like: 3-diphenyl-propanoic

When indexed, it is broken up into the following tokens:

pos  token              offset
1    3                  0-1
2    diphenyl           2-10
3    propanoic          11-20
3    diphenylpropanoic  2-20

The WordDelimiterFilter has catenateWords set to 1, which causes it to
emit 'diphenylpropanoic'. Note that the position for this term is '3'.
(catenateAll is set to 0.)

Say someone enters the query string 3-diphenylpropanoic

The query parser I'm using transforms this into a phrase query, and the
indexed form is missed, presumably because the positions of the terms
'3' (position 1) and 'diphenylpropanoic' (position 3) indicate they are
not adjacent?

Is this intended behavior? I expect that the catenated word
'diphenylpropanoic' should have a position of 2 based on the position
of the first term in the concatenation, but perhaps I'm missing
something. This seems to be present in both 1.4.1 and the current
trunk.
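
To make the position issue concrete, here is a small Python sketch. It is
a toy model of exact (slop 0) phrase matching over term positions, not
Lucene's actual PhraseQuery implementation; the helper names
build_positions and phrase_matches are made up for illustration. The
"indexed" list is the token table above, and the "expected" list is the
hypothetical layout where the catenated token sits at position 2.

```python
from collections import defaultdict


def build_positions(tokens):
    """Map each term to the set of positions at which it was indexed."""
    positions = defaultdict(set)
    for pos, term in tokens:
        positions[term].add(pos)
    return positions


def phrase_matches(positions, phrase):
    """Exact phrase match: the i-th query term must sit at start + i."""
    starts = positions.get(phrase[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(phrase))
        for start in starts
    )


# Tokens as currently indexed (from the table above).
indexed = [(1, "3"), (2, "diphenyl"), (3, "propanoic"),
           (3, "diphenylpropanoic")]

# Hypothetical layout: catenated token placed at the position of the
# first term in the concatenation.
expected = [(1, "3"), (2, "diphenyl"), (2, "diphenylpropanoic"),
            (3, "propanoic")]

query = ["3", "diphenylpropanoic"]
print(phrase_matches(build_positions(indexed), query))   # False
print(phrase_matches(build_positions(expected), query))  # True
```

In this model the phrase "3 diphenyl propanoic" matches under either
layout; only the catenated form is affected by where the catenated
token is placed.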

- Drew
