lucene-java-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Term no longer matches if PositionLengthAttr is set to two
Date Tue, 25 Apr 2017 11:40:17 GMT
Hello,

We have a decompounder in which we recently implemented PositionLengthAttribute, setting
it to 2 for a two-word compound such as drinkwater ("drinking water" in Dutch). The decompounder
runs at both index and query time on Solr 6.5.0.
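
As a rough illustration of what these attribute values mean, here is a minimal plain-Java sketch of the token graph such a decompounder would describe. It does not use Lucene classes and is not the actual decompounder: the Token class, the arcs helper, and the emission order (compound first, then its parts) are assumptions made for the example. Each token starts at the position reached by adding up the position increments so far and ends posLen positions later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphSketch {

    // Hypothetical stand-in for a token's term text, position increment and position length.
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // Converts a token sequence into "term:from->to" arcs of the token graph.
    static List<String> arcs(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // positions start at -1, so a first increment of 1 yields position 0
        for (Token t : tokens) {
            pos += t.posInc;                  // start node of this token
            int end = pos + t.posLen;         // end node: start plus position length
            out.add(t.term + ":" + pos + "->" + end);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed emission order: compound (spanning two positions), then its two parts.
        List<Token> tokens = Arrays.asList(
            new Token("drinkwater", 1, 2),
            new Token("drink", 0, 1),
            new Token("water", 1, 1));
        System.out.println(arcs(tokens));
    }
}
```

With posLen = 2 the compound spans the same two positions as its parts; with posLen = 1 it would end at the same node as drink, leaving water after it.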

The problem is, q=content_nl:drinkwater no longer returns documents containing drinkwater
when posLenAtt = 2 at query time.

This is Solr's debug output for drinkwater with posLenAtt = 2:

    <str name="rawquerystring">content_nl:drinkwater</str>
    <str name="querystring">content_nl:drinkwater</str>
    <str name="parsedquery">SynonymQuery(Synonym())</str>
    <str name="parsedquery_toString">Synonym()</str>

This is the output after I reverted the decompounder, so that posLenAtt = 1:

    <str name="rawquerystring">content_nl:drinkwater</str>
    <str name="querystring">content_nl:drinkwater</str>
    <str name="parsedquery">SynonymQuery(Synonym(content_nl:drink content_nl:drinkwater)) content_nl:water</str>
    <str name="parsedquery_toString">Synonym(content_nl:drink content_nl:drinkwater) content_nl:water</str>

The indexed terms still have posLenAtt = 2, but having a posLenAtt = 2 at query time seems
to be a problem.

Any thoughts on this issue? Is it a bug? Do I misunderstand PositionLengthAttribute? Why
does it affect term/document matching at query time but not at index time?

Many thanks,
Markus
