lucene-dev mailing list archives

From "Okke Klein (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion
Date Thu, 04 Apr 2013 08:59:16 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621944#comment-13621944 ]

Okke Klein commented on SOLR-4381:
----------------------------------

The terms produced by synonym expansion via solr.SynonymFilterFactory are also stemmed.
That is undesirable when, for example, you want to expand "MIA" to "missing in action" rather than "miss in action".
See [Github issue|https://github.com/healthonnet/hon-lucene-synonyms/issues/14] for details.

> Query-time multi-word synonym expansion
> ---------------------------------------
>
>                 Key: SOLR-4381
>                 URL: https://issues.apache.org/jira/browse/SOLR-4381
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>            Reporter: Nolan Lawson
>            Priority: Minor
>              Labels: multi-word, queryparser, synonyms
>             Fix For: 4.3
>
>         Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
> caution that index-time synonym expansion should be preferred to query-time synonym expansion,
> due to the way multi-word synonyms are treated and how IDF values can be boosted artificially.
> But query-time expansion would have huge benefits: changes to the synonyms don't require
> re-indexing, the index size stays the same, and the IDF values for the documents don't get
> permanently altered.
> The proposed solution is to move the synonym expansion logic out of the analysis chain
> (either query-time or index-time) and into a new QueryParser. See the attached patch for an implementation.
> The core Lucene functionality is untouched. Instead, the EDismaxQParser is extended,
> and synonym expansion is done on the fly. Queries are parsed into a lattice (i.e. all possible
> synonym combinations), while individual components of the query are still handled by the
> EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites
> experimentation and improvement. And I think it fits in well with the merry band of misfit
> query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/]
> and [the Github page for the code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently fixes SOLR-3390
> (highlighting problems with multi-word synonyms) and LUCENE-4499 (better support for multi-word
> synonyms).
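
To make the IDF concern in the issue description concrete, here is a small Java sketch. It assumes Lucene's classic TF-IDF formula (DefaultSimilarity in Lucene 4.x, idf = 1 + ln(numDocs / (docFreq + 1)), computed here in double precision) and a made-up corpus with two single-word synonyms that never appear in the same document. The class name and all counts are hypothetical.

```java
public class SynonymIdfSketch {

    // Classic Lucene TF-IDF idf, as in DefaultSimilarity (Lucene 4.x):
    // idf = 1 + ln(numDocs / (docFreq + 1))
    public static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 10_000;
        // Hypothetical counts for two single-word synonyms that never co-occur.
        long dfRare = 9;
        long dfCommon = 999;

        // Query-time expansion: each synonym keeps its own docFreq, so a document
        // matching the rare form gets a much larger idf than one matching the common form.
        System.out.printf("query-time: rare=%.3f common=%.3f%n",
                idf(dfRare, numDocs), idf(dfCommon, numDocs));

        // Index-time expansion: every document containing either form is indexed with
        // both, so both terms share the combined docFreq until the index is rebuilt.
        long dfBoth = dfRare + dfCommon;
        System.out.printf("index-time: both=%.3f%n", idf(dfBoth, numDocs));
    }
}
```

With these made-up counts, query-time expansion gives a match on the rare form an idf of about 7.9 against about 3.3 for the common form, while index-time expansion gives both terms the same idf of about 3.29, fixed in the index until the next re-index.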

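The lattice idea in the issue description (every possible combination of synonym substitutions) can be sketched in plain Java as below. This is only an illustration, not code from the attached patch or the hon-lucene-synonyms plugin: it assumes the query is already tokenized and lowercased, uses a hypothetical in-memory synonym map whose keys may span several tokens, and leaves the parsing of each variant to something like the EDismaxQParser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymLattice {

    // Returns every query variant obtained by keeping each token or replacing
    // a (possibly multi-token) phrase with one of its synonyms.
    public static List<String> expand(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> variants = new ArrayList<>();
        walk(tokens, 0, synonyms, new ArrayList<String>(), variants);
        return variants;
    }

    // Depth-first walk over the lattice: 'pos' is the next token to consume,
    // 'path' holds the words chosen so far for the current variant.
    private static void walk(List<String> tokens, int pos, Map<String, List<String>> synonyms,
                             List<String> path, List<String> variants) {
        if (pos == tokens.size()) {
            variants.add(String.join(" ", path));
            return;
        }
        // Edge 1: keep the original token.
        path.add(tokens.get(pos));
        walk(tokens, pos + 1, synonyms, path, variants);
        path.remove(path.size() - 1);

        // Edge 2: replace the phrase tokens[pos..end) with each of its synonyms.
        for (int end = pos + 1; end <= tokens.size(); end++) {
            String phrase = String.join(" ", tokens.subList(pos, end));
            List<String> alternatives = synonyms.get(phrase);
            if (alternatives == null) {
                continue;
            }
            for (String alt : alternatives) {
                path.add(alt);
                walk(tokens, end, synonyms, path, variants);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical two-way synonym rule: mia <=> missing in action
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("mia", Arrays.asList("missing in action"));
        synonyms.put("missing in action", Arrays.asList("mia"));

        // Prints [soldier mia, soldier missing in action]
        System.out.println(expand(Arrays.asList("soldier", "mia"), synonyms));
        // Prints [missing in action, mia]
        System.out.println(expand(Arrays.asList("missing", "in", "action"), synonyms));
    }
}
```

The number of variants grows multiplicatively with the number of synonym-bearing phrases in the query. In the approach described in the issue, each variant would then be handed to the EDismaxQParser; how the resulting queries are combined is not shown here.
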
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

