lucene-solr-user mailing list archives

From Cliff Dickinson <cliff.dickin...@gmail.com>
Subject Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms
Date Thu, 02 Feb 2017 18:17:42 GMT
Steve and Shawn, thanks for your replies/explanations!

I eagerly await the completion of the Solr JIRA ticket referenced above in
a future release.  Many thanks for addressing this challenge that has had
me banging my head against my desk off and on for the last couple years!

Cliff

On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi Cliff,
>
> The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> problem that prevents SynonymGraphFilter from working: the text fed to your
> query analyzer is first split on whitespace.  So e.g. a query containing
> “United States” will never match the multi-word synonym “United States”->“US”,
> since the analyzer will first see “United” and then, separately, “States”.
>
> I fixed the whitespace splitting problem in the classic Lucene query
> parser in <https://issues.apache.org/jira/browse/LUCENE-2605>.  (Note
> that this is *not* the same as Solr’s standard/“Lucene” query parser, which
> is actually a fork of Lucene’s query parser with added functionality.)
>
> There is a Solr JIRA I’m working on to fix the whitespace splitting
> problem: <https://issues.apache.org/jira/browse/SOLR-9185>.  I hope to
> get it committed in time for inclusion in Solr 6.5.
>
> --
> Steve
> www.lucidworks.com
>
> > On Feb 2, 2017, at 9:50 AM, Shawn Heisey <apache@elyograg.org> wrote:
> >
> > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> >> The SynonymGraphFilter API documentation contains the following
> >> statement at the end:
> >>
> >> "To get fully correct positional queries when your synonym replacements
> >> are multiple tokens, you should instead apply synonyms using this
> >> TokenFilter at query time and translate the resulting graph to a
> >> TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."
> >
> > Lucene is a programming API for search.  That documentation is intended
> > for people who are writing Lucene programs.  Those users would be
> > constructing query objects in their own code, so they would most likely
> > know exactly which object needs to be changed to TermAutomatonQuery.
> >
> > Solr is a Lucene program ... and an immensely complicated one.  Many
> > Lucene improvements require changes in the end program for full
> > support.  I suspect that Solr's capability has not been updated to use
> > this new feature in Lucene.  I cannot say for sure; I hope someone who
> > is familiar with this Lucene change and Solr internals can comment.
> >
> > Thanks,
> > Shawn
> >
>
>
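
The whitespace-splitting problem Steve describes above can be illustrated with a small toy model. This is only a sketch in plain Python, not Lucene or Solr code: the SYNONYMS table and the functions analyze, parse_split_first and parse_whole are hypothetical names invented for illustration, and the real analysis chain is considerably more involved.

```python
# Toy model only: none of these names exist in Lucene or Solr.
# Hypothetical multi-word synonym rule: "United States" -> "US".
SYNONYMS = {("united", "states"): ["us"]}


def analyze(text):
    """Tokenize the text it is given and rewrite any multi-word synonym in it."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for phrase, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.extend(replacement)
                i += len(phrase)
                break
        else:
            # No synonym rule starts at this position; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def parse_split_first(query):
    """Mimic the behavior described above: split on whitespace first,
    then hand each piece to the analyzer separately."""
    out = []
    for chunk in query.split():
        out.extend(analyze(chunk))
    return out


def parse_whole(query):
    """Hand the whole query text to the analyzer in one call."""
    return analyze(query)


if __name__ == "__main__":
    print(parse_split_first("United States election"))
    print(parse_whole("United States election"))
```

In this model, parse_split_first never applies the rule because the analyzer only ever sees one word at a time, while parse_whole sees both words together and can match the two-word synonym.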
