lucene-solr-user mailing list archives

From David Smiley <david.w.smi...@gmail.com>
Subject Re: Solr 6.4 new SynonymGraphFilter help for multi-word synonyms
Date Fri, 03 Feb 2017 16:48:24 GMT
Solr _does_ have a query parser that doesn't suffer from this problem --
SimpleQParser, selected with defType=simple.
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-SimpleQueryParser
In this case, see the "WHITESPACE" operator feature, which can be toggled.
Configure it to _not_ be an operator so that whitespace is processed by the
underlying Analyzer to get proper multi-word handling.  This is a very fine
query parser, IMO; much simpler than any other that has its feature set.
You still might need dismax/edismax, though.
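
A minimal sketch of such a request, assuming the simple parser's
q.operators parameter takes a comma-separated list of the operators to
enable (as described in the reference guide linked above), so that leaving
WHITESPACE out of the list turns it off.  The host, collection name
("mycollection") and field name ("text_syn") are hypothetical placeholders,
and the helper function is for illustration only:

```python
from urllib.parse import urlencode

# Operator names accepted by q.operators (assumption: taken from the
# Simple Query Parser section of the Solr reference guide).
ALL_OPERATORS = [
    "AND", "OR", "NOT", "PREFIX", "PHRASE",
    "PRECEDENCE", "ESCAPE", "WHITESPACE", "FUZZY", "NEAR",
]


def simple_parser_params(query, field, whitespace_operator=False):
    """Build request parameters for defType=simple.

    With whitespace_operator=False, WHITESPACE is left out of q.operators,
    so the whole query text should reach the field's query analyzer.
    """
    ops = [op for op in ALL_OPERATORS
           if whitespace_operator or op != "WHITESPACE"]
    return {
        "defType": "simple",
        "q": query,
        "qf": field,  # hypothetical field with a synonym filter at query time
        "q.operators": ",".join(ops),
    }


params = simple_parser_params("united states", "text_syn")
# Hypothetical host and collection; adjust for your installation.
url = "http://localhost:8983/solr/mycollection/select?" + urlencode(params)
print(url)
```

Checking the parsed query with debugQuery=true is a reasonable way to
confirm that the multi-word synonym is actually being applied.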

On Thu, Feb 2, 2017 at 1:17 PM Cliff Dickinson <cliff.dickinson@gmail.com>
wrote:

> Steve and Shawn, thanks for your replies/explanations!
>
> I eagerly await the completion of the Solr JIRA ticket referenced above in
> a future release.  Many thanks for addressing this challenge that has had
> me banging my head against my desk off and on for the last couple years!
>
> Cliff
>
> On Thu, Feb 2, 2017 at 1:01 PM, Steve Rowe <sarowe@gmail.com> wrote:
>
> > Hi Cliff,
> >
> > The Solr query parsers (standard/“Lucene” and e/dismax anyway) have a
> > problem that prevents SynonymGraphFilter from working: the text fed to
> > your query analyzer is first split on whitespace.  So e.g. a query
> > containing “United States” will never match multi-word synonym
> > “United States”->“US”, since the analyzer will first see “United” and
> > then, separately, “States”.
> >
> > I fixed the whitespace splitting problem in the classic Lucene query
> > parser in <https://issues.apache.org/jira/browse/LUCENE-2605>.  (Note
> > that this is *not* the same as Solr’s standard/“Lucene” query parser,
> > which is actually a fork of Lucene’s query parser with added
> > functionality.)
> >
> > There is a Solr JIRA I’m working on to fix the whitespace splitting
> > problem: <https://issues.apache.org/jira/browse/SOLR-9185>.  I hope to
> > get it committed in time for inclusion in Solr 6.5.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Feb 2, 2017, at 9:50 AM, Shawn Heisey <apache@elyograg.org> wrote:
> > >
> > > On 2/2/2017 7:36 AM, Cliff Dickinson wrote:
> > >> The SynonymGraphFilter API documentation contains the following
> > >> statement at the end:
> > >>
> > >> "To get fully correct positional queries when your synonym
> > >> replacements are multiple tokens, you should instead apply synonyms
> > >> using this TokenFilter at query time and translate the resulting
> > >> graph to a TermAutomatonQuery e.g. using
> > >> TokenStreamToTermAutomatonQuery."
> > >
> > > Lucene is a programming API for search.  That documentation is intended
> > > for people who are writing Lucene programs.  Those users would be
> > > constructing query objects in their own code, so they would most likely
> > > know exactly which object needs to be changed to TermAutomatonQuery.
> > >
> > > Solr is a Lucene program ... and an immensely complicated one.  Many
> > > Lucene improvements require changes in the end program for full
> > > support.  I suspect that Solr's capability has not been updated to use
> > > this new feature in Lucene.  I cannot say for sure; I hope someone who
> > > is familiar with this Lucene change and Solr internals can comment.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
> >
>
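
The whitespace pre-splitting described in Steve's quoted message can be
illustrated with a toy model.  This is not Lucene or Solr code: the synonym
table, the analyze() function and both parse functions are hypothetical
stand-ins, and the "analyzer" here simply replaces a matched phrase, which
is a simplification of what SynonymGraphFilter does.

```python
# Toy model only; every name below is hypothetical.
SYNONYMS = {("united", "states"): ["us"]}
MAX_LEN = max(len(key) for key in SYNONYMS)


def analyze(text):
    """Lowercase, tokenize on whitespace, then replace multi-word synonyms."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += n
                break
        else:
            # No synonym starts at this position; keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def parse_with_presplit(query):
    """Parser splits on whitespace first; each chunk is analyzed alone."""
    out = []
    for chunk in query.split():
        out.extend(analyze(chunk))
    return out


def parse_whole(query):
    """Parser hands the full text to the analyzer in one call."""
    return analyze(query)


print(parse_with_presplit("United States"))  # ['united', 'states']
print(parse_whole("United States"))          # ['us']
```

In the pre-split case the analyzer never sees the two words together, so
the multi-word rule cannot fire.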
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com
