lucene-solr-user mailing list archives

From John Bickerstaff <j...@johnbickerstaff.com>
Subject Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
Date Tue, 31 May 2016 22:08:39 GMT
Many thanks Joe!  I'll follow the instructions on the linked webpage.

On Tue, May 31, 2016 at 4:05 PM, Joe Lawson <
jlawson@opensourceconnections.com> wrote:

> The docs are out of date for the synonym_edismax but it does work. Check
> out the tests for working examples. I'll try to update it soon. I've run
> the plugin on Solr 5 and 6, solrcloud and standalone. For running in
> SolrCloud make sure you follow
>
> https://cwiki.apache.org/confluence/display/solr/Adding+Custom+Plugins+in+SolrCloud+Mode
> On May 31, 2016 5:13 PM, "John Bickerstaff" <john@johnbickerstaff.com>
> wrote:
>
> > All --
> >
> > I'm now attempting to use the hon-lucene-synonyms project from GitHub.
> >
> > I found the documents that were referenced by the dead links on the readme
> > in the repository -- however, given that I'm using Solr 5.4.x, I no longer
> > have the need to integrate into a war file (as far as I can see).
> >
> > The suggestion on the readme is that I can drop the hon-lucene-synonyms jar
> > file into the $SOLR_HOME directory, but this does not seem to be working -
> > I'm getting class not found exceptions.
> >
> > Does anyone on this list have direct experience with getting this plugin
> to
> > work in Solr 5.x?
> >
> > Thanks in advance...
> >
> > On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsminkey@gmail.com>
> > wrote:
> >
> > > > It's been a while since I installed it so I really can't say. I'm more of
> > > > a code monkey than a server gal (particularly Linux... I'm amazed I got
> > > > Solr installed in the first place, LOL!) So I had asked our network guy to
> > > > look it over recently and see if it looked like I did it okay. He said
> > > > since it shows up in the list of jars in the Solr admin that it's
> > > > installed.... if that's not necessarily true, I probably need to point him
> > > > in the right direction for what else to do since he really doesn't know
> > > > Solr well either.
> > >
> > > Mary Jo
> > >
> > >
> > >
> > >
> > > On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
> > > john@johnbickerstaff.com>
> > > wrote:
> > >
> > > > Thanks for the comment Mary Jo...
> > > >
> > > > > The error loading the class rings a bell - did you find and follow
> > > > > instructions for adding that to the WAR file?  I vaguely remember seeing
> > > > > something about that.
> > > >
> > > > I'm going to try my own tests on the auto phrasing one..  If I'm
> > > > successful, I'll post back.
> > > >
> > > > > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsminkey@gmail.com>
> > > > > wrote:
> > > >
> > > > > > This is a very timely discussion for me as well, as we're trying to
> > > > > > tackle the multi-term synonym issue too and have not been able to get
> > > > > > the hon-lucene plugin to work. The jar shows up as installed, but when
> > > > > > we set up the sample request handler it throws this error:
> > > > > >
> > > > > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > > > > > Error loading class
> > > > > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
> > > > >
> > > > > > I have tried the auto-phrasing one as well (I did set up a field using
> > > > > > copy to configure it on), but when testing it didn't seem to return the
> > > > > > synonyms as expected. So I gave up on that one too (am willing to give
> > > > > > it another try though, that was a while ago). Would definitely like to
> > > > > > hear what other people have found works on the latest versions of Solr
> > > > > > 5.x and/or 6. Just sucks that this issue has never been fixed in the
> > > > > > core product, such that you still need to mess with plugins and patches
> > > > > > to get such basic functionality working properly.
> > > > >
> > > > >
> > > > > *Mary Jo Sminkey*
> > > > > *Senior ColdFusion Developer*
> > > > >
> > > > > *CF Webtools*
> > > > > You Dream It... We Build It. <https://www.cfwebtools.com/>
> > > > > 11204 Davenport Suite 100
> > > > > Omaha, Nebraska 68154
> > > > > O: 402.408.3733 x128
> > > > > E:  maryjo.sminkey@cfwebtools.com
> > > > > Skype: maryjos.cfwebtools
> > > > >
> > > > >
> > > > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
> > > > > john@johnbickerstaff.com>
> > > > > wrote:
> > > > >
> > > > > > > So I'm looking at the solution mentioned here:
> > > > > > >
> > > > > > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > > > > > >
> > > > > > > The thing that's troubling me slightly is that the way it's documented,
> > > > > > > it seems to be missing a small but important link...
> > > > > > >
> > > > > > > What exactly causes the results listed to be returned?
> > > > > > >
> > > > > > > Here's my thought process:
> > > > > > >
> > > > > > > 1. The entry for the /autophrase searchHandler does not specify a
> > > > > > > default search field.
> > > > > > > 2. The field type "text_autophrase" is set up as the one with the
> > > > > > > AutoPhrasingFilterFactory as part of its indexing.
> > > > > >
> > > > > > > There isn't any mention (perhaps because it's too obvious) of the need
> > > > > > > to copy or otherwise get data into the "text_autophrase" field at index
> > > > > > > time.
> > > > > > >
> > > > > > > There isn't any explicit listing of "text_autophrase" as the default
> > > > > > > search field in the /autophrase search handler.
> > > > > > >
> > > > > > > There isn't any explicit statement of "df=text_autophrase" in the query
> > > > > > > statement: [/autophrase?q=New+York]
> > > > > >
> > > > > > > Therefore it seems to me that if someone tries to implement this,
> > > > > > > they're going to be disappointed in the results unless they:
> > > > > > > a. copy or otherwise get ALL the text they're interested in -- into the
> > > > > > > "text_autophrase" field as part of the schema.xml setup (to happen at
> > > > > > > index time)
> > > > > > > b. somehow explicitly declare "text_autophrase" as the default search
> > > > > > > field - either in the searchHandler or wherever else the default field
> > > > > > > is configured.
> > > > > >
> > > > > > > If anyone out there has done this specific approach - could you
> > > > > > > validate whether my thought process is correct and / or if I'm missing
> > > > > > > something? Yes - I get that I can set it all up and try - but it's what
> > > > > > > I don't know I don't know that bothers me...
> > > > > >
> > > > > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
> > > > > > > john@johnbickerstaff.com> wrote:
> > > > > >
> > > > > > > Thank you Steve -- very helpful.
> > > > > > >
> > > > > > > > I can see that whatever implementation I decide to try, some testing
> > > > > > > > will be in order.  If anyone is aware of significant gotchas with this
> > > > > > > > synonym thing that are not mentioned in the already-listed URLs, please
> > > > > > > > feel free to comment.
> > > > > > >
> > > > > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sarowe@gmail.com> wrote:
> > > > > > >
> > > > > > > >> I’m working on addressing problems using multi-term synonyms at query
> > > > > > > >> time in Lucene and Solr.
> > > > > > >>
> > > > > > > >> I recommend these two blogs for understanding the issues (the second
> > > > > > > >> one was mentioned earlier in this thread):
> > > > > > > >>
> > > > > > > >> <http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
> > > > > > > >> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
> > > > > > >>
> > > > > > > >> In addition to the already-mentioned projects, there is also:
> > > > > > >>
> > > > > > >> <https://issues.apache.org/jira/browse/SOLR-5379>
> > > > > > >>
> > > > > > > >> All of these projects try in various ways to work around the fact that
> > > > > > > >> Lucene’s QueryParser splits on whitespace before sending text to
> > > > > > > >> analysis, one token at a time, so in a synonym filter, multi-word
> > > > > > > >> synonyms can never match and add alternatives.  See
> > > > > > > >> <https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted
> > > > > > > >> a patch to directly address that problem - note that it’s still a work
> > > > > > > >> in progress.
> > > > > > >>
> > > > > > > >> Once LUCENE-2605 has been fixed, there is still work to do getting
> > > > > > > >> (e)dismax to work with the modified Lucene QueryParser, and addressing
> > > > > > > >> problems with how queries are constructed from Lucene’s “sausagized”
> > > > > > > >> token stream.
> > > > > > >>
> > > > > > >> --
> > > > > > >> Steve
> > > > > > >> www.lucidworks.com
> > > > > > >>
> > > > > > > >> > On May 26, 2016, at 2:21 PM, John Bickerstaff <john@johnbickerstaff.com>
> > > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > Thanks Chris --
> > > > > > >> >
> > > > > > >> > The two projects I'm aware of are:
> > > > > > >> >
> > > > > > >> > https://github.com/healthonnet/hon-lucene-synonyms
> > > > > > >> >
> > > > > > > >> > and the one referenced from the Lucidworks page here:
> > > > > > > >> >
> > > > > > > >> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > > > > > > >> >
> > > > > > > >> > ... which is here: https://github.com/LucidWorks/auto-phrase-tokenfilter
> > > > > > > >> >
> > > > > > > >> > Is there anything else out there that you would recommend I look at?
> > > > > > >> >
> > > > > > > >> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <chris@depahelix.com>
> > > > > > > >> > wrote:
> > > > > > >> >
> > > > > > > >> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
> > > > > > >> >>
> > > > > > > >> >> Suyash Sonawane and I have worked on multi-word synonyms at Wayfair.
> > > > > > > >> >> We worked mostly off of Ted Sullivan's work and also off of some
> > > > > > > >> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where
> > > > > > > >> >> we have a more sophisticated internal implementation; however, we've
> > > > > > > >> >> found that it is very difficult to make it do what you want it to do
> > > > > > > >> >> and also be sufficiently performant.  Watch out for exceptional
> > > > > > > >> >> situations with mm (minimum should match).
> > > > > > >> >>
> > > > > > > >> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have
> > > > > > > >> >> also done work in this area.
> > > > > > >> >>
> > > > > > > >> >> It should be very possible to get this kind of thing working on
> > > > > > > >> >> SolrCloud.  I haven't tried it yet but I think theoretically, it should
> > > > > > > >> >> just work.  The synonyms stuff is mostly about doing things at index
> > > > > > > >> >> time and query time.  The index time stuff should translate to
> > > > > > > >> >> SolrCloud directly, while the query time stuff might pose some issues,
> > > > > > > >> >> but probably not too bad, if there are any issues at all.
> > > > > > >> >>
> > > > > > > >> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> > > > > > > >> >> because a lot of stuff is just Java, and it still works within the
> > > > > > > >> >> Jetty context.
> > > > > > >> >>
> > > > > > >> >> -Chris.
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >> ----------------------------------------
> > > > > > >> >> From: "John Bickerstaff" <john@johnbickerstaff.com>
> > > > > > >> >> Sent: Thursday, May 26, 2016 1:51 PM
> > > > > > >> >> To: solr-user@lucene.apache.org
> > > > > > > >> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> > > > > > > >> >>
> > > > > > > >> >> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> > > > > > > >> >> potentially interesting links...
> > > > > > > >> >>
> > > > > > > >> >> http://wiki.apache.org/solr/QueryParser (search the page for
> > > > > > > >> >> synonym_edismax)
> > > > > > > >> >>
> > > > > > > >> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > > > > > > >> >> (blog post about what became the synonym_edismax Query Parser)
> > > > > > > >> >>
> > > > > > > >> >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> > > > > > > >> >>
> > > > > > > >> >> This last was useful for lots of reasons and contains links to other
> > > > > > > >> >> interesting, related web pages...
> > > > > > >> >>
> > > > > > > >> >> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwartes@whitepages.com>
> > > > > > > >> >> wrote:
> > > > > > >> >>
> > > > > > > >> >>> Oh, interesting. I've certainly encountered issues with multi-word
> > > > > > > >> >>> synonyms, but I hadn't come across this. If you end up using it with a
> > > > > > > >> >>> recent Solr version, I'd be glad to hear your experience.
> > > > > > >> >>>
> > > > > > > >> >>> I haven't used it, but I am aware of one other project in this vein
> > > > > > > >> >>> that you might be interested in looking at:
> > > > > > > >> >>> https://github.com/LucidWorks/auto-phrase-tokenfilter
> > > > > > >> >>>
> > > > > > >> >>>
> > > > > > > >> >>> On 5/26/16, 9:29 AM, "John Bickerstaff" <john@johnbickerstaff.com>
> > > > > > > >> >>> wrote:
> > > > > > >> >>>
> > > > > > > >> >>>> Ahh - for question #3 I may have spoken too soon. This line from the
> > > > > > > >> >>>> github repository readme suggests a way.
> > > > > > > >> >>>>
> > > > > > > >> >>>> Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
> > > > > > > >> >>>> and it works (Jetty).
> > > > > > > >> >>>>
> > > > > > > >> >>>> I'll try that and only respond back if that doesn't work.
> > > > > > > >> >>>>
> > > > > > > >> >>>> Questions 1 and 2 still stand of course... If anyone on the list has
> > > > > > > >> >>>> experience in this area...
> > > > > > >> >>>>
> > > > > > >> >>>> Thanks.
> > > > > > >> >>>>
> > > > > > > >> >>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <
> > > > > > > >> >>>> john@johnbickerstaff.com> wrote:
> > > > > > >> >>>>
> > > > > > >> >>>>> Hi all,
> > > > > > >> >>>>>
> > > > > > > >> >>>>> I'm creating a Solr Cloud that will index and search medical text.
> > > > > > > >> >>>>> Multi-word synonyms are a pretty important factor.
> > > > > > >> >>>>>
> > > > > > > >> >>>>> I find that there are some challenges around multi-word synonyms and
> > > > > > > >> >>>>> I also found on the wiki that there is a recommended 3rd-party parser
> > > > > > > >> >>>>> (synonym_edismax parser) created by Nolan Lawson and found here:
> > > > > > > >> >>>>> https://github.com/healthonnet/hon-lucene-synonyms
> > > > > > >> >>>>>
> > > > > > > >> >>>>> Here's the thing - the instructions on the github site involve
> > > > > > > >> >>>>> bringing the jar file into the war file - which is not applicable any
> > > > > > > >> >>>>> more... at least I think it's not...
> > > > > > >> >>>>>
> > > > > > >> >>>>> I have three questions:
> > > > > > >> >>>>>
> > > > > > > >> >>>>> 1. Is this still a good solution for multi-word synonyms (i.e. Solr
> > > > > > > >> >>>>> Cloud doesn't break it in some way)?
> > > > > > > >> >>>>> 2. Is there a tool or plug-in out there that the contributors would
> > > > > > > >> >>>>> recommend above this one?
> > > > > > > >> >>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated
> > > > > > > >> >>>>> procedure for bringing it in to Solr Cloud? (I'm running 5.4.x)
> > > > > > >> >>>>>
> > > > > > >> >>>>> Thanks
> > > > > > >> >>>>>
> > > > > > >> >>>
> > > > > > >> >>>
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
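
The core problem Steve Rowe describes in the thread (the query parser splits on whitespace before analysis, so a synonym filter only ever sees one token at a time) can be illustrated with a small sketch. This is plain Python, not Lucene or Solr code: the synonym table, the function names, and the example query are all assumptions made up for illustration, and the matching logic is far simpler than a real synonym filter.

```python
# Hypothetical synonym table: multi-word phrase -> alternatives.
# (Illustrative only; not taken from any real synonyms.txt.)
SYNONYMS = {("heart", "attack"): ["myocardial infarction"]}


def expand(tokens):
    """Return the alternatives for every SYNONYMS phrase found in tokens."""
    found = []
    for start in range(len(tokens)):
        for phrase, alternatives in SYNONYMS.items():
            if tuple(tokens[start:start + len(phrase)]) == phrase:
                found.extend(alternatives)
    return found


def analyze_per_token(query):
    """Mimic a parser that splits on whitespace and then analyzes each
    token on its own, so the synonym step never sees two adjacent words."""
    found = []
    for token in query.split():
        found.extend(expand([token]))
    return found


def analyze_whole(query):
    """Mimic analysis that receives the full token sequence at once."""
    return expand(query.split())


if __name__ == "__main__":
    query = "heart attack symptoms"
    print(analyze_per_token(query))  # no multi-word match is possible
    print(analyze_whole(query))      # the two-word phrase can match
```

With per-token analysis the two-word entry can never match, because no single token equals the whole phrase. That is the gap the plugins discussed in this thread try to work around in different ways.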
