lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
Date Tue, 31 May 2016 22:08:43 GMT
I’ve generally been dropping foreign plugin jars in this dir:
server/solr-webapp/webapp/WEB-INF/lib/
This is because it then gets loaded by the same classloader as Solr itself, which can be useful
if you’re, say, overriding some solr-protected-space method.

If you don’t care about the classloader, I believe you can use whatever dir you want, with
the appropriate bit of solrconfig.xml to load it. Something like:
<lib regex=".*\.jar" dir="${solr.install.dir}/dist"/>
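
For the plugin discussed below, the pieces in solrconfig.xml might look roughly like this. Treat it as an untested sketch: /opt/solr-plugins is a made-up directory, the parser name and class are the ones quoted elsewhere in this thread, and the plugin's README describes further settings for the parser that are left out here.

```xml
<!-- Goes inside the existing <config> element of solrconfig.xml. -->

<!-- Load every jar found in a directory of your choosing.
     /opt/solr-plugins is a hypothetical path, not a Solr default. -->
<lib dir="/opt/solr-plugins" regex=".*\.jar"/>

<!-- Register the parser under the name used in this thread.
     The class name is the one from the "Error loading class" message below. -->
<queryParser name="synonym_edismax"
             class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin"/>
```

With SolrCloud, solrconfig.xml is shared through ZooKeeper but the jar is not, so the jar would need to sit at that path on every node.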


On 5/31/16, 2:13 PM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:

>All --
>
>I'm now attempting to use the hon-lucene-synonyms project from github.
>
>I found the documents that were referenced by the dead links on the readme in
>the repository -- however, given that I'm using Solr 5.4.x, I no longer
>have the need to integrate into a war file (as far as I can see).
>
>The suggestion on the readme is that I can drop the hon-lucene-synonyms jar
>file into the $SOLR_HOME directory, but this does not seem to be working -
>I'm getting class not found exceptions.
>
>Does anyone on this list have direct experience with getting this plugin to
>work in Solr 5.x?
>
>Thanks in advance...
>
>On Mon, May 30, 2016 at 6:57 PM, MaryJo Sminkey <mjsminkey@gmail.com> wrote:
>
>> It's been a while since I installed it so I really can't say. I'm more of a
>> code monkey than a server gal (particularly Linux... I'm amazed I got Solr
>> installed in the first place, LOL!) So I had asked our network guy to look
>> it over recently and see if it looked like I did it okay. He said since it
>> shows up in the list of jars in the Solr admin that it's installed.... if
>> that's not necessarily true, I probably need to point him in the right
>> direction for what else to do since he really doesn't know Solr well
>> either.
>>
>> Mary Jo
>>
>>
>>
>>
>> On Mon, May 30, 2016 at 7:49 PM, John Bickerstaff <
>> john@johnbickerstaff.com>
>> wrote:
>>
>> > Thanks for the comment Mary Jo...
>> >
>> > The error loading the class rings a bell - did you find and follow
>> > instructions for adding that to the WAR file?  I vaguely remember seeing
>> > something about that.
>> >
>> > I'm going to try my own tests on the auto phrasing one..  If I'm
>> > successful, I'll post back.
>> >
>> > On Mon, May 30, 2016 at 3:45 PM, MaryJo Sminkey <mjsminkey@gmail.com>
>> > wrote:
>> >
>> > > This is a very timely discussion for me as well, as we're trying to
>> > > tackle the multi-term synonym issue too and have not been able to get
>> > > the hon-lucene plugin to work. The jar shows up as installed, but when
>> > > we set up the sample request handler it throws this error:
>> > >
>> > > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
>> > > Error loading class
>> > > 'com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin'
>> > >
>> > > I have tried the auto-phrasing one as well (I did set up a field using
>> > > copy to configure it on), but when testing it didn't seem to return the
>> > > synonyms as expected, so I gave up on that one too (I'm willing to give
>> > > it another try though; that was a while ago). I would definitely like to
>> > > hear what other people have found works on the latest versions of Solr
>> > > 5.x and/or 6. It just sucks that this issue has never been fixed in the
>> > > core product, so that you still need to mess with plugins and patches to
>> > > get such basic functionality working properly.
>> > >
>> > >
>> > > *Mary Jo Sminkey*
>> > > *Senior ColdFusion Developer*
>> > >
>> > > *CF Webtools*
>> > > You Dream It... We Build It. <https://www.cfwebtools.com/>
>> > > 11204 Davenport Suite 100
>> > > Omaha, Nebraska 68154
>> > > O: 402.408.3733 x128
>> > > E:  maryjo.sminkey@cfwebtools.com
>> > > Skype: maryjos.cfwebtools
>> > >
>> > >
>> > > On Mon, May 30, 2016 at 5:02 PM, John Bickerstaff <
>> > > john@johnbickerstaff.com>
>> > > wrote:
>> > >
>> > > > So I'm looking at the solution mentioned here:
>> > > >
>> > > > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> > > >
>> > > > The thing that's troubling me slightly is that the way it's documented
>> > > > it seems to be missing a small but important link...
>> > > >
>> > > > What exactly causes the results listed to be returned?
>> > > >
>> > > > Here's my thought process:
>> > > >
>> > > > 1. The entry for /autophrase searchHandler does not specify a default
>> > > > search field.
>> > > > 2. The field type "text_autophrase" is set up as the one with the
>> > > > AutoPhrasingFilterFactory as part of its indexing
>> > > >
>> > > > There isn't any mention (perhaps because it's too obvious) of the need
>> > > > to copy or otherwise get data into the "text_autophrase" field at index
>> > > > time.
>> > > >
>> > > > There isn't any explicit listing of "text_autophrase" as the default
>> > > > search field in the /autophrase search handler
>> > > >
>> > > > There isn't any explicit statement of "df=text_autophrase" in the query
>> > > > statement: [/autophrase?q=New+York]
>> > > >
>> > > > Therefore it seems to me that if someone tries to implement this,
>> > > > they're going to be disappointed in the results unless they:
>> > > > a. copy or otherwise get ALL the text they're interested in -- into the
>> > > > "text_autophrase" field as part of the schema.xml setup (to happen at
>> > > > index time)
>> > > > b. somehow explicitly declare "text_autophrase" as the default search
>> > > > field - either in the searchHandler or wherever else the default field
>> > > > is configured.
>> > > >
>> > > > If anyone out there has done this specific approach - could you validate
>> > > > whether my thought process is correct and / or if I'm missing something?
>> > > > Yes - I get that I can set it all up and try - but it's what I don't know
>> > > > I don't know that bothers me...
>> > > >
>> > > > On Fri, May 27, 2016 at 11:57 AM, John Bickerstaff <
>> > > > john@johnbickerstaff.com
>> > > > > wrote:
>> > > >
>> > > > > Thank you Steve -- very helpful.
>> > > > >
>> > > > > I can see that whatever implementation I decide to try, some testing
>> > > > > will be in order.  If anyone is aware of significant gotchas with this
>> > > > > synonym thing that are not mentioned in the already-listed URLs, please
>> > > > > feel free to comment.
>> > > > >
>> > > > > On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sarowe@gmail.com>
>> > wrote:
>> > > > >
>> > > > >> I’m working on addressing problems using multi-term synonyms at query
>> > > > >> time in Lucene and Solr.
>> > > > >>
>> > > > >> I recommend these two blogs for understanding the issues (the second
>> > > > >> one was mentioned earlier in this thread):
>> > > > >>
>> > > > >> <http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
>> > > > >> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>> > > > >>
>> > > > >> In addition to the already-mentioned projects, there is also:
>> > > > >>
>> > > > >> <https://issues.apache.org/jira/browse/SOLR-5379>
>> > > > >>
>> > > > >> All of these projects try in various ways to work around the fact that
>> > > > >> Lucene’s QueryParser splits on whitespace before sending text to
>> > > > >> analysis, one token at a time, so in a synonym filter, multi-word
>> > > > >> synonyms can never match and add alternatives.  See
>> > > > >> <https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted
>> > > > >> a patch to directly address that problem - note that it’s still a work
>> > > > >> in progress.
>> > > > >>
>> > > > >> Once LUCENE-2605 has been fixed, there is still work to do getting
>> > > > >> (e)dismax to work with the modified Lucene QueryParser, and addressing
>> > > > >> problems with how queries are constructed from Lucene’s “sausagized”
>> > > > >> token stream.
>> > > > >>
>> > > > >> --
>> > > > >> Steve
>> > > > >> www.lucidworks.com
>> > > > >>
>> > > > >> > On May 26, 2016, at 2:21 PM, John Bickerstaff <
>> > > > john@johnbickerstaff.com>
>> > > > >> wrote:
>> > > > >> >
>> > > > >> > Thanks Chris --
>> > > > >> >
>> > > > >> > The two projects I'm aware of are:
>> > > > >> >
>> > > > >> > https://github.com/healthonnet/hon-lucene-synonyms
>> > > > >> >
>> > > > >> > and the one referenced from the Lucidworks page here:
>> > > > >> >
>> > > > >> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> > > > >> >
>> > > > >> > ... which is here: https://github.com/LucidWorks/auto-phrase-tokenfilter
>> > > > >> >
>> > > > >> > Is there anything else out there that you would recommend I look at?
>> > > > >> >
>> > > > >> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <
>> > chris@depahelix.com
>> > > >
>> > > > >> wrote:
>> > > > >> >
>> > > > >> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
>> > > > >> >>
>> > > > >> >> Suyash Sonawane and I have worked on multiple word synonyms at
>> > > > >> >> Wayfair. We worked mostly off of Ted Sullivan's work and also off of
>> > > > >> >> some suggestions from Koorosh Vakhshoori.  We have gotten to a point
>> > > > >> >> where we have a more sophisticated internal implementation, however,
>> > > > >> >> we've found that it is very difficult to make it do what you want it
>> > > > >> >> to do, and also be sufficiently performant.  Watch out for
>> > > > >> >> exceptional situations with mm (minimum should match).
>> > > > >> >>
>> > > > >> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have
>> > > > >> >> also done work in this area.
>> > > > >> >>
>> > > > >> >> It should be very possible to get this kind of thing working on
>> > > > >> >> SolrCloud.  I haven't tried it yet but I think theoretically, it
>> > > > >> >> should just work.  The synonyms stuff is mostly about doing things at
>> > > > >> >> index time and query time.  The index time stuff should translate to
>> > > > >> >> SolrCloud directly, while the query time stuff might pose some
>> > > > >> >> issues, but probably not too bad, if there are any issues at all.
>> > > > >> >>
>> > > > >> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
>> > > > >> >> because a lot of stuff is just Java, and it still works within the
>> > > > >> >> Jetty context.
>> > > > >> >>
>> > > > >> >> -Chris.
>> > > > >> >>
>> > > > >> >>
>> > > > >> >>
>> > > > >> >>
>> > > > >> >> ----------------------------------------
>> > > > >> >> From: "John Bickerstaff" <john@johnbickerstaff.com>
>> > > > >> >> Sent: Thursday, May 26, 2016 1:51 PM
>> > > > >> >> To: solr-user@lucene.apache.org
>> > > > >> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
>> > > > >> >> Hey Jeff (or anyone interested in multi-word synonyms) here are some
>> > > > >> >> potentially interesting links...
>> > > > >> >>
>> > > > >> >> http://wiki.apache.org/solr/QueryParser (search the page for
>> > > > >> >> synonym_edismax)
>> > > > >> >>
>> > > > >> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>> > > > >> >> (blog post about what became the synonym_edismax Query Parser)
>> > > > >> >>
>> > > > >> >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>> > > > >> >>
>> > > > >> >> This last was useful for lots of reasons and contains links to other
>> > > > >> >> interesting, related web pages...
>> > > > >> >>
>> > > > >> >> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <
>> > > > jwartes@whitepages.com>
>> > > > >> >> wrote:
>> > > > >> >>
>> > > > >> >>> Oh, interesting. I've certainly encountered issues with multi-word
>> > > > >> >>> synonyms, but I hadn't come across this. If you end up using it with
>> > > > >> >>> a recent Solr version, I'd be glad to hear your experience.
>> > > > >> >>>
>> > > > >> >>> I haven't used it, but I am aware of one other project in this vein
>> > > > >> >>> that you might be interested in looking at:
>> > > > >> >>> https://github.com/LucidWorks/auto-phrase-tokenfilter
>> > > > >> >>>
>> > > > >> >>>
>> > > > >> >>> On 5/26/16, 9:29 AM, "John Bickerstaff" <
>> > john@johnbickerstaff.com
>> > > >
>> > > > >> >> wrote:
>> > > > >> >>>
>> > > > >> >>>> Ahh - for question #3 I may have spoken too soon. This line from the
>> > > > >> >>>> github repository readme suggests a way.
>> > > > >> >>>>
>> > > > >> >>>> Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
>> > > > >> >>>> and it works (Jetty).
>> > > > >> >>>>
>> > > > >> >>>> I'll try that and only respond back if that doesn't work.
>> > > > >> >>>>
>> > > > >> >>>> Questions 1 and 2 still stand of course... If anyone on the list has
>> > > > >> >>>> experience in this area...
>> > > > >> >>>>
>> > > > >> >>>> Thanks.
>> > > > >> >>>>
>> > > > >> >>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff
<
>> > > > >> >>> john@johnbickerstaff.com
>> > > > >> >>>>> wrote:
>> > > > >> >>>>
>> > > > >> >>>>> Hi all,
>> > > > >> >>>>>
>> > > > >> >>>>> I'm creating a Solr Cloud that will index and search medical text.
>> > > > >> >>>>> Multi-word synonyms are a pretty important factor.
>> > > > >> >>>>>
>> > > > >> >>>>> I find that there are some challenges around multi-word synonyms and
>> > > > >> >>>>> I also found on the wiki that there is a recommended 3rd-party parser
>> > > > >> >>>>> (synonym_edismax parser) created by Nolan Lawson and found here:
>> > > > >> >>>>> https://github.com/healthonnet/hon-lucene-synonyms
>> > > > >> >>>>>
>> > > > >> >>>>> Here's the thing - the instructions on the github site involve
>> > > > >> >>>>> bringing the jar file into the war file - which is not applicable any
>> > > > >> >>>>> more... at least I think it's not...
>> > > > >> >>>>>
>> > > > >> >>>>> I have three questions:
>> > > > >> >>>>>
>> > > > >> >>>>> 1. Is this still a good solution for multi-word synonyms (i.e. Solr
>> > > > >> >>>>> Cloud doesn't break it in some way)?
>> > > > >> >>>>> 2. Is there a tool or plug-in out there that the contributors would
>> > > > >> >>>>> recommend above this one?
>> > > > >> >>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated
>> > > > >> >>>>> procedure for bringing it in to Solr Cloud (I'm running 5.4.x)?
>> > > > >> >>>>>
>> > > > >> >>>>> Thanks
>> > > > >> >>>>>
>> > > > >> >>>
>> > > > >> >>>
>> > > > >> >>
>> > > > >> >>
>> > > > >> >>
>> > > > >>
>> > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
