lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Queries on SynonymFilterFactory
Date Fri, 08 May 2015 11:05:02 GMT
Accessing an external service (such as a thesaurus website) for every query
can slow down your system a lot.
Keeping the synonyms locally, integrated with Solr, is much better.
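
If you still want to try the client-side approach you describe (append the synonyms to q, with a higher boost on the original term), here is a minimal sketch that at least keeps the lookup local. Everything in it is an assumption for illustration: the LOCAL_SYNONYMS table, the function names and the boost values are hypothetical. It only builds the query string using the standard term^boost syntax and does not escape query-parser special characters.

```python
# Hypothetical local synonym table; in practice this could be loaded once
# from a file or database at startup instead of being fetched per query.
LOCAL_SYNONYMS = {
    "pigment": ["colour", "colouring material"],
}


def quote_term(term):
    """Wrap multi-word terms in double quotes so they are sent as a phrase."""
    return f'"{term}"' if " " in term else term


def build_boosted_query(user_term, original_boost=2.0, synonym_boost=0.5):
    """Return a q string that ORs the original term with its synonyms.

    The original term gets the higher boost and each synonym a lower one.
    The boost values are illustrative defaults, not recommendations.
    """
    term = user_term.strip()
    clauses = [f"{quote_term(term)}^{original_boost}"]
    for synonym in LOCAL_SYNONYMS.get(term.lower(), []):
        clauses.append(f"{quote_term(synonym)}^{synonym_boost}")
    return " OR ".join(clauses)
```

The resulting string would be passed as the q parameter; a term with no entry in the table is simply sent on its own with the original boost.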

Cheers

2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:

> The document seems to point to using AutoPhrasingTokenFilter, joining
> multi-term synonyms with an underscore, or switching to index-time synonyms.
>
> I'm also thinking of putting the synonyms into a database, or querying some
> thesaurus website when the user enters the search key, instead of using the
> SynonymFilterFactory.
>
> For this, once the user enters a search key, the program will retrieve the
> list of synonyms. Then I'll append the list to the search parameters (i.e. q).
> I'll use relevancy boosting to give the original term a higher boost
> and the synonyms a lower boost.
>
> Is this a good solution?
>
> Regards,
> Edwin
> On 8 May 2015 17:40, "Alessandro Benedetti" <benedetti.alex85@gmail.com> wrote:
>
> > I found this very interesting article that I think can help in better
> > understanding the problem:
> >
> > http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > And this:
> >
> > http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
> >
> > Take a look and let me know!
> >
> > 2015-05-08 10:26 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> >
> > > Thanks for the explanation.
> > >
> > > Currently I'm only using the comma-separated list of words, and only
> > > using the synonym filter at query time. I find that when I set
> > > expand=true, quite a number of irrelevant results come back, and this
> > > doesn't happen when I set expand=false.
> > >
> > > I've yet to try the lists of words with the symbol "=>" between them.
> > > I'm trying to solve the multi-word synonyms too, and I found that
> > > enclosing the multi-word term in quotes solves the issue. But this
> > > creates a problem: the original token is not returned if I enclose a
> > > single word in quotes.
> > >
> > > Will using the lists of words with the symbol "=>" between them be
> > > better than the comma-separated list of words for handling multi-word
> > > synonyms?
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > >
> > > On 8 May 2015 at 17:10, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> > >
> > > > Let me explain a little better here:
> > > > First of all, the SynonymFilter is a Token Filter, and being a Token
> > > > Filter it can be part of an analysis pipeline at index time and at
> > > > query time.
> > > > Since the type of analysis (index or query) already determines when the
> > > > filtering happens, let's go into the details of synonyms.txt.
> > > > This file contains a set of lines, each of them describing a synonym
> > > > policy.
> > > > Two different syntaxes are accepted:
> > > >
> > > >
> > > >
> > > >
> > > > couch,sofa,divan
> > > > teh => the
> > > > huge,ginormous,humungous => large
> > > > small => tiny,teeny,weeny
> > > >
> > > >
> > > >    - A comma-separated list of words. If the token matches any of the
> > > >    words, then all the words in the list are substituted, which will
> > > >    include the original token.
> > > >
> > > >    - Two comma-separated lists of words with the symbol "=>" between
> > > >    them. If the token matches any word on the left, then the list on
> > > >    the right is substituted. The original token will not be included
> > > >    unless it is also in the list on the right.
> > > >
> > > >
> > > > Regarding the "expand" param, directly from the official Solr
> > > > documentation:
> > > >
> > > > expand: (optional; default: true) If true, a synonym will be expanded
> > > > to all equivalent synonyms. If false, all equivalent synonyms will be
> > > > reduced to the first in the list.
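
The two syntaxes and the expand flag described above can be mimicked with a small, simplified model. This is only a sketch for reasoning about the rules, not Solr's actual parser: it ignores analysis, case folding and multi-word token handling, and the names parse_rules and RULES are hypothetical. It assumes that several rules for the same token are merged.

```python
# Example rules in the synonyms.txt style discussed in this thread.
RULES = [
    "couch,sofa,divan",
    "teh => the",
    "huge,ginormous,humungous => large",
    "small => tiny,teeny,weeny",
]


def split_words(text):
    """Split a comma-separated list into trimmed, non-empty words."""
    return [word.strip() for word in text.split(",") if word.strip()]


def parse_rules(lines, expand=True):
    """Build a token -> replacement-list map from synonyms.txt-style lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: words on the left are replaced by the right.
            left, right = line.split("=>", 1)
            sources, targets = split_words(left), split_words(right)
        else:
            # Equivalence list: expand decides between all words or the first.
            sources = split_words(line)
            targets = sources if expand else sources[:1]
        for source in sources:
            merged = mapping.setdefault(source, [])
            for target in targets:
                if target not in merged:
                    merged.append(target)
    return mapping
```

With expand=True a token from a comma-separated line maps to the whole list, itself included; with expand=False it maps only to the first word of that line. Lines using "=>" behave the same way in both cases.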
> > > >
> > > > So, starting from this definition, let's answer your questions:
> > > >
> > > > 1) Regarding expand, the definition seems quite clear; if anything
> > > > strange is happening for you, let me know.
> > > > 2) Regarding your second question, it depends on your synonyms.txt file:
> > > > if you are not using the => syntax, you will always retrieve all
> > > > the synonyms (including the original term).
> > > >
> > > > If you need more info let me know; it also depends strictly on how you
> > > > are using the filter (indexing? querying? both?).
> > > > Example:
> > > > If you are using the filter only at indexing time, then using the =>
> > > > syntax will prevent the user from searching for the original token in
> > > > the synonyms.txt relation, because it will not appear in the index.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > 2015-05-08 9:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to check: for the SynonymFilterFactory, I have the
> > > > > following in my synonyms.txt:
> > > > >
> > > > > Titanium Dioxides, titanium oxide, pigment
> > > > > pigment, colour, colouring material
> > > > >
> > > > > If I set expand=false and I search for q=pigment, I will get results
> > > > > that match pigment, Titanium Dioxides and titanium oxide. But it will
> > > > > not match colour and colouring material, as all equivalent synonyms
> > > > > will only match those first in the list.
> > > > >
> > > > > If I set expand=true and I search for q=pigment, I'll get results
> > > > > that match everything in the list (i.e. Titanium Dioxides, titanium
> > > > > oxide, colour, colouring material).
> > > > >
> > > > > Is my understanding correct?
> > > > >
> > > > > Also, I would like to check: why is it that if I search q="pigment"
> > > > > (enclosed in quotes), I only get matches for Titanium Dioxides and not
> > > > > pigment?
> > > > >
> > > > > Regards,
> > > > > Edwin
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > --------------------------
> > > >
> > > > Benedetti Alessandro
> > > > Visiting card : http://about.me/alessandro_benedetti
> > > >
> > > > "Tyger, tyger burning bright
> > > > In the forests of the night,
> > > > What immortal hand or eye
> > > > Could frame thy fearful symmetry?"
> > > >
> > > > William Blake - Songs of Experience -1794 England
> > > >
> > >
> >
> >
> >
> >
>



