lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: docValues: Can we apply synonym
Date Sat, 30 May 2015 14:38:08 GMT
What I'm suggesting is that you have two fields, one for searching, one
for faceting.

You may find you can't use docValues for your field type, in which case
Solr will just use caches to improve faceting performance.
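
For illustration only, here is a minimal schema.xml sketch of the two-field
idea, assuming a Solr 4.x/5.x-style schema. The field name city_search, the
type name text_city and the file synonyms.txt are hypothetical; city is the
field discussed in this thread. The un-analysed string field keeps docValues
for faceting, while copyField feeds the same raw value into an analysed text
field that applies synonyms for searching:

```xml
<!-- Facet field: un-analysed string, so docValues can be enabled. -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>

<!-- Search field (hypothetical name): analysed text, synonyms applied here. -->
<field name="city_search" type="text_city" indexed="true" stored="false"/>

<!-- copyField copies the raw input value, before any analysis. -->
<copyField source="city" dest="city_search"/>

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="text_city" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt is an assumed file containing e.g. "mumbai, bombay" -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With such a setup, a query or filter on city_search for either name should
match the same documents, while facet.field=city returns the exact indexed
values. Multi-word city names would need extra care with this tokenizer.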

Upayavira

On Sat, May 30, 2015, at 01:50 AM, Aman Tandon wrote:
> Hi Upayavira,
> 
> How will copyField help in my scenario, when I have to add the synonym
> to a docValues-enabled field?
> 
> With Regards
> Aman Tandon
> 
> On Sat, May 30, 2015 at 1:18 AM, Upayavira <uv@odoko.co.uk> wrote:
> 
> > Use copyField to clone the field for faceting purposes.
> >
> > Upayavira
> >
> > On Fri, May 29, 2015, at 08:06 PM, Aman Tandon wrote:
> > > Hi Erick,
> > >
> > > > Thanks for the suggestion. We are using this query parser plugin (
> > > > *SynonymExpandingExtendedDismaxQParserPlugin*) to manage multi-word
> > > > synonyms. So it works slower than edismax, and that's why it is not in
> > > > contrib, right? (I am asking because we use it for all our searches to
> > > > handle 10 multi-word synonyms such as ice cube, icecube, etc.)
> > >
> > > > *Moreover, I thought of a solution for this docValues problem*
> > > >
> > > > I need to make the city field *multivalued*, by which I mean I will add
> > > > the synonym (*mumbai, bombay*) as an extra value to that field if
> > > > present. Searching will then work as before.
> > >
> > > >
> > > > *<field name="city">mumbai</field><field name="city">bombay</field>*
> > >
> > >
> > > > The only problem is that we have to remove the 'city alias/synonym
> > > > facets' when we are providing results to the clients.
> > >
> > > *mumbai, 1000*
> > >
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > > On Fri, May 29, 2015 at 7:26 PM, Erick Erickson <erickerickson@gmail.com>
> > > > wrote:
> > >
> > > > > Do take time for performance testing with that parser. It can be slow
> > > > > depending on your data, as I remember. That said, it solves the problem
> > > > > it set out to solve, so if it meets your SLAs, it can be a life-saver.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > >
> > > > On Fri, May 29, 2015 at 2:35 AM, Alessandro Benedetti
> > > > <benedetti.alex85@gmail.com> wrote:
> > > > > > Even if a little bit outdated, that query parser is really really
> > > > > > cool to manage synonyms !
> > > > > +1 !
> > > > >
> > > > > 2015-05-29 1:01 GMT+01:00 Aman Tandon <amantandon.10@gmail.com>:
> > > > >
> > > > >> Thanks chris.
> > > > >>
> > > > >> Yes we are using it for handling multiword synonym problem.
> > > > >>
> > > > >> With Regards
> > > > >> Aman Tandon
> > > > >>
> > > > >> On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles <
> > > > >> Charles.Reitzel@tiaa-cref.org> wrote:
> > > > >>
> > > > >> > Again, I would recommend using Nolan Lawson's
> > > > >> > SynonymExpandingExtendedDismaxQParserPlugin.
> > > > >> >
> > > > >> >
> > > > > >> > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > > > >> > Sent: Wednesday, May 27, 2015 6:42 PM
> > > > >> > To: solr-user@lucene.apache.org
> > > > >> > Subject: Re: docValues: Can we apply synonym
> > > > >> >
> > > > > >> > OK, and which synonym processor are you talking about? Maybe it could
> > > > > >> > help.
> > > > >> >
> > > > >> > With Regards
> > > > >> > Aman Tandon
> > > > >> >
> > > > >> > On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles <
> > > > >> > Charles.Reitzel@tiaa-cref.org> wrote:
> > > > >> >
> > > > > >> > > Sorry, my bad. The synonym processor I mention works differently.
> > > > > >> > > It's an extension of the EDisMax query processor and doesn't require
> > > > > >> > > field-level synonym configs.
> > > > >> > >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Reitzel, Charles [mailto:Charles.Reitzel@tiaa-cref.org]
> > > > >> > > Sent: Wednesday, May 27, 2015 6:12 PM
> > > > >> > > To: solr-user@lucene.apache.org
> > > > >> > > Subject: RE: docValues: Can we apply synonym
> > > > >> > >
> > > > > >> > > But the query analysis isn't on a specific field, it is applied to
> > > > > >> > > the query string.
> > > > >> > >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > > > >> > > Sent: Wednesday, May 27, 2015 6:08 PM
> > > > >> > > To: solr-user@lucene.apache.org
> > > > >> > > Subject: Re: docValues: Can we apply synonym
> > > > >> > >
> > > > >> > > Hi Charles,
> > > > >> > >
> > > > > >> > > The problem here is that docValues work only with primitive data
> > > > > >> > > types like String, int, etc. So how could we apply synonyms to a
> > > > > >> > > primitive data type?
> > > > >> > >
> > > > >> > > With Regards
> > > > >> > > Aman Tandon
> > > > >> > >
> > > > >> > > On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles <
> > > > >> > > Charles.Reitzel@tiaa-cref.org> wrote:
> > > > >> > >
> > > > > >> > > > Is there any reason you cannot apply the synonyms at query time?
> > > > > >> > > > Applying synonyms at indexing time has problems, e.g. polluting the
> > > > > >> > > > term frequency for synonyms added, preventing distance queries, ...
> > > > > >> > > >
> > > > > >> > > > Since city names often have multiple terms, e.g. New York, Den
> > > > > >> > > > Hague, etc., I would recommend using Nolan Lawson's
> > > > > >> > > > SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less
> > > > > >> > > > filling.
> > > > > >> > > >
> > > > > >> > > > http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> > > > > >> > > >
> > > > > >> > > > We found this to fix synonyms like "ny" for "New York" and vice versa.
> > > > > >> > > > Haven't tried it with docValues, tho.
> > > > >> > > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Aman Tandon [mailto:amantandon.10@gmail.com]
> > > > >> > > > Sent: Tuesday, May 26, 2015 11:15 PM
> > > > >> > > > To: solr-user@lucene.apache.org
> > > > >> > > > Subject: Re: docValues: Can we apply synonym
> > > > >> > > >
> > > > >> > > > Yes it could be :)
> > > > >> > > >
> > > > >> > > > Anyway thanks for helping.
> > > > >> > > >
> > > > >> > > > With Regards
> > > > >> > > > Aman Tandon
> > > > >> > > >
> > > > > >> > > > On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti <
> > > > > >> > > > benedetti.alex85@gmail.com> wrote:
> > > > > >> > > >
> > > > > >> > > > > I should investigate that, as synonyms are usually handled at the
> > > > > >> > > > > analysis stage. A simple way is to replace the word with all its
> > > > > >> > > > > synonyms (including the original word), but simply using this kind
> > > > > >> > > > > of processor will change the token positions and offsets, modifying
> > > > > >> > > > > the actual content of the document.
> > > > > >> > > > >
> > > > > >> > > > > "I am from Bombay" will become "I am from Bombay Mumbai", which
> > > > > >> > > > > can be annoying.
> > > > > >> > > > > So a clever approach must be investigated.
> > > > >> > > > >
> > > > >> > > > > 2015-05-26 17:36 GMT+01:00 Aman Tandon <
> > amantandon.10@gmail.com
> > > > >:
> > > > >> > > > >
> > > > >> > > > > > Okay So how could I do it with UpdateProcessors?
> > > > >> > > > > >
> > > > >> > > > > > With Regards
> > > > >> > > > > > Aman Tandon
> > > > >> > > > > >
> > > > > >> > > > > > On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti <
> > > > > >> > > > > > benedetti.alex85@gmail.com> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > mmm this is different !
> > > > > >> > > > > > > Without any customisation, right now you could :
> > > > > >> > > > > > > - use docValues to provide exact value facets.
> > > > > >> > > > > > > - Then you can use a copy field, with the proper analysis, to
> > > > > >> > > > > > > search when a user clicks on a filter !
> > > > > >> > > > > > >
> > > > > >> > > > > > > So you will see in your facets :
> > > > > >> > > > > > > Mumbai(3)
> > > > > >> > > > > > > Bombay(2)
> > > > > >> > > > > > >
> > > > > >> > > > > > > And when clicking you see 5 results.
> > > > > >> > > > > > > A little bit misleading for the users …
> > > > > >> > > > > > >
> > > > > >> > > > > > > On the other hand, if you want to apply the synonyms before, in the
> > > > > >> > > > > > > indexing pipeline (because a docValues field can not be analysed),
> > > > > >> > > > > > > I think you should play with UpdateProcessors.
> > > > >> > > > > > >
> > > > >> > > > > > > Cheers
> > > > >> > > > > > >
> > > > >> > > > > > > 2015-05-26 17:18 GMT+01:00 Aman
Tandon <
> > > > >> amantandon.10@gmail.com
> > > > >> > >:
> > > > >> > > > > > >
> > > > > >> > > > > > > > We are interested in using docValues for better memory utilization
> > > > > >> > > > > > > > and speed.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Currently we are faceting the search results on *city*. In city we
> > > > > >> > > > > > > > have also added synonyms for cities, like mumbai and bombay (these
> > > > > >> > > > > > > > are Indian cities), so that results for mumbai are also eligible when
> > > > > >> > > > > > > > somebody applies a filter of bombay on the search results.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > I need this functionality to work with a docValues-enabled field.
> > > > >> > > > > > > >
> > > > >> > > > > > > > With Regards
> > > > >> > > > > > > > Aman Tandon
> > > > >> > > > > > > >
> > > > > >> > > > > > > > On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti <
> > > > > >> > > > > > > > benedetti.alex85@gmail.com> wrote:
> > > > >> > > > > > > >
> > > > > >> > > > > > > > > I checked in the Documentation to be sure, but apparently :
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > DocValues are only available for specific field types. The types
> > > > > >> > > > > > > > > chosen determine the underlying Lucene docValue type that will be
> > > > > >> > > > > > > > > used. The available Solr field types are:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >    - StrField and UUIDField.
> > > > > >> > > > > > > > >       - If the field is single-valued (i.e., multi-valued is false),
> > > > > >> > > > > > > > >         Lucene will use the SORTED type.
> > > > > >> > > > > > > > >       - If the field is multi-valued, Lucene will use the SORTED_SET
> > > > > >> > > > > > > > >         type.
> > > > > >> > > > > > > > >    - Any Trie* numeric fields and EnumField.
> > > > > >> > > > > > > > >       - If the field is single-valued (i.e., multi-valued is false),
> > > > > >> > > > > > > > >         Lucene will use the NUMERIC type.
> > > > > >> > > > > > > > >       - If the field is multi-valued, Lucene will use the SORTED_SET
> > > > > >> > > > > > > > >         type.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > This means you should not analyse a field where DocValues is enabled.
> > > > > >> > > > > > > > > Can you explain your use case to us ? Why are you interested in
> > > > > >> > > > > > > > > synonyms at the DocValues level ?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Cheers
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 2015-05-26 13:32 GMT+01:00
Upayavira <
> > uv@odoko.co.uk>:
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > To my understanding, docValues are just an uninverted index. That
> > > > > >> > > > > > > > > > is, it contains the terms that are generated at the end of an
> > > > > >> > > > > > > > > > analysis chain. Therefore, you simply enable docValues and include
> > > > > >> > > > > > > > > > the SynonymFilterFactory in your analysis.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > Is that enough, or are you struggling with some other issue?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Upayavira
> > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:
> > > > > >> > > > > > > > > > > Hi,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > We have a field *city* in which docValues are enabled. We need to
> > > > > >> > > > > > > > > > > add synonyms to that field, so how could we do it?
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > With Regards
> > > > >> > > > > > > > > > > Aman Tandon
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > --
> > > > >> > > > > > > > > --------------------------
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Benedetti Alessandro
> > > > > >> > > > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > "Tyger, tyger burning bright
> > > > > >> > > > > > > > > In the forests of the night,
> > > > > >> > > > > > > > > What immortal hand or eye
> > > > > >> > > > > > > > > Could frame thy fearful symmetry?"
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > William Blake - Songs of Experience -1794 England
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > --
> > > > >> > > > > > > --------------------------
> > > > >> > > > > > >
> > > > >> > > > > > > Benedetti Alessandro
> > > > >> > > > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >> > > > > > >
> > > > >> > > > > > > "Tyger, tyger burning bright
> > > > >> > > > > > > In the forests of the night,
> > > > >> > > > > > > What immortal hand or eye
> > > > >> > > > > > > Could frame thy fearful symmetry?"
> > > > >> > > > > > >
> > > > >> > > > > > > William Blake - Songs of Experience
-1794 England
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > --------------------------
> > > > >> > > > >
> > > > >> > > > > Benedetti Alessandro
> > > > >> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >> > > > >
> > > > >> > > > > "Tyger, tyger burning bright
> > > > >> > > > > In the forests of the night,
> > > > >> > > > > What immortal hand or eye
> > > > >> > > > > Could frame thy fearful symmetry?"
> > > > >> > > > >
> > > > >> > > > > William Blake - Songs of Experience -1794
England
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > *************************************************************************
> > > > > >> > > > This e-mail may contain confidential or privileged information.
> > > > > >> > > > If you are not the intended recipient, please notify the sender
> > > > > >> > > > immediately and then delete it.
> > > > > >> > > >
> > > > > >> > > > TIAA-CREF
> > > > > >> > > > *************************************************************************
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > >
> >
