lucene-java-user mailing list archives

From "Michael Stoppelman" <stop...@gmail.com>
Subject Re: Synonyms and Ranking
Date Thu, 03 Jan 2008 21:55:53 GMT
Hi all,

Would this approach also be recommended for stemmed words? For example,
let's say the original word is 'mower'. I want matches on 'mow', 'mowing',
and 'mowers', but the most relevant results would obviously be matches
for 'mower'. Should I index my documents unstemmed and then stem the
query words with a lower weighting?

-M
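
A minimal sketch of how the lower weighting could be written in Lucene's query syntax (field:term^boost). It assumes the separate-field suggestion quoted below is applied to stemming, so each document is indexed twice: unstemmed in one field and stemmed in another. The field names "exact" and "stemmed", the boost values, and the class itself are hypothetical. The stem is passed in by the caller, and the value "mow" is illustrative only, not necessarily what SnowballAnalyzer would produce. The terms are assumed to contain no query-syntax special characters.

```java
// Sketch only: builds a Lucene query string that weights the literal word
// above its stemmed form. Uses no Lucene classes, just string assembly.
final class BoostedQuerySketch {
    // Hypothetical field names: "exact" holds unstemmed tokens,
    // "stemmed" holds the output of a stemming analyzer.
    static final String EXACT_FIELD = "exact";
    static final String STEMMED_FIELD = "stemmed";

    // Returns two optional clauses; a document matching the exact field
    // gets the larger boost, one matching only the stemmed field the smaller.
    public static String build(String word, String stem,
                               float exactBoost, float stemBoost) {
        return EXACT_FIELD + ":" + word + "^" + exactBoost
                + " " + STEMMED_FIELD + ":" + stem + "^" + stemBoost;
    }

    public static void main(String[] args) {
        // Illustrative values; tune the boosts against real data.
        System.out.println(build("mower", "mow", 2.0f, 0.5f));
    }
}
```

The resulting string would then be parsed with an analyzer that does not stem the "exact" field again, otherwise both clauses collapse to the same term.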

On Dec 28, 2007 10:39 AM, Grant Ingersoll <gsingers@apache.org> wrote:

> Yes, the Payload stuff should work for this, but you will have to set
> it up during indexing.  The simpler approach is probably a separate
> field for synonyms, but this means analyzing the same content twice
> (or trying out the TeeTokenFilter, but this is advanced usage at this
> point, since it is unreleased.)
>
> There is no support for payloads in the query parser, so you would
> have to construct the queries on your own or at least add in queries
> based on the clauses that the QueryParser outputs.
>
>
>
> On Dec 28, 2007, at 10:54 AM, Frank Schima wrote:
>
> >
> > Hi Grant,
> >
> >
> >
> > Grant Ingersoll-6 wrote:
> >>
> >> You can use the payload functionality (have a look at
> >> BoostingTermQuery and Michael B.'s excellent ApacheCon talk at
> >> http://people.apache.org/~buschmi/apachecon/). The other option is to
> >> put the synonyms into a separate field and boost that less than the
> >> main field.
> >>
> >> On Dec 27, 2007, at 4:19 PM, Frank Schima wrote:
> >>
> >>> So I have my fancy new stemmed, synonym-based Lucene index. Let's say
> >>> I have the following synonym defined:
> >>>
> >>> radiation -> radiotherapy (and the reverse)
> >>>
> >>> The search results rank all results exactly the same. Is there a way
> >>> to boost the actual search term a little higher than the synonym(s)?
> >>
> >
> > To be clear, if someone searches for "radiation", I want content
> > containing exactly "radiation" to rank higher than content with
> > "radiotherapy". But if someone searches for "radiotherapy", I want
> > content with that to rank higher than content with "radiation". Will
> > Payloads do this for me?
> >
> > I would try it, but I'm having trouble figuring out how to do the
> > search. For the search, I'm currently using a MultiFieldQueryParser,
> > like this:
> >
> >    SnowballAnalyzer sba = new SnowballAnalyzer("English",
> >        StopAnalyzer.ENGLISH_STOP_WORDS);
> >    QueryParser qp = new MultiFieldQueryParser(
> >        new String[] {"field1", "field2", "field3"}, sba);
> >    try {
> >      Query query = qp.parse(strSearchTerms);
> >    } catch (Throwable th) {
> >       ...
> >    }
> >
> > However, the payload example in the presentation requires a
> > BoostingTermQuery, like this:
> >
> >    Query query = new BoostingTermQuery(new Term("field", "searchterm"));
> >
> > Is there a way to make the two work together?
> >
> >
> > Thanks!
> > Frank
> >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Synonyms-and-Ranking-tp14518753p14527508.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
