lucene-solr-user mailing list archives

From Umesh Prasad <umesh.i...@gmail.com>
Subject Re: Solr gives the same fieldnorm for two different-size fields
Date Sun, 03 Aug 2014 01:22:50 GMT
What you really need is a covering-type match. I think your use case fits
this pattern:

Score(exact match, in order) > Score(exact match, any order) >
Score(non-exact match)

Example query: a b c

Example docs:
  d1: a b c
  d2: a c b
  d3: c a b
  d4: a b c d
  d5: a b c d e

Use case 1: Only an exact match is a match, so only d1 matches.
Use case 2: Only in-order matches count, so d2 and d3 aren't matches.
Scores are d1 > d4 > d5.
Use case 3: Only in-order matches count, and at most one extra term is
allowed, so d2, d3, and d5 aren't matches. Scores are d1 > d4.
Use case 4: All docs match, and scores are d1 > d2 > d3 > d4 > d5.
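
To make the ordering in use case 4 concrete, here is a minimal plain-Python
sketch (not Lucene or Solr code). The three tiers come from the ranking
above. The tie-breakers inside a tier (fewer extra terms first, then fewer
out-of-order term pairs) are my own assumption, chosen only because they
reproduce d1 > d2 > d3 > d4 > d5. The names DOCS and rank_key are
hypothetical, and the sketch assumes the query terms are distinct.

```python
from itertools import combinations

# Example docs from the message above.
DOCS = {
    "d1": "a b c",
    "d2": "a c b",
    "d3": "c a b",
    "d4": "a b c d",
    "d5": "a b c d e",
}


def rank_key(query, doc):
    """Sort key: smaller tuples rank higher."""
    q = query.split()
    d = doc.split()
    if d == q:
        tier = 0  # exact match, in order
    elif sorted(d) == sorted(q):
        tier = 1  # exact match, any order
    else:
        tier = 2  # non-exact match
    # Tie-breakers below are assumptions, not something the tiers dictate.
    query_pos = {term: i for i, term in enumerate(q)}
    ranks = [query_pos[t] for t in d if t in query_pos]
    inversions = sum(1 for x, y in combinations(ranks, 2) if x > y)
    extra_terms = sum(1 for t in d if t not in query_pos)
    return (tier, extra_terms, inversions)


ranked = sorted(DOCS, key=lambda doc_id: rank_key("a b c", DOCS[doc_id]))
print(ranked)
```

In a real index the ordering would come from query scoring rather than a
sort key; this only shows that the tiers plus two simple tie-breakers are
enough to separate all five docs.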

All of these use cases can be satisfied with SpanQueries, which track
the positions at which terms match. For a covering match, you will need to
add start and end sentinel terms during indexing, so that a query can
anchor on the beginning and end of the field.
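
A rough simulation of the sentinel idea, again in plain Python rather than
the Lucene API. It mimics an in-order span match with a slop limit over
token positions. The sentinel tokens _start_ and _end_ and the function
names are hypothetical, and ranking by gap count is only a stand-in for
real span scoring, which as far as I know tends to favor tighter matches.

```python
START, END = "_start_", "_end_"  # hypothetical sentinel tokens

DOCS = {
    "d1": "a b c",
    "d2": "a c b",
    "d3": "c a b",
    "d4": "a b c d",
    "d5": "a b c d e",
}


def index_tokens(text):
    """Wrap the field's tokens in sentinels, as would be done at index time."""
    return [START, *text.split(), END]


def min_gaps(tokens, terms):
    """Fewest extra positions inside any in-order match of terms, else None."""
    best = None
    for start, tok in enumerate(tokens):
        if tok != terms[0]:
            continue
        pos = start
        for term in terms[1:]:
            try:
                # Earliest occurrence of the next term after the current one.
                pos = tokens.index(term, pos + 1)
            except ValueError:
                pos = None
                break
        if pos is None:
            continue
        gaps = (pos - start + 1) - len(terms)
        if best is None or gaps < best:
            best = gaps
    return best


def covering_matches(query, docs, slop):
    """Doc ids whose whole field matches the query in order within slop."""
    terms = index_tokens(query)
    hits = {}
    for doc_id, text in docs.items():
        gaps = min_gaps(index_tokens(text), terms)
        if gaps is not None and gaps <= slop:
            hits[doc_id] = gaps
    return sorted(hits, key=hits.get)  # tighter matches first


print(covering_matches("a b c", DOCS, slop=0))   # use case 1
print(covering_matches("a b c", DOCS, slop=10))  # use case 2
print(covering_matches("a b c", DOCS, slop=1))   # use case 3
```

Because the sentinels sit at the first and last positions, the matched span
always covers the whole field, so the slop value directly caps how many
extra terms a doc may contain.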

There is an excellent post by Mark Miller about span queries:
http://searchhub.org/2009/07/18/the-spanquery/
Solr's Surround query parser lets you create SpanQueries:
http://wiki.apache.org/solr/SurroundQueryParser
Or you can plug your own query parser into Solr to do the same.

You can find some more links here:
http://search-lucene.com/?q=span+queries&fc_project=Lucene&fc_project=Solr



On 1 August 2014 00:24, Erick Erickson <erickerickson@gmail.com> wrote:

> You can consider, say, a copyField directive and copy the field into a
> string type (or perhaps KeywordTokenizer followed by LowerCaseFilter) and
> then match or boost on an exact match rather than trying to make scoring
> fill this role.
>
> In any case, I'm thinking of normalizing the sensitive fields and indexing
> them as a single token (i.e. the string type or KeywordTokenizer) to
> disambiguate these cases.
>
> Because otherwise I fear you'll get one situation to work, then fail on the
> next case. In your example, you're trying to use length normalization to
> influence scoring to get the doc with the shorter field to sort above the
> doc with the longer field. But what are you going to do when your target is
> "university of california berkley research"? Rely on matching all the
> terms? And so on...
>
> Best,
> Erick
>
>
> On Thu, Jul 31, 2014 at 10:26 AM, gorjida <ali@sciencescape.net> wrote:
>
> > Thanks so much for your reply... In my case, it really matters because
> > I am going to find the correct institution match for an affiliation
> > string... For example, if an author belongs to the "university of
> > Toronto", his/her affiliation should be normalized against the solr...
> > In this case, "University of California Berkley Research" is a
> > different place to "university of california berkeley"... I see
> > top-matches are tied in the score for this specific example... I can
> > break the tie using other techniques... However, I am keen to see if
> > this is a common problem in solr?
> >
> > Regards,
> >
> > Ali
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418p4150430.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>



-- 
---
Thanks & Regards
Umesh Prasad
