lucene-solr-user mailing list archives

From Sean Timm <tim...@aol.com>
Subject Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?
Date Sun, 06 May 2007 01:08:06 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
It may not be easy or even possible without major changes, but having
global collection statistics would allow scores to be compared across
searchers.&nbsp; To do this, the master indexes would need to be able to
communicate with each other.<br>
<br>
Another approach to merging results across searchers is described here:<br>
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, Greg Pass, Ophir
Frieder, <a   href="http://ir.iit.edu/%7Eabdur/publications/p19-beitzel.pdf">
"Surrogate Scoring for Improved Metasearch Precision"</a> , Proceedings
of the 2005 ACM Conference on Research and Development in Information
Retrieval (SIGIR-2005), Salvador, Brazil, August 2005.<br>
<br>
-Sean<br>
<br>
<a class="moz-txt-link-abbreviated" href="mailto:deinspanjer@gmail.com">deinspanjer@gmail.com</a>
wrote:
<blockquote   cite="midf81fa4180705050718lebb9119t964003b7a8f2be3f@mail.gmail.com"   type="cite">On
4/11/07, Chris Hostetter
<a class="moz-txt-link-rfc2396E" href="mailto:hossman_lucene@fucit.org">&lt;hossman_lucene@fucit.org&gt;</a>
wrote:
  <br>
  <blockquote type="cite"><br>
    <br>
A custom Similarity class with simplified tf, idf, and queryNorm
functions
    <br>
might also help you get scores from the Explain method that are more
    <br>
easily manageable since you'll have predictable query structures hard
    <br>
coded into your application.
    <br>
    <br>
ie: run the large query once, get the results back, and for each result
    <br>
look at the explanation and pull out the individual pieces of the
    <br>
explanation and compare them with those of the other matches to create
    <br>
your own "normalization".
    <br>
  </blockquote>
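[Editor's note: a minimal sketch of the "simplified Similarity" idea above, written as plain static methods so it runs outside Lucene. In Lucene itself you would subclass DefaultSimilarity and override tf(), idf(), and queryNorm(); the class name and the flattened constants here are illustrative assumptions, not anyone's actual code.]

```java
// Illustrative sketch only: a "flattened" scoring model in plain Java.
// In Lucene you would subclass DefaultSimilarity and override these
// methods; the constants chosen here are hypothetical.
public class FlatSimilaritySketch {
    // tf: ignore how often a term occurs, count only whether it occurs.
    public static float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // idf: treat every term as equally informative.
    public static float idf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // queryNorm: disable query normalization entirely.
    public static float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // With all three flattened, a matching term's contribution reduces
        // to its boost, so Explain output becomes directly comparable.
        float boost = 10.0f;
        float score = tf(3.0f) * idf(5, 1000) * queryNorm(42.0f) * boost;
        System.out.println(score); // prints 10.0
    }
}
```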
  <br>
  <br>
Chuck Williams mentioned a proposal he had for normalization of scores
that
  <br>
would give a constant score range that would allow comparison of
scores.
  <br>
Chuck, did you ever write any code to that end or was it just
algorithmic
  <br>
discussion?
  <br>
  <br>
Here is the point I'm at now:
  <br>
  <br>
I have my matching engine working.&nbsp; The fields to be indexed and the
queries
  <br>
are defined by the user.&nbsp; Hoss, I'm not sure how that affects your idea
of
  <br>
having a custom Similarity class since you mentioned that having
predictable
  <br>
query structures was important...
  <br>
The user kicks off an indexing then defines the queries they want to
try
  <br>
matching with.&nbsp; Here is an example of the query fragments I'm working
with
  <br>
right now:
  <br>
year_str:"${Year}"^2 year_str:[${Year -1} TO ${Year +1}]
  <br>
title_title_mv:"${Title}"^10 title_title_mv:${Title}^2
  <br>
+(title_title_mv:"${Title}"~^5 title_title_mv:${Title}~)
  <br>
director_name_mv:"${Director}"~2^10 director_name_mv:${Director}^5
  <br>
director_name_mv:${Director}~.7
  <br>
  <br>
For each item in the source feed, the variables are interpolated (the
query
  <br>
term is transformed into a grouped term if there are multiple values
for a
  <br>
variable). That query is then made to find the overall best match.
  <br>
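[Editor's note: a minimal sketch of the interpolation step described above. The class name, the grouping rule for multi-valued variables, and the placeholder regex are assumptions; the arithmetic forms like ${Year -1} are deliberately left out of this sketch.]

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of interpolating ${Var} placeholders into a query
// fragment; not Daniel's actual code.
public class QueryInterpolator {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace each ${Var} with its value; if a variable has several
    // values, emit a grouped term like (v1 v2) as described in the mail.
    public static String interpolate(String fragment, Map<String, List<String>> vars) {
        Matcher m = VAR.matcher(fragment);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            List<String> values = vars.get(m.group(1));
            String replacement = (values == null) ? m.group(0)
                    : values.size() == 1 ? values.get(0)
                    : "(" + String.join(" ", values) + ")";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, interpolating the Title fragment with a single value yields title_title_mv:"Blade Runner"^10, while two Director values would be grouped as director_name_mv:(a b).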
I then determine the relevance for each query fragment.&nbsp; I haven't
written
  <br>
any plugins for Lucene yet, so my current method of determining the
  <br>
relevance is by running each query fragment by itself then iterating
through
  <br>
the results looking to see if the overall best match is in this result
set.
  <br>
If it is, I record the rank and multiply that rank (e.g. 5 out of 10)
by a
  <br>
configured fragment weight.
  <br>
  <br>
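[Editor's note: a minimal sketch of the rank-times-weight scheme just described, under the stated convention (rank counted from 1, fragment weight configured per fragment). All names are hypothetical.]

```java
import java.util.List;

// Illustrative sketch of scoring one query fragment by the rank of the
// overall best match within that fragment's result list.
public class FragmentRankScorer {
    // Returns rank * weight if the best-match doc id appears in the
    // fragment's results (e.g. rank 5 out of 10), or 0 if it is absent.
    public static double fragmentScore(List<String> fragmentResults,
                                       String bestMatchId,
                                       double fragmentWeight) {
        int idx = fragmentResults.indexOf(bestMatchId);
        if (idx < 0) {
            return 0.0; // best match not retrieved by this fragment
        }
        int rank = idx + 1; // ranks counted from 1
        return rank * fragmentWeight;
    }
}
```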
Since the scores aren't normalized, I have no good way of determining a
poor
  <br>
overall match from a really high quality one. The overall item could be
the
  <br>
first item returned in each of the query fragments.
  <br>
  <br>
Any help here would be very appreciated. Ideally, I'm hoping that maybe
  <br>
Chuck has a patch or plugin that I could use to normalize my scores
such
  <br>
that I could let the user do a matching run, look at the results and
  <br>
determine what score threshold to set for subsequent runs.
  <br>
  <br>
Thanks,
  <br>
Daniel
  <br>
  <br>
</blockquote>
</body>
</html>
