lucene-dev mailing list archives

From "Nadav Har'El" <>
Subject Re: Scoring
Date Thu, 15 Jun 2006 10:03:12 GMT
One interesting thing to talk about is when you need to create a new Query
subclass, and how to do it.

For example, let's say you want something between a BooleanQuery and a
PhraseQuery, which matches documents containing some of the query words
(like the normal BooleanQuery), but gives a higher score to documents in
which these words appear near each other (there was a discussion about this
idea about a month ago, when we discussed short documents).
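
To make the idea concrete, here is a toy sketch in plain Java. It is not
Lucene code and does not use Query, Scorer, or Similarity; the class name,
the method names, and the "1 / distance" bonus are all assumptions made up
for illustration. It only shows the intended ranking behaviour: any matching
term counts, and terms that sit close together add a bonus.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only; none of these names come from Lucene. */
public class ProximityScoreSketch {

    /** Number of distinct query terms present in the document (the BooleanQuery-like part). */
    public static int matchedTerms(List<String> doc, Set<String> query) {
        Set<String> seen = new HashSet<>(doc);
        seen.retainAll(query);
        return seen.size();
    }

    /**
     * Smallest distance, in token positions, between occurrences of two
     * different query terms. Returns -1 if fewer than two distinct terms occur.
     */
    public static int minPairDistance(List<String> doc, Set<String> query) {
        int best = -1;
        int lastPos = -1;
        String lastTerm = null;
        for (int i = 0; i < doc.size(); i++) {
            String term = doc.get(i);
            if (!query.contains(term)) {
                continue;
            }
            // The closest pair of different terms is always a pair of
            // consecutive query-term occurrences, so one pass is enough.
            if (lastTerm != null && !lastTerm.equals(term)) {
                int distance = i - lastPos;
                if (best < 0 || distance < best) {
                    best = distance;
                }
            }
            lastTerm = term;
            lastPos = i;
        }
        return best;
    }

    /** Hypothetical score: matched term count plus a 1/distance proximity bonus. */
    public static double score(List<String> doc, Set<String> query) {
        int matched = matchedTerms(doc, query);
        if (matched == 0) {
            return 0.0; // no query term present, so the document does not match
        }
        int distance = minPairDistance(doc, query);
        double bonus = distance > 0 ? 1.0 / distance : 0.0;
        return matched + bonus;
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("quick", "fox"));
        List<String> near = Arrays.asList("the", "quick", "fox", "jumps");
        List<String> far = Arrays.asList("quick", "dogs", "chase", "the", "fox");
        List<String> single = Arrays.asList("a", "quick", "walk");
        System.out.println("near:   " + score(near, query));
        System.out.println("far:    " + score(far, query));
        System.out.println("single: " + score(single, query));
    }
}
```

With this made-up formula, a document with both words adjacent outranks one
with both words far apart, which in turn outranks one with a single word.
How such behaviour would map onto Lucene's own classes is exactly the open
question.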

In that case, what do I need to do? I suppose I need to write a new Query
subclass, but what does doing this take? Do I need to write a "Scorer"? A
"Similarity"? Or what?

I think this is an interesting topic.

Nadav Har'El

Grant Ingersoll wrote on 15/06/2006 03:01 (Subject: Re: Scoring):

This is a great start.  I have also started a scoring.xml document under
the xdocs directory (in my sandbox).  So far, I have the following
sections (some even have content under them!):
1. Introduction  // Intro about Vector Space Model, some references to
theory, links to the Similarity scoring formula
2. Scoring and the Index  // How scoring relates to what is in the index
(i.e. how it takes advantage of precomputed info such as norms, etc.)
3. Understanding Similarity  // How the Similarity class fits into
Scoring and what it means to override the Similarity (Greek Kung Fu!)
4. Changing Your Scoring -- Expert  // A discussion of
overriding/creating Scorer/Query/Whatever else
5. Class Diagrams  // Links to your cool pictures
6. Sequence Diagrams  // More cool pictures

What else is needed/useful?  Anyone want to volunteer on a section?

karl wettin wrote:
> On Wed, 2006-06-07 at 08:27 -0400, Grant Ingersoll wrote:
>> I have started something in my sandbox that goes in the xdocs directory
>> that is going to cover the scoring and how it works (something parallel
>> in spirit to the file formats documentation).  Adding in sequence
>> diagrams and whatever you have would be a perfect fit.  I would be happy
>> to coordinate with you, as you may end up getting to it before me.
>> I would also like to see, possibly, some package level documentation and
>> more javadocs.
> Day (night) one of me getting to know the finding and scoring of the
> documents matching a query ended up with an initial class diagram.
> <
> <>
> Feel free to let me know what I got wrong.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:


Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
Voice:  315-443-5484
Fax: 315-443-6886
