lucene-java-user mailing list archives

From quinton olivier <>
Subject Scoring refinement question
Date Thu, 07 Apr 2005 10:13:03 GMT

I don't know whether this question has already been asked,
as I couldn't find any clue in the mail archive.
I would like to know if there is a proper way to
refine the scoring of a fuzzy query as follows:
take into account only the best match for a given
position, and do not sum the scores of all matching tokens
regarding the same position (i.e. for different

In other words, if more than one synonym matches a term
of a fuzzy search query, I would like to collect only
the score of the best synonym, rather than summing the
scores of all matching synonyms.
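
To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class BestMatchPerPosition, the Match type and the scores are all hypothetical, made up only to contrast "sum every matching token" with "keep the best token per position, then sum".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class BestMatchPerPosition {

    /** One indexed token that matched a fuzzy query term. */
    static final class Match {
        final int position;  // token position inside the label
        final String text;   // matched token (label token or one of its synonyms)
        final float score;   // assumed similarity score for this token

        Match(int position, String text, float score) {
            this.position = position;
            this.text = text;
            this.score = score;
        }
    }

    /** Behaviour I observe: every matching token adds to the total. */
    static float sumAll(List<Match> matches) {
        float total = 0f;
        for (Match m : matches) {
            total += m.score;
        }
        return total;
    }

    /** Behaviour I would like: only the best score per position counts. */
    static float sumBestPerPosition(List<Match> matches) {
        Map<Integer, Float> best = new HashMap<>();
        for (Match m : matches) {
            Float current = best.get(m.position);
            if (current == null || m.score > current) {
                best.put(m.position, m.score);
            }
        }
        float total = 0f;
        for (float s : best.values()) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match(0, "laptop", 0.5f));
        matches.add(new Match(0, "laptops", 0.25f)); // synonym at the same position
        matches.add(new Match(1, "bag", 0.5f));
        System.out.println(sumAll(matches));             // prints 1.25
        System.out.println(sumBestPerPosition(matches)); // prints 1.0
    }
}
```

With the second method, a label does not score higher just because it has more synonyms matching at the same position.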

Maybe a short example will help to explain my
question:

I have products identified by labels. A label has one
or more tokens. A token can have no synonyms, a few,
or many.

A fuzzy query on a particular label with a wrong
spelling should bring back the most relevant labels, and
not the ones with the most associated synonyms.
We give synonyms a position increment of 0 while
indexing, and we don't inject them into the fuzzy query
as we don't know them at that time.
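
To show what I mean by a position increment of 0, here is a small sketch in plain Java. It is a stand-in for a token stream, not Lucene code: the class SynonymPositions, the Tok type and the sample tokens are hypothetical. A synonym with increment 0 lands on the same position as the token before it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analyzed token stream; not the Lucene API.
class SynonymPositions {

    /** A token text together with its position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement; // 1 = next position, 0 = same position

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Groups token texts by absolute position, in stream order. */
    static Map<Integer, List<String>> groupByPosition(List<Tok> stream) {
        Map<Integer, List<String>> byPosition = new LinkedHashMap<>();
        int position = -1; // a first increment of 1 then yields position 0
        for (Tok t : stream) {
            position += t.positionIncrement;
            byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(t.text);
        }
        return byPosition;
    }

    public static void main(String[] args) {
        List<Tok> stream = new ArrayList<>();
        stream.add(new Tok("laptop", 1));
        stream.add(new Tok("notebook", 0)); // synonym, same position as "laptop"
        stream.add(new Tok("portable", 0)); // another synonym at position 0
        stream.add(new Tok("bag", 1));
        // prints {0=[laptop, notebook, portable], 1=[bag]}
        System.out.println(groupByPosition(stream));
    }
}
```

Here position 0 holds three candidate tokens, and a misspelled query term may fuzzily match several of them at once.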

Looking at the source code, I think the best way is
to check which term (i.e. which position) matches
the searched one when collecting results, but the
HitCollector.collect() method only receives a docId and
a score. Internally, each score is added to those of any
previous matches. This method doesn't have access to the
term text or position, so it can't compare against an
internal memory containing the current most relevant
synonym for the term at a given position.

If any of you have ideas or suggestions that could
help me, they are welcome :)

Thanks a lot for your help.


