lucene-solr-user mailing list archives

From Francisco Sanmartin <>
Subject Re: "Similarity" of numbers in MoreLikeThisHandler
Date Fri, 04 Jul 2008 20:59:53 GMT
The problem is the concept of "similarity". Your concept of similarity 
is based on the meaning of the numbers (or the words). Solr's concept of 
similarity is based on shared sequences of characters. So for Solr, 
"thunder" is similar to "thunderstorm" or to "under" because one string 
contains the other.

With that understood, here is a possible solution to your problem. 
First, let X and Y be numbers, and define the condition under which X 
is similar to Y. For example, we will say that X is similar to Y if and 
only if |X-Y| = 1.

Then we create a new field called "similarity", and every time we index 
a document, we fill this field with the "similar values". If we want to 
index a value of "2" in the field "value", we fill the field "similarity" 
with the values "1,3". Then every time you search for a value, you 
search in the field "value" AND in the field "similarity". This way, 
if you search for "value:1 OR similarity:1", the document containing 
"similarity:1" will show up, and as you can guess, this document is the 
one containing "value:2". This happens because when you indexed the 
document with "value:2", you also defined its similar values, which are 
"1" and "3", so it matches searches for either of them.
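
The steps above can be sketched in Python. This is only a rough 
illustration, not tested against a real Solr instance: the helper names 
(similar_values, build_document, build_query) and the "id" field are 
hypothetical, and it assumes "similarity" is declared in the schema so 
that it can hold several values (for example as a multiValued field).

```python
def similar_values(value, distance=1):
    """Return the numbers Y that satisfy |value - Y| = distance."""
    return [value - distance, value + distance]


def build_document(doc_id, value):
    """Build the document to index, with its similar values precomputed.

    Assumption: "similarity" can store more than one value per document.
    """
    return {
        "id": doc_id,
        "value": value,
        "similarity": similar_values(value),
    }


def build_query(value):
    """Build a query that matches the exact value or any document
    that listed this value as one of its similar values."""
    return "value:{0} OR similarity:{0}".format(value)


doc = build_document("doc-1", 2)
print(doc["similarity"])   # [1, 3]
print(build_query(1))      # value:1 OR similarity:1
```

Changing the distance argument, or replacing similar_values with 
another rule, changes what counts as "similar" without touching the 
query side.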

I hope this helps you.


wojtekpia wrote:
> I stored 2 copies of a single field: one as a number, the other as a string.
> The MLT handler returned the same documents regardless of which of the 2
> fields I used for similarity. So to answer my own question, the
> MoreLikeThisHandler does not do numeric comparisons on numeric fields.
