lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: AW: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Mon, 17 Nov 2003 21:25:16 GMT
Ah, I see.  You have an absoulte interpretation.  I am more relative.  I 
think we're talking about a heuristic, not a law.

Matches within a sentence are scored higher than those that are not. 
And the closer matching the terms are, whether within the same sentence 
or not, the greater the score.  Given these two principals, at some 
point, as sentences get longer, a close match across sentence boundaries 
should probably score substantially higher than a very distant match 
within a sentence.  Thus missing some distant yet still within-sentence 
matches in very long sentences probably won't substantially alter the 
ranking.  Is 100 long enough?  Perhaps not.  But 1000 is certainly 
plenty long.

Doug

Chong, Herb wrote:
> any arbitrary number you pick will be broken by some document someone puts into the system.
> 
> Herb....
> 
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: Monday, November 17, 2003 2:56 PM
> To: Lucene Users List
> Subject: Re: AW: inter-term correlation [was Re: Vector Space Model in Lucene?]
> 
> This is exactly the sort of approach I was advocating in earlier 
> messages.  (Although I think you'd only need to increase the position 
> counter by 101 for the first word in each sentence.)  Herb Chong didn't 
> seem to think this was appropriate, but I never understood why.
> 
> Doug
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message