lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: AW: inter-term correlation [was Re: Vector Space Model in Lucene?]
Date Mon, 17 Nov 2003 19:56:10 GMT
Karsten Konrad wrote:
> I was wondering whether we could, while indexing, make a use of this by 
> increasing the position counter by a large number, let's say 1000, 
> whenever we encounter a sentence separator (Note, this is not trivial; 
> not every '.' ends a  sentence etc. etc. etc.). Thus, searching for
> 
> "income tax"~100 "tax gain"~100 "income tax gain"~100 income tax gain
> 
> would find "income tax gain" as usual, but would boost all texts
> where the phrases involved appear within sentence boundaries

This is exactly the sort of approach I was advocating in earlier 
messages.  (Although I think you'd only need to increase the position 
counter by 101 for the first word in each sentence.)  Herb Chong didn't 
seem to think this was appropriate, but I never understood why.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message