lucene-java-user mailing list archives

From Bill Janssen
Subject Re: Reduction based "more like this"?
Date Fri, 09 Feb 2007 10:13:46 GMT
> For example, given terms "female", "John" and "London" - all 3 may
> have equal IDF but should a document representing a female in London
> be given equal weighting to a document representing the rarer example
> of a female who happens to be called "John"?

Not to mention multi-word phrase tokenization, like the difference
between a document which contains the text

  "...should not be allowed to possess a lethal weapon like a..."

and a document which contains the phrase

  "...should not be allowed to see Lethal Weapon until at least the age of..."

In the first case, "lethal weapon" should be tokenized into two separate
terms, while in the second case we need to preserve the phrase as a single term.
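
As a rough illustration of that distinction, here is a minimal sketch in plain Python (not Lucene code). It assumes a hypothetical list of known titles, KNOWN_PHRASES, and a hypothetical tokenize() helper that matches those titles case-sensitively. Capitalization is only one possible signal, and where the phrase list would come from is left open.

```python
import re

# Hypothetical phrase list, assumed for illustration only. A real system
# would need some source for these (for example, a database of titles).
KNOWN_PHRASES = {("Lethal", "Weapon")}


def tokenize(text, phrases=KNOWN_PHRASES):
    """Split text into lowercased word tokens, but emit a single term
    wherever a known phrase matches with its exact capitalization."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    max_len = max((len(p) for p in phrases), default=1)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first.
        for n in range(max_len, 1, -1):
            candidate = tuple(words[i:i + n])
            if candidate in phrases:
                tokens.append(" ".join(candidate).lower())
                i += n
                break
        else:
            # No phrase starts here, so emit the single word.
            tokens.append(words[i].lower())
            i += 1
    return tokens


first = tokenize("should not be allowed to possess a lethal weapon like a")
second = tokenize("should not be allowed to see Lethal Weapon until at least the age of")
```

With this sketch, the first text yields "lethal" and "weapon" as separate terms, while the second yields the single term "lethal weapon". A sentence that happens to begin with "Lethal weapon" would still be split, because only the exact capitalization in the phrase list is treated as a title.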

