lucene-dev mailing list archives

From Chuck Williams <>
Subject Re: Ferret's changes
Date Wed, 11 Oct 2006 09:31:37 GMT

David Balmain wrote on 10/10/2006 08:53 PM:
> On 10/11/06, Chuck Williams <> wrote:
> I personally would always store term vectors since I use a
> StandardTokenizer and Stemming. In this case highlighting matches in
> small documents is not trivial. Ferret's highlighter matches even
> sloppy phrase queries and phrases with gaps between the terms
> correctly. I couldn't do this without the use of term vectors.

I use stemming as well, but am not yet matching phrases like that. 
Perhaps term vectors will be useful to achieve this, although they come
at a high cost and it doesn't seem difficult or expensive to do the
matching directly on the text of small items.

>> I suppose it would be possible for the single conceptual field 'body' to
>> be represented with two physical fields 'smallBody' and 'largeBody'
>> where the former stores term vectors and the latter does not.
> If I really wanted to solve this problem I would use this solution. It
> is pretty easy to search multiple fields when I need to. Ferret's
> Query language even supports it:
>    smallBody|largeBody:"phrase to search for"

Couldn't agree more.  I have a number of extensions to Lucene's query
parser, including this for multiple fields:

{smallBody largeBody}:"phrase to search for"
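
As a rough illustration of what a multi-field shorthand like this amounts to, the sketch below rewrites it into a plain disjunction of per-field clauses in standard Lucene query syntax. It is not the parser extension described above: the class name `MultiFieldExpander`, the `expand` method, and the regex-based preprocessing are all assumptions made for the example, and a real extension would more likely live inside the query parser grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands {f1 f2}:clause into (f1:clause OR f2:clause).
public class MultiFieldExpander {
    // "{field1 field2 ...}:" followed by a quoted phrase or a bare term.
    private static final Pattern MULTI_FIELD =
        Pattern.compile("\\{([^}]+)\\}:(\"[^\"]*\"|[^\\s()]+)");

    public static String expand(String query) {
        Matcher m = MULTI_FIELD.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String clause = m.group(2);
            List<String> parts = new ArrayList<>();
            // One clause per physical field named inside the braces.
            for (String field : m.group(1).trim().split("\\s+")) {
                parts.add(field + ":" + clause);
            }
            String replacement = "(" + String.join(" OR ", parts) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, the query above would expand to (smallBody:"phrase to search for" OR largeBody:"phrase to search for"), and queries without the brace syntax pass through unchanged.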

> In the end, I think the benefits of my model far outweigh the costs.
> For me at least anyway.

Based on the performance figures so far, it seems they do!  I think
dynamic term vectors have a substantial benefit, but can easily be
implemented in a model where all field indexing properties are fixed.
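
One way to picture the fixed-property approach is a small routing step at indexing time that picks the physical field for each document body, so that each field keeps a single, fixed term vector setting. This is only a sketch: the class name `BodyFieldRouter`, the method `physicalFieldFor`, and the size cutoff are assumptions for illustration, since the thread does not say what counts as a small document.

```java
// Hypothetical router: maps the conceptual 'body' field onto one of two
// physical fields whose indexing properties never change.
public class BodyFieldRouter {
    // Assumed cutoff in characters; not a value taken from the thread.
    static final int SMALL_BODY_MAX_CHARS = 10_000;

    // 'smallBody' would be declared with term vectors, 'largeBody' without.
    static String physicalFieldFor(String bodyText) {
        return bodyText.length() <= SMALL_BODY_MAX_CHARS ? "smallBody" : "largeBody";
    }
}
```

Searches would then target both physical fields, for example with a multi-field query of the kind shown earlier in the thread.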

