lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Spans, appended fields, and term positions
Date Sun, 20 Nov 2005 20:03:40 GMT
One more thing to consider: the field length in the index.
Probably the added position increment between appended parts
of a field should not be reflected in the total field size as indexed.

This would also be a consideration for queries and for the
field norms: when multiple fields are used they may all have their
own length (norm) and a span query will never match between the fields.
When appending the same field to a document multiple times as parts
of the same field, the total field length should be used in
the query score, and the span query may match between the
field parts when the slop is larger than the position increment
between the parts.
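
To make the above concrete, here is a minimal sketch in plain Java. It does
not use the Lucene API; the class and method names are made up for
illustration. It assumes that every token advances the position by one, that
an extra gap is added between appended parts, and that the slop needed for
two terms is the number of positions strictly between them. The exact
boundary depends on how the slop is really counted, so treat the numbers as
an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of appended field parts; not Lucene code.
class AppendedFieldPositions {

    // Assigns a position to every token. After each part the position
    // counter is advanced by an extra gap (assumed behaviour).
    static List<int[]> positions(String[][] parts, int gap) {
        List<int[]> result = new ArrayList<>();
        int next = 0; // position of the next token
        for (String[] part : parts) {
            int[] pos = new int[part.length];
            for (int i = 0; i < pos.length; i++) {
                pos[i] = next++;
            }
            result.add(pos);
            next += gap; // extra increment between appended parts
        }
        return result;
    }

    // Field length as suggested above: the token count only,
    // without the gaps between the parts.
    static int fieldLength(String[][] parts) {
        int length = 0;
        for (String[] part : parts) {
            length += part.length;
        }
        return length;
    }

    // Assumed slop model: positions strictly between the two terms.
    static int slopNeeded(int posA, int posB) {
        return Math.abs(posB - posA) - 1;
    }

    public static void main(String[] args) {
        String[][] parts = { {"quick", "brown", "fox"}, {"lazy", "dog"} };
        int gap = 10;
        List<int[]> pos = positions(parts, gap);
        int fox = pos.get(0)[2];  // last token of the first part: 2
        int lazy = pos.get(1)[0]; // first token of the second part: 13
        System.out.println("field length: " + fieldLength(parts)); // 5
        int needed = slopNeeded(fox, lazy); // 10
        for (int slop : new int[] {3, 20}) {
            System.out.println("slop " + slop + " matches across parts: "
                    + (slop >= needed));
        }
    }
}
```

With a slop of 3 the two terms cannot match across the parts; with a slop of
20 they can, even though they come from different field instances.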

> Highlighting is quite a challenging endeavor!  Spans certainly
> provides a lot of help, but in the appended field scenario, the
> Spans.start() and .end() goes across the field boundary, so it
> requires, in my case with the text coming from stored field values,
> cleverness in how to highlight in order to keep field instances
> separate.

The problem is that it is not really possible to reconstruct the appended
parts of the field from the index, especially when stop words leave gaps:
a large gap in the positions may also be caused by stop words.
So there is only one option: never query with a span slop that is bigger
than the position increment between the appended parts.
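
A small sketch of why the parts cannot be recovered from the positions, again
in plain Java with made-up names and not the Lucene API. It assumes that a
removed stop word still consumes a position and that an extra gap is added
after each appended part. Under these assumptions one field value with two
stop words and two appended parts with a gap of 2 end up with the same
positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of indexed positions; not Lucene code.
class GapAmbiguitySketch {

    // Assumed stop word list for this example only.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("on", "the"));

    // Returns the positions of the tokens that would be indexed.
    static List<Integer> indexedPositions(String[] parts, int gap) {
        List<Integer> out = new ArrayList<>();
        int next = 0;
        for (String part : parts) {
            for (String word : part.split(" ")) {
                if (!STOP.contains(word)) {
                    out.add(next);
                }
                next++; // a removed stop word still consumes a position
            }
            next += gap; // extra increment between appended parts
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> onePart =
                indexedPositions(new String[] {"cat sat on the mat"}, 2);
        List<Integer> twoParts =
                indexedPositions(new String[] {"cat sat", "mat"}, 2);
        System.out.println(onePart);  // [0, 1, 4]
        System.out.println(twoParts); // [0, 1, 4]
        System.out.println("same positions: " + onePart.equals(twoParts));
    }
}
```

From the positions alone, the hole before "mat" could be either two stop
words or a boundary between field parts.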

But even then the highlighting may not be reliable. It depends on 
Document.fields() of a stored and retrieved document: does it return
all the appended field parts as separate Fields, or does it only
return one Field with all parts appended? I don't know.
Highlighting will not be reliable in the latter case.

Paul Elschot
