lucene-dev mailing list archives

From "Grant Ingersoll" <>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Tue, 24 Feb 2004 23:06:55 GMT
That information is provided by Token.startOffset() and Token.endOffset() at indexing time, I think.
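To make the distinction in this thread concrete: a token has both a character span in the original text (what startOffset()/endOffset() report) and an ordinal place in the token stream (what IndexReader.termPositions() exposes). The sketch below does not use Lucene at all. The Tok record and tokenize method are hypothetical stand-ins, assuming a simple letter-or-digit tokenizer, meant only to show what each number means.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetDemo {
    // Hypothetical stand-in for a Lucene Token: the term text, its ordinal
    // position in the token stream, and its [start, end) character offsets.
    record Tok(String term, int position, int startOffset, int endOffset) {}

    // Minimal letter-or-digit tokenizer (an assumption; real analyzers differ).
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip separators (punctuation, whitespace).
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            // Consume one run of letters/digits.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                out.add(new Tok(text.substring(start, i).toLowerCase(), pos++, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("Term vectors, plus offsets.")) {
            System.out.println(t.position() + " " + t.term()
                    + " [" + t.startOffset() + "," + t.endOffset() + ")");
        }
    }
}
```

For "Term vectors, plus offsets." the token "vectors" is at position 1 but spans characters 5 to 12; the position alone cannot tell you where to put a highlight tag.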

I don't know whether this is accessible at run time. A good place to see what is stored in the
index files is the File Formats section located at
 (get the latest from HEAD to see the new Term Vector stuff). For what you can access, I
usually start at IndexReader and dig in from there.

Of course, the position info and how we did it are available in the "first" patch I submitted
(and the "original one" from Dmitry), so if you are willing to always write position information,
you could update your code with that information. Or, better yet :-), take it, add the
necessary touches to make it truly optional, and donate it back to Lucene.


>>> 02/24/04 05:39PM >>>
Grant Ingersoll wrote:

> It is the location of the token in the document (see IndexReader.termPositions()).
> This information is already being stored in other parts of the index, it just isn't very
> efficient to get at it.

Ok, that wasn't the answer I was hoping for :) I was hoping that the positions referred to
were the start/end offsets in the originating Token(s). I'll just have to find another way
to make the highlighting code more efficient.
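Why offsets matter for highlighting: if the start/end offsets of the matching tokens were available at search time, the stored text could be marked up directly, with no re-analysis. A minimal sketch, assuming those offsets are already in hand (how they would be stored or retrieved is not shown), and where OffsetHighlighter is a hypothetical name, not a Lucene class:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OffsetHighlighter {
    // Wraps each [start, end) span of `text` in pre/post tags.
    // Spans are assumed to come from token offsets captured at index time.
    static String highlight(String text, List<int[]> offsets, String pre, String post) {
        List<int[]> sorted = new ArrayList<>(offsets);
        sorted.sort(Comparator.comparingInt(o -> o[0])); // process left to right
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] o : sorted) {
            if (o[0] < last) continue; // skip spans overlapping one already emitted
            sb.append(text, last, o[0])   // untouched text before the match
              .append(pre)
              .append(text, o[0], o[1])   // the matched token itself
              .append(post);
            last = o[1];
        }
        sb.append(text, last, text.length()); // trailing text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Term vectors, plus offsets.";
        List<int[]> hits = List.of(new int[] {19, 26}, new int[] {5, 12});
        System.out.println(highlight(text, hits, "<b>", "</b>"));
        // prints: Term <b>vectors</b>, plus <b>offsets</b>.
    }
}
```

The work is a single linear pass over the text once the spans are sorted; without offsets, a highlighter has to re-tokenize the stored field to find the same spans.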


Bruce Ritchie
