lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Term Based Meta Data
Date Tue, 05 Aug 2008 16:56:20 GMT

I think you could use payloads (arbitrary, opaque byte[] data) for this.

You can attach a payload to each term occurrence during tokenization
(indexing), and then retrieve the payload during searching.
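
As a rough sketch of the byte[] side of this (not from the original
thread): since a payload is just opaque bytes, the OCR box for each term
occurrence could be packed into a fixed 16-byte array at index time and
unpacked again at search time. The class name OcrBoxPayload and the
x/y/width/height layout are hypothetical. Attaching the bytes to a token
and reading them back goes through Lucene's payload API, whose exact
classes depend on the Lucene version, so that part is left out here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: packs an OCR bounding box into
// the kind of opaque byte[] a payload carries, and unpacks it again.
final class OcrBoxPayload {
    // Four 32-bit ints: x, y, width, height (an assumed layout).
    static final int SIZE = 4 * Integer.BYTES;

    private OcrBoxPayload() {}

    // Encode pixel coordinates as four big-endian ints (ByteBuffer's default order).
    static byte[] encode(int x, int y, int width, int height) {
        return ByteBuffer.allocate(SIZE)
                .putInt(x)
                .putInt(y)
                .putInt(width)
                .putInt(height)
                .array();
    }

    // Decode a box from a slice of a larger buffer. The offset/length
    // parameters are there because payload bytes are often handed back
    // as a region of a shared array rather than as an exact-size array.
    static int[] decode(byte[] data, int offset, int length) {
        if (length != SIZE) {
            throw new IllegalArgumentException(
                    "expected " + SIZE + " payload bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data, offset, length);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }
}
```

At index time the tokenizer would call encode(...) with the box the OCR
program reported for that word and attach the result as the token's
payload. At search time the highlighter would call decode(...) on the
payload bytes of each matching position.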


Martin Owens wrote:

> Hello Users,
> I'm working on a project that stores data from an OCR process
> describing the pixel coordinates of each term in the document. It's
> used for hit highlighting.
> What I would like to do is store this coordinate information alongside
> the terms. I know there is existing metadata stored per term (word
> offsets and char offsets). The problem is that if I create a separate
> index and try to use the word or char offsets, not only is it slower,
> but the offsets don't match because of the way the terms are processed
> both inside Lucene and in the OCR program.
> So, is it possible to store the data alongside the terms in Lucene and
> then recall it when doing certain searches? And how much custom code
> needs to be written to do it?
> Best Regards, Martin Owens
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
