lucene-java-user mailing list archives

From Carsten Schnober <>
Subject Re: Reading Payloads
Date Tue, 23 Apr 2013 14:41:44 GMT
On 23.04.2013 16:17, Alan Woodward wrote:

> It doesn't sound as though an inverted index is really what you want to be
> querying here, if I'm reading you right.  You want to get the payloads for
> spans at a specific position, but you don't particularly care about the
> actual term at that position?  You might find that BinaryDocValues are a
> better fit here, but it's difficult to tell without knowing what your
> actual use case is.

Hi Alan,
you are right that this specific aspect is not really suitable for an
inverted index. I've still been hoping that I could misuse it for some
cases. Let me sketch my use case:
A user performs a query that is parsed and executed in the form of a
SpanQuery. The offsets of the match(es) are extracted and returned. From
that point on, the user uses these offsets to retrieve certain segments
of a document from an external database.
However, I also store additional information (linguistic annotations) in
the token payloads, because they are also used for more complex queries
that filter matches depending on these payloads. As they are stored in
the index anyway, I thought I might as well extract them upon request. I
am aware that such a request wouldn't perform very well, but apart from
that, I think it would be very handy to be able to extract the payloads
for a given span.
However, the only way I can find is via TokenSources.getTokenStream,
and that apparently doesn't work.
I'm now thinking about keeping the resulting Spans in memory so that I
can extract the payloads upon user request. However, that still
wouldn't allow me to extract the payloads of any other token, which
would be a typical use case: for example, when a user wants to retrieve
the annotations of adjacent tokens.

Institut für Deutsche Sprache |
Projekt KorAP                 |
Tel. +49-(0)621-43740789      |
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
