lucene-dev mailing list archives

From Michael Busch
Subject Re: Payloads
Date Thu, 18 Jan 2007 16:38:38 GMT
Nadav Har'El wrote:
> On Thu, Jan 18, 2007, Michael Busch wrote about "Re: Payloads":
>> As you pointed out it is still possible to have per-doc payloads. You 
>> need an analyzer which adds just one Token with payload to a specific 
>> field for each doc. I understand that this code would be quite ugly on 
>> the app side. A more elegant solution might be LUCENE-580. With that 
>> patch you are able to add pre-analyzed fields (i. e. TokenStreams) to a 
>> Document without having to use an analyzer. You could use a TokenStream 
> Thanks, this sounds like a good idea.
> In fact, I could live with something even simpler: I want to be able
> to create a Field with a single token (with its payload). If I need more
> than one of these tokens with payloads, I can just add several fields with
> the same name (this should work, although the description of LUCENE-580
> suggests that it might have a bug in this area).
> I'll add a comment about this use-case to LUCENE-580.
Yes, for your use case it would indeed make sense to just add a single
Token to a field. But there are other use cases that would benefit from
LUCENE-580, e.g. using UIMA as a parser. UIMA does not work per field; it
materializes the tokens of all fields in a CAS. So the indexer can't
call the parser per field; the parsing has to be done before indexing.
It would then make sense to do the parsing first and add TokenStreams
for the different fields to the Document, each of which only iterates
through the CAS.
This is of course also possible by adding multiple Field instances
containing single Tokens to a Document, but the performance would
suffer: each Token would be wrapped in a Field object and then held in a
list in Document.

So I think being able to add TokenStreams to a Document makes sense.
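
To make the difference concrete, here is a rough sketch of the two
variants. It deliberately uses no Lucene or UIMA classes: Token,
ParsedStore (standing in for the CAS), SketchField, SketchDocument and
PreAnalyzedSketch are all made-up names for illustration, and the
actual API in LUCENE-580 may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// All types here are hypothetical stand-ins, not Lucene or UIMA classes.

/** A token as the external parser produced it, tagged with its field. */
final class Token {
    final String field;
    final String text;
    final byte[] payload; // may be null when the token carries no payload

    Token(String field, String text, byte[] payload) {
        this.field = field;
        this.text = text;
        this.payload = payload;
    }
}

/** Plays the role of the CAS: tokens of all fields, parsed before indexing. */
final class ParsedStore {
    private final List<Token> tokens = new ArrayList<>();

    void add(Token t) { tokens.add(t); }

    List<Token> all() { return Collections.unmodifiableList(tokens); }

    /** Lazy per-field view over the shared list; nothing is copied. */
    Iterator<Token> streamFor(final String field) {
        return new Iterator<Token>() {
            private int pos = 0;

            private void skipOtherFields() {
                while (pos < tokens.size() && !tokens.get(pos).field.equals(field)) {
                    pos++;
                }
            }

            @Override public boolean hasNext() {
                skipOtherFields();
                return pos < tokens.size();
            }

            @Override public Token next() {
                if (!hasNext()) throw new NoSuchElementException();
                return tokens.get(pos++);
            }
        };
    }
}

final class SketchField {
    final String name;
    final Iterator<Token> stream;

    SketchField(String name, Iterator<Token> stream) {
        this.name = name;
        this.stream = stream;
    }
}

final class SketchDocument {
    final List<SketchField> fields = new ArrayList<>();

    void add(SketchField f) { fields.add(f); }
}

final class PreAnalyzedSketch {
    /** Variant 1: every token gets its own field object in the document's list. */
    static SketchDocument perTokenFields(ParsedStore store) {
        SketchDocument doc = new SketchDocument();
        for (Token t : store.all()) {
            doc.add(new SketchField(t.field, Collections.singletonList(t).iterator()));
        }
        return doc;
    }

    /** Variant 2: one field object per field name, backed by a view of the store. */
    static SketchDocument perFieldStreams(ParsedStore store, List<String> fieldNames) {
        SketchDocument doc = new SketchDocument();
        for (String name : fieldNames) {
            doc.add(new SketchField(name, store.streamFor(name)));
        }
        return doc;
    }
}
```

In this sketch, a store holding four tokens spread over two fields
yields four field objects with perTokenFields but only two with
perFieldStreams, and the second variant reads the tokens straight out of
the store while the document is consumed.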
