lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <>
Subject [jira] Commented: (LUCENE-1350) SnowballFilter resets the payload
Date Wed, 06 Aug 2008 20:06:46 GMT


Doron Cohen commented on LUCENE-1350:

> The non-reuse interface is deprecated. LUCENE-1333 deals with cleaning that up and applying
> reuse in all of Lucene. To date, it was partially applied to core. This results in sub-optimal
> performance with Filter chains that use both reuse and non-reuse inputs and filters.

The non-reuse TokenStream API is not deprecated in the trunk. I guess you mean it will be deprecated
by LUCENE-1333.

> To me, it is not clear-cut what a producer or a consumer actually is. Obviously, input streams
> are producers. Some filters generate multiple tokens as a replacement for the current one
> (e.g. NGram, stemming, ...). To me, these are producers.

Right, such filters function as producers. Javadocs should say something weaker, like "most
filters are consumers" or "filters are usually consumers".

> I don't know why the following pattern was not originally used (some filters do this) or why
> you didn't migrate to this:
> Token token =;
> String newTerm = ....;
> return token;
>
> This would be faster than cloning and would preserve all fields.

Good point, thanks.

So I wonder what's next with this issue. LUCENE-1333 as a whole is targeted for 2.4, so it
seems appropriate to fix the filters' behavior now so that they preserve the payload (and flags,
thanks for pointing this out), following the reuse code pattern above. Makes sense?
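
The reuse pattern discussed above can be sketched in a self-contained way. This is only an
illustration under stated assumptions: Token here is a hypothetical three-field stand-in (term,
payload, flags), not Lucene's Token class, and stem() is a toy suffix stripper standing in for
the Snowball stemmer. It contrasts building a fresh token, which loses the payload and flags,
with changing only the term on the incoming token, which keeps them.

```java
import java.util.Arrays;

public class PayloadPreservingDemo {

    /** Hypothetical stand-in for a token; NOT Lucene's Token class. */
    static final class Token {
        String term;
        byte[] payload; // may be null
        int flags;

        Token(String term, byte[] payload, int flags) {
            this.term = term;
            this.payload = payload;
            this.flags = flags;
        }
    }

    /** Toy stemmer (assumption): strips a trailing "ing" or "s". */
    static String stem(String term) {
        if (term.endsWith("ing") && term.length() > 4) {
            return term.substring(0, term.length() - 3);
        }
        if (term.endsWith("s") && term.length() > 2) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    /** Problematic pattern: a fresh token is built, so payload and flags are lost. */
    static Token stemByCopy(Token in) {
        return new Token(stem(in.term), null, 0);
    }

    /** Reuse pattern: only the term changes; every other field survives. */
    static Token stemInPlace(Token in) {
        in.term = stem(in.term);
        return in;
    }

    public static void main(String[] args) {
        Token a = new Token("payloads", new byte[] {42}, 1);
        Token b = new Token("payloads", new byte[] {42}, 1);
        System.out.println("copy keeps payload:  " + (stemByCopy(a).payload != null));
        System.out.println("reuse keeps payload: " + Arrays.toString(stemInPlace(b).payload));
    }
}
```

Because stemInPlace() returns the same object it received, fields added to the token later are
preserved without any change to the filter, which is the main appeal of the pattern.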

> SnowballFilter resets the payload
> ---------------------------------
>                 Key: LUCENE-1350
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
