lucene-dev mailing list archives

From "Michael Busch (JIRA)" <>
Subject [jira] Updated: (LUCENE-1775) Change org.apache.lucene.analysis.shingle to use new TokenStream API
Date Sun, 02 Aug 2009 08:56:14 GMT


Michael Busch updated LUCENE-1775:

    Attachment: lucene-1775.patch

ShingleMatrixFilter is a very complicated filter. It seems to be implemented in a very
inefficient way: it does lots of cloning. While I was able to fully convert ShingleFilter
so that it is now much more efficient, I'm not going to do that with ShingleMatrixFilter.
I don't know the code well enough to even try, and at 1000 LOC it's very complex.

The drawback of not fully converting it is that if someone uses custom Attributes, i.e. ones
that are not in core Lucene, it is undefined what the filter will do with those Attributes.
However, I don't even know what the behavior should be. If only core Attributes are used,
everything works fine, as the passing JUnit tests show.
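
To make the custom-Attribute concern concrete, here is a toy model in plain Java. It is not Lucene code and does not use Lucene's API: the class, the method names, and the map-based "state" are all hypothetical stand-ins. It shows only one possible outcome, where a filter that rebuilds its output from a fixed set of core fields silently drops anything else. What ShingleMatrixFilter actually does with custom Attributes remains undefined, as noted above.

```java
import java.util.HashMap;
import java.util.Map;

public class AttributeLossSketch {
    // Hypothetical names of "core" fields; an assumption made for this sketch only.
    private static final String[] CORE_KEYS = {"term", "offset", "type"};

    /** Models a partially converted filter: copies only the fields it knows about. */
    public static Map<String, String> copyCoreOnly(Map<String, String> state) {
        Map<String, String> out = new HashMap<>();
        for (String key : CORE_KEYS) {
            if (state.containsKey(key)) {
                out.put(key, state.get(key));
            }
        }
        return out;
    }

    /** Models a fully converted filter: carries the whole state along. */
    public static Map<String, String> copyAll(Map<String, String> state) {
        return new HashMap<>(state);
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("term", "hello");
        state.put("offset", "0-5");
        state.put("myCustomAttribute", "42"); // not a core field

        // The custom entry is lost in the first copy and kept in the second.
        System.out.println(copyCoreOnly(state).containsKey("myCustomAttribute"));
        System.out.println(copyAll(state).containsKey("myCustomAttribute"));
    }
}
```

With only core fields in the input, both copies are identical, which matches the observation that everything works when only core Attributes are used.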

I added a corresponding comment to the javadocs of that class.

> Change org.apache.lucene.analysis.shingle to use new TokenStream API
> --------------------------------------------------------------------
>                 Key: LUCENE-1775
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: contrib/analyzers
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: lucene-1775.patch, lucene-1775.patch, lucene-1775.patch
> All other contrib streams/filters have already been converted with LUCENE-1460.
> The two shingle filters are the last ones we need to convert.
