lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <>
Subject [jira] [Updated] (LUCENE-8352) Make TokenStreamComponents final
Date Tue, 18 Sep 2018 09:52:00 GMT


Alan Woodward updated LUCENE-8352:
    Attachment: LUCENE-8352.patch

> Make TokenStreamComponents final
> --------------------------------
>                 Key: LUCENE-8352
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mark Harwood
>            Priority: Minor
>         Attachments: LUCENE-8352.patch
> The current design is a little trappy. Any specialised subclasses of TokenStreamComponents
> _(see StandardAnalyzer, ClassicAnalyzer, UAX29URLEmailAnalyzer)_ are discarded by any
> subsequent Analyzers that wrap them _(see LimitTokenCountAnalyzer, QueryAutoStopWordAnalyzer,
> ShingleAnalyzerWrapper and other examples in Elasticsearch)_.
> The current design means that each AnalyzerWrapper.wrapComponents() implementation discards
> any custom TokenStreamComponents and replaces it with one of its own choosing (a plain
> TokenStreamComponents instance, in the examples I've seen).
> This is a trap I fell into when writing a custom TokenStreamComponents with a custom
> setReader() method: I wondered why it was not being called when my analyzer was wrapped
> by other analyzers.
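To make the trap concrete, here is a minimal, self-contained model of the pattern described above. The classes `Components`, `CustomComponents` and the static `wrapComponents()` method are hypothetical stand-ins, not Lucene's TokenStreamComponents or AnalyzerWrapper; they only mimic the shape of the real API so the example compiles without Lucene on the classpath. The wrapper builds a fresh, plain components object around the wrapped chain, so the subclass's overridden `setReader()` is never reached.

```java
import java.io.Reader;
import java.io.StringReader;

// Hypothetical model of the trap. These classes are NOT Lucene's; they only
// imitate TokenStreamComponents and AnalyzerWrapper.wrapComponents().
public class WrapperTrapDemo {

    // Stand-in for TokenStreamComponents: a chain description plus a setReader() hook.
    static class Components {
        final String chain;
        Components(String chain) { this.chain = chain; }
        // Subclasses may override this to react to each new input.
        void setReader(Reader reader) { }
    }

    // A "specialised" subclass that counts how often its hook is invoked.
    static class CustomComponents extends Components {
        int setReaderCalls = 0;
        CustomComponents(String chain) { super(chain); }
        @Override
        void setReader(Reader reader) { setReaderCalls++; }
    }

    // Stand-in for a wrapper: it returns a new, plain Components around the
    // inner chain, so the concrete subclass (and its override) is dropped.
    static Components wrapComponents(Components inner) {
        return new Components(inner.chain + " -> LimitFilter");
    }

    public static void main(String[] args) {
        CustomComponents custom = new CustomComponents("Tokenizer");
        Components wrapped = wrapComponents(custom);

        // Only the wrapper's components receive the reader.
        wrapped.setReader(new StringReader("some text"));

        System.out.println(wrapped.getClass().getSimpleName()); // Components
        System.out.println(custom.setReaderCalls);              // 0
    }
}
```

Nothing in the wrapper is buggy on its own terms; the loss comes from the fact that it can only construct the base class, which is exactly why the custom behaviour silently disappears.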
> If AnalyzerWrapper is designed to encourage composition, it's arguably a mistake to also permit
> custom TokenStreamComponents subclasses: the composition process does not preserve the choice
> of custom classes or any behaviours they might add. For this reason we should not encourage
> extensions to TokenStreamComponents (or, if TSC extensions are required, we should somehow mark
> an Analyzer as "unwrappable" to prevent lossy compositions).
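One way the "make it final" direction could look is sketched below, again as a standalone model rather than Lucene code. The `Components` class here is hypothetical: it is final, and per-reader behaviour is supplied as a `java.util.function.Consumer<Reader>` instead of by overriding a method. Because the behaviour is plain data on the object, a wrapper that builds its own instance can carry the inner hook across, so composition stops being lossy. This is an illustration of the idea under those assumptions, not the change attached to this issue.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical sketch: a final components class whose per-reader behaviour is
// passed in by composition, so wrappers can preserve it. Not Lucene's API.
public class FinalComponentsDemo {

    static final class Components {
        final String chain;
        final Consumer<Reader> onSetReader; // behaviour supplied as a function, not via subclassing

        Components(String chain, Consumer<Reader> onSetReader) {
            this.chain = chain;
            this.onSetReader = onSetReader;
        }

        void setReader(Reader reader) { onSetReader.accept(reader); }
    }

    // The wrapper still builds a new Components, but reuses the inner hook.
    static Components wrapComponents(Components inner) {
        return new Components(inner.chain + " -> LimitFilter", inner.onSetReader);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Components base = new Components("Tokenizer", r -> calls[0]++);
        Components wrapped = wrapComponents(base);

        wrapped.setReader(new StringReader("some text"));

        System.out.println(wrapped.chain); // Tokenizer -> LimitFilter
        System.out.println(calls[0]);      // 1
    }
}
```

The design choice is that a final class cannot be specialised in ways a wrapper does not know about, so every extension point has to be something the wrapper can see and forward.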

This message was sent by Atlassian JIRA
