lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier
Date Sun, 17 Dec 2017 13:21:00 GMT


Michael McCandless commented on LUCENE-5012:

This issue should make it easier to fix the bug you're seeing, but we can also fix that bug
(in {{ShingleFilter}}, I'm guessing?) separately, before taking on this more ambitious change.

It sounds like {{ShingleFilter}} is not looking at {{PositionLengthAttribute}}?
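
To illustrate why ignoring position length matters, here is a minimal self-contained sketch. It is not Lucene code: {{Tok}} and {{arcs}} are hypothetical names, and the token values are an assumed "correct" graph for the dns -> domain name service example, where the original token spans three positions. Each token becomes an arc from its position to position + posLen; a consumer that treats every token as length 1 sees a different graph.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraphSketch {
    /** Hypothetical stand-in for a token's term, position increment and position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /**
     * Returns one "term:from->to" arc per token. When honorPosLen is false,
     * every token is treated as spanning exactly one position.
     */
    static List<String> arcs(List<Tok> tokens, boolean honorPosLen) {
        List<String> out = new ArrayList<>();
        int pos = -1; // so that a first token with posInc=1 lands on position 0
        for (Tok t : tokens) {
            pos += t.posInc;
            int len = honorPosLen ? t.posLen : 1;
            out.add(t.term + ":" + pos + "->" + (pos + len));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed graph: "dns" spans the same three positions as "domain name service".
        List<Tok> tokens = List.of(
            new Tok("dns", 1, 3),
            new Tok("domain", 0, 1),
            new Tok("name", 1, 1),
            new Tok("service", 1, 1),
            new Tok("is", 1, 1));

        // [dns:0->3, domain:0->1, name:1->2, service:2->3, is:3->4]
        System.out.println(arcs(tokens, true));
        // [dns:0->1, domain:0->1, name:1->2, service:2->3, is:3->4]
        System.out.println(arcs(tokens, false));
    }
}
```

With position length honored, "dns" leads directly to "is". With it ignored, "dns" appears to be followed by "name", which is the kind of wrong adjacency a filter that pairs up neighboring tokens could then produce.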

> Make graph-based TokenFilters easier
> ------------------------------------
>                 Key: LUCENE-5012
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5012.patch, LUCENE-5012.patch
>
> SynonymFilter has two limitations today:
>   * It cannot create positions, so e.g. dns -> domain name service
>     creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and
>     others).
>   * It cannot consume a graph, so e.g. if you try to apply synonyms
>     after Kuromoji tokenizer I'm not sure what will happen.
> I've thought about how to fix these issues but it's really quite
> difficult with the current PosInc/PosLen graph representation, so I'd
> like to explore an alternative approach.

