jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (OAK-4575) Oak 1.0.x fulltext search with ideographic space (U+3000) as separator
Date Thu, 21 Jul 2016 13:02:20 GMT

     [ https://issues.apache.org/jira/browse/OAK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller resolved OAK-4575.
---------------------------------
    Resolution: Fixed

> Oak 1.0.x fulltext search with ideographic space (U+3000) as separator
> ----------------------------------------------------------------------
>
>                 Key: OAK-4575
>                 URL: https://issues.apache.org/jira/browse/OAK-4575
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>    Affects Versions: 1.0.32
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>             Fix For: 1.0.33
>
>
> In Oak 1.0, the Lucene index uses its own tokenizer. That tokenizer doesn't support the
> ideographic space (U+3000) as a word separator.
> In Oak 1.2 and later, the Lucene tokenizer is used, which works as expected.
> Backporting all relevant changes from Oak 1.2 to the 1.0 branch would require a lot of changes,
> and the risk of regression would be high (too high in my view). An alternative is to add support
> for the ideographic space in the query engine (replace it with a regular space character).
> Please note that the behavior is still not exactly the same as with Oak 1.2, but for this exact
> use case it is expected to work correctly.
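
The alternative described above (replacing the ideographic space with a regular space before the search text is tokenized) can be sketched roughly as follows. This is only an illustration of the idea, not the actual change committed for OAK-4575: the class name IdeographicSpaceSketch and the method normalizeFullTextQuery are hypothetical and do not exist in Oak.

```java
public class IdeographicSpaceSketch {

    // U+3000 IDEOGRAPHIC SPACE, often typed between words with CJK input methods.
    private static final char IDEOGRAPHIC_SPACE = '\u3000';

    /**
     * Hypothetical helper (not part of Oak): replaces every ideographic space
     * in the full-text search expression with a regular ASCII space, so that a
     * tokenizer that only splits on ordinary whitespace sees separate words.
     */
    static String normalizeFullTextQuery(String searchText) {
        if (searchText == null) {
            return null;
        }
        return searchText.replace(IDEOGRAPHIC_SPACE, ' ');
    }

    public static void main(String[] args) {
        // Two words separated by U+3000 instead of a regular space.
        String raw = "hello\u3000world";
        String normalized = normalizeFullTextQuery(raw);

        System.out.println(normalized.equals("hello world")); // prints "true"
        System.out.println(normalized.split(" ").length);     // prints "2"
    }
}
```

Because the replacement happens only on the query side, text that was indexed with the old tokenizer is not affected, which is consistent with the note above that the behavior is still not exactly the same as with Oak 1.2.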



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
