lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6993) Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0
Date Fri, 26 Feb 2016 18:31:18 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169481#comment-15169481 ]

Uwe Schindler commented on LUCENE-6993:
---------------------------------------

bq. Uwe Schindler has written that he still recommends this tokenizer in some cases, so if you're asking if we should remove it, I don't think so.

I think the question was whether it should also be upgraded to a newer Unicode version. But it does not rely on any Unicode version, so the Java files should be identical. Please don't remove it!

> Update UAX29URLEmailTokenizer TLDs to latest list, and upgrade all JFlex-based tokenizers to support Unicode 8.0
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6993
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6993
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mike Drob
>            Assignee: Robert Muir
>             Fix For: 6.0
>
>         Attachments: LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch, LUCENE-6993.patch
>
>
> We did this once before in LUCENE-5357, but it might be time to update the list of TLDs again. Comparing our old list with a new list indicates 800+ new domains, so it would be nice to include them.
> Also the JFlex tokenizer grammars should be upgraded to support Unicode 8.0.
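Comparing an old TLD list against a newer one, as described above, amounts to a set difference. What follows is a minimal sketch, assuming both lists are plain text with one TLD per line and `#` comment lines (the layout of IANA's published TLD list). The class `TldDiff` and its methods are hypothetical illustrations and are not part of Lucene's build tooling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: diffs two TLD lists (not Lucene's actual generator).
public class TldDiff {

  /** Parses one TLD per line; blank lines and lines starting with '#' are skipped. */
  static Set<String> parse(List<String> lines) {
    Set<String> tlds = new TreeSet<>();
    for (String line : lines) {
      String t = line.trim();
      if (t.isEmpty() || t.startsWith("#")) {
        continue; // comment or blank line
      }
      // Normalize case so "COM" and "com" compare equal.
      tlds.add(t.toLowerCase(Locale.ROOT));
    }
    return tlds;
  }

  /** Returns, sorted, the TLDs present in the newer list but absent from the older one. */
  static Set<String> added(Set<String> older, Set<String> newer) {
    Set<String> result = new TreeSet<>(newer);
    result.removeAll(older);
    return result;
  }

  public static void main(String[] args) {
    // Toy data standing in for two snapshots of the list.
    Set<String> oldList = parse(Arrays.asList("# old snapshot", "COM", "NET", "ORG"));
    Set<String> newList = parse(Arrays.asList("# new snapshot", "COM", "NET", "ORG", "APP", "XYZ"));
    Set<String> diff = added(oldList, newList);
    System.out.println(diff.size() + " new TLDs: " + diff); // prints "2 new TLDs: [app, xyz]"
  }
}
```

Run against real old and new snapshots, the size of the returned set is the count of newly added domains.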



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

