lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6993) Update TLDs to latest list
Date Thu, 18 Feb 2016 00:01:18 GMT


Robert Muir commented on LUCENE-6993:

Basically, the old versions of the Tokenizer and Impl are just "saved" to a subdirectory, and
the Analyzer and TokenizerFactory conditionally use them if you request that compatibility.

Have a look at branch_5x which still has {{std40}} containing StandardTokenizer40, StandardTokenizerImpl40,
UAX29URLEmailTokenizer40, and so on. TestStandardAnalyzer and TestUAX29URLEmailAnalyzer also
have a testBackcompat40 which calls {{setVersion}} and ensures it works. Finally, see StandardAnalyzer/,
and UAXURLEmailAnalyzer/ which conditionally use StandardTokenizer40
depending on version.
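
To make the version check concrete, here is a minimal plain-Java sketch of the idea. It does not use any Lucene classes: {{VersionGateSketch}}, {{SketchVersion}}, {{CUTOFF}} and {{tokenizerFor}} are hypothetical names, and the 4.7 cutoff is an assumed value for illustration only, so check branch_5x for the real condition.

```java
/** Plain-Java sketch of version-gated tokenizer selection. No Lucene classes are used. */
class VersionGateSketch {

    /** Hypothetical stand-in for a Lucene version: major.minor only. */
    static final class SketchVersion {
        final int major;
        final int minor;

        SketchVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** True if this version is the same as or newer than {@code other}. */
        boolean onOrAfter(SketchVersion other) {
            return major > other.major || (major == other.major && minor >= other.minor);
        }
    }

    // Assumed cutoff, for illustration only: requests older than this get the saved copy.
    static final SketchVersion CUTOFF = new SketchVersion(4, 7);

    /** Returns the name of the tokenizer class a request for {@code requested} would get. */
    static String tokenizerFor(SketchVersion requested) {
        return requested.onOrAfter(CUTOFF)
                ? "StandardTokenizer"      // current implementation
                : "StandardTokenizer40";   // frozen copy kept in the std40 subdirectory
    }

    public static void main(String[] args) {
        System.out.println(tokenizerFor(new SketchVersion(4, 0)));
        System.out.println(tokenizerFor(new SketchVersion(5, 5)));
    }
}
```

The point of the design is that the frozen copy is never edited again, so whatever changes land in the current tokenizer cannot leak into results for callers who ask for the old version.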

So we should do a similar thing with the current stuff in master before modifying the files,
and make them {{std55}}. We can just test that it works at all (e.g. foo bar -> foo,bar)
initially and later maybe add a test ensuring "old behavior" stays the same.
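
As a rough picture of that initial "works at all" check, here is a plain-Java sketch. It is not a Lucene test: {{BackcompatSmokeSketch}}, {{tokenizeFrozen}} and {{assertTokens}} are hypothetical names, and the whitespace split is only a stand-in for the frozen {{std55}} tokenizer. A real test would presumably follow the existing testBackcompat40 pattern instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java sketch of a "does it work at all" back-compat check. No Lucene classes are used. */
class BackcompatSmokeSketch {

    /**
     * Hypothetical stand-in for the frozen std55 tokenizer. It only splits on
     * whitespace, which is far simpler than the real grammar-based tokenizer.
     */
    static List<String> tokenizeFrozen(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Throws if the stand-in tokenizer does not produce exactly the expected tokens. */
    static void assertTokens(String input, String... expected) {
        List<String> actual = tokenizeFrozen(input);
        if (!actual.equals(Arrays.asList(expected))) {
            throw new AssertionError(input + " -> " + actual);
        }
    }

    public static void main(String[] args) {
        // The smoke check from the comment: foo bar -> foo,bar
        assertTokens("foo bar", "foo", "bar");
    }
}
```

A later "old behavior stays the same" test would add inputs where the old and new grammars are expected to differ, which this sketch does not attempt.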

Then you can bump unicode version and tld lists and it won't change any behavior if someone
asks for version < 6.0, because they will get the exact same tokenizer as before.

> Update TLDs to latest list
> --------------------------
>                 Key: LUCENE-6993
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Mike Drob
>            Assignee: Robert Muir
>             Fix For: 6.0
>         Attachments: LUCENE-6993.patch, LUCENE-6993.patch
> We did this once before in LUCENE-5357, but it might be time to update the list of TLDs
> again. Comparing our old list with a new list indicates 800+ new domains, so it would be nice
> to include them.
