phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4237) Allow sorting on (Java) collation keys for non-English locales
Date Thu, 05 Oct 2017 06:01:27 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192500#comment-16192500 ]

ASF GitHub Bot commented on PHOENIX-4237:
-----------------------------------------

GitHub user shehzaadn opened a pull request:

    https://github.com/apache/phoenix/pull/275

    PHOENIX-4237: Add function to calculate Java collation keys

    This pull request implements a general way to calculate Java collation keys by creating
Java collators for a user-specified locale. The collation keys can then be used in an ORDER BY
clause to sort strings in the order natural to that locale's language. It adds a new Phoenix
function, COLLKEY. Typical usage of this function is:
    
    select name from my_table order by COLLKEY(name, 'zh_TW')
    
    We use artifacts from the ICU4J project and the recently open-sourced grammaticus project
(as Maven dependencies). We had to include some source code from ICU4J because some of the jars
that project produces aren't published to Maven. We also include code from Salesforce that has
been licensed for open-source release but not yet published as Maven artifacts.
    
    There are three commits that split the changes into three logical pieces:
    
    1) f8cb121: Add the external source code described above
    2) fdbb5e0: Make the changes to the Phoenix license required by the above (and fix what
seems to be an existing bug in the license text)
    3) 98cfc10: The actual implementation of the COLLKEY function: new code that uses the
source added above and the newly introduced Maven dependencies.
    
    Thanks in advance to the Phoenix community for your feedback on this.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shehzaadn/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #275
    
----
commit f8cb121145163591345eea70acbc313098e23e21
Author: Shehzaad <snakhoda@salesforce.com>
Date:   2017-09-30T01:52:46Z

    (1) add ICU4J source code for charset/localespi jars and (2) add Salesforce i18n-util
source code

commit fdbb5e009a767e0f6df385dc9a1a8472b32cc361
Author: Shehzaad <snakhoda@salesforce.com>
Date:   2017-10-02T17:55:39Z

    (1) Fix text of 3-clause BSD License, (2) add Unicode license, (3) add mention of bundling
ICU4J and i18n-util code

commit 98cfc10bac3c48ec3e7ceb47bea0b60556265c85
Author: Shehzaad <snakhoda@salesforce.com>
Date:   2017-10-02T21:58:31Z

    add function COLLKEY to Phoenix to calculate a Java collation key on a given string with
the collator derived from an ISO locale code and some other parameters

----


> Allow sorting on (Java) collation keys for non-English locales
> --------------------------------------------------------------
>
>                 Key: PHOENIX-4237
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4237
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Shehzaad Nakhoda
>
> Strings stored via Phoenix can be composed from a subset of the entire set of Unicode
characters. The natural sort order of strings in a given language often differs from the
order dictated by the binary representation of their characters. Java provides the Collator,
which, given an input string and a (language) locale, can generate a Collation Key that can
then be used to compare strings in that natural order.
> Salesforce has recently open-sourced grammaticus, and IBM open-sourced ICU4J some time
ago. These technologies can be combined to provide a robust new Phoenix function that can
be used in an ORDER BY clause to sort strings according to the user's locale.
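
For readers unfamiliar with collation keys, the idea described above can be sketched with the
JDK's own java.text.Collator. This is only an illustration and not the code in the pull request:
the class and method names below are hypothetical, it uses the JDK collator rather than ICU4J
or grammaticus, and it assumes the key bytes are compared as unsigned bytes, most significant
byte first, the way a byte-ordered store would compare them.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Phoenix or of the pull request.
class CollationKeySketch {

    // Collation key of `value` under `collator`, as raw bytes.
    static byte[] keyBytes(Collator collator, String value) {
        return collator.getCollationKey(value).toByteArray();
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns a copy of `values` ordered by the bytes of their collation keys.
    static List<String> sortByCollationKey(List<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareUnsigned(keyBytes(collator, x), keyBytes(collator, y)));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Apfel" with an A-umlaut as its first letter.
        List<String> names = Arrays.asList("Zebra", "\u00C4pfel", "Apfel");

        // Plain String ordering compares UTF-16 code units.
        List<String> binary = new ArrayList<>(names);
        Collections.sort(binary);
        System.out.println("code-unit order: " + binary);

        // Ordering by collation key follows the rules of the chosen locale.
        System.out.println("German collation: " + sortByCollationKey(names, Locale.GERMAN));
    }
}
```

Sorting by code units places the umlauted word after "Zebra", while sorting by the German
collation keys places it directly after "Apfel". Ordering rows by such keys is the effect
that ORDER BY COLLKEY(name, 'zh_TW') is meant to achieve for the locale given in the query.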



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
