lucene-dev mailing list archives

From Steven Rowe <sar...@syr.edu>
Subject Re: [jira] Resolved: (LUCENE-478) CJK char list
Date Sun, 13 Aug 2006 13:36:06 GMT
Otis Gospodnetic (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/LUCENE-478?page=all ]
> 
> Otis Gospodnetic resolved LUCENE-478.
> -------------------------------------
> 
>     Resolution: Fixed
> 
> Thanks, I committed Steven Rowe's patch, although it doesn't seem to
> fully match what he said in comments above (e.g. in his patch, I
> don't see the range he mentioned in 5.b).

Hi Otis,

Here's 5.b.:

5. Character ranges in John's list that are missing in
StandardTokenizer.jj, and that should be added to the newly
re-labeled <CJ> section:

   5.b. [ U+3D2E - U+4DB5 ] (non-chars [ U+4DB6 - U+4DBF ] excluded)
        CJK Ideograph Extension A.
        This range was introduced in Unicode 3.0.

And here's the corresponding change from the patch:

        "\u3300"-"\u337f",
-       "\u3400"-"\u3d2d",
+       "\u3400"-"\u4db5",
        "\u4e00"-"\u9fff",

I don't understand - it looks to me like the above change adds the range
mentioned in 5.b.
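
The arithmetic behind that reading can be checked with a small sketch. It is not part of the patch or of StandardTokenizer.jj; the class and method names are hypothetical, and only the hex bounds come from the diff and from item 5.b above. It checks that every code point in [ U+3D2E - U+4DB5 ] falls outside the old range and inside the new one:

```java
public class CjkRangeCheck {
    // Bounds of the grammar range before and after the patch (from the diff).
    static final int START = 0x3400;
    static final int OLD_END = 0x3D2D;
    static final int NEW_END = 0x4DB5;

    // Range named in item 5.b of the JIRA comment.
    static final int MISSING_START = 0x3D2E;
    static final int MISSING_END = 0x4DB5;

    /** Inclusive range test, mirroring a "\uXXXX"-"\uYYYY" entry. */
    static boolean inRange(int c, int start, int end) {
        return c >= start && c <= end;
    }

    /**
     * True if every code point of the 5.b range was outside the old
     * range and is inside the patched range.
     */
    static boolean patchAddsMissingRange() {
        for (int c = MISSING_START; c <= MISSING_END; c++) {
            if (inRange(c, START, OLD_END)) {
                return false; // already covered before the patch
            }
            if (!inRange(c, START, NEW_END)) {
                return false; // still not covered after the patch
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("5.b added by patch: " + patchAddsMissingRange());
        System.out.println("old end + 1 == 5.b start: "
                + (OLD_END + 1 == MISSING_START));
    }
}
```

Because the old range ends at U+3D2D and 5.b starts at U+3D2E, extending the upper bound to U+4DB5 is equivalent to appending the 5.b range as a separate entry.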

Are there other inconsistencies?  (You said that 5.b. was an example.)

Steve


