lucene-dev mailing list archives

From Steven Rowe
Subject Re: [jira] Resolved: (LUCENE-478) CJK char list
Date Sun, 13 Aug 2006 13:36:06 GMT
Otis Gospodnetic (JIRA) wrote:
> Otis Gospodnetic resolved LUCENE-478.
> -------------------------------------
>     Resolution: Fixed
> Thanks, I committed Steven Rowe's patch, although it doesn't seem to
> fully match what he said in comments above (e.g. in his patch, I
> don't see the range he mentioned in 5.b).

Hi Otis,

Here's 5.b.:

5. Character ranges in John's list that are missing in
StandardTokenizer.jj, and that should be added to the newly
re-labeled <CJ> section:

   5.b. [ U+3D2E - U+4DB5 ] (non-chars [ U+4DB6 - U+4DBF ] excluded)
        CJK Ideograph Extension A.
        This range was introduced in Unicode 3.0.

And here's the corresponding change from the patch:

-       "\u3400"-"\u3d2d",
+       "\u3400"-"\u4db5",

I don't understand - it looks to me like the above change adds the range
mentioned in 5.b.
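
As a quick sanity check, here is a minimal sketch (not part of the patch) that compares the old and new upper bounds from the diff against the range in 5.b. The names OLD_RANGE, NEW_RANGE, RANGE_5B, covers and added_code_points are hypothetical and exist only for this illustration; the sketch assumes both bounds of a range in the grammar are inclusive.

```python
# Inclusive (first, last) code point pairs taken from the diff and from item 5.b.
OLD_RANGE = (0x3400, 0x3D2D)  # "\u3400"-"\u3d2d", before the patch
NEW_RANGE = (0x3400, 0x4DB5)  # "\u3400"-"\u4db5", after the patch
RANGE_5B = (0x3D2E, 0x4DB5)   # range listed in item 5.b


def covers(outer, inner):
    """Return True if the inclusive range `outer` fully contains `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def added_code_points(old, new):
    """Return the inclusive range gained when `new` extends `old` upward.

    Assumes both ranges share the same first code point; returns None
    if the upper bound did not move.
    """
    if new[0] != old[0] or new[1] < old[1]:
        raise ValueError("new range must extend old range upward")
    if new[1] == old[1]:
        return None
    return (old[1] + 1, new[1])


added = added_code_points(OLD_RANGE, NEW_RANGE)
print("added by patch: U+%04X - U+%04X" % added)
print("matches 5.b:", added == RANGE_5B)
```

Under those assumptions, the code points the patch adds are exactly the ones listed in 5.b., and the old range did not cover any of them.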

Are there other inconsistencies?  (You said that 5.b. was an example.)

