lucene-dev mailing list archives

From "Robert Muir (JIRA)"
Subject [jira] Commented: (LUCENE-2102) LowerCaseFilter for Turkish language
Date Tue, 01 Dec 2009 21:05:20 GMT


Robert Muir commented on LUCENE-2102:

bq. Maybe it's just me, but I think it is critical to normalize the input to Lucene for both
indexing and searching. Unless an NFCNormalizingFilter is added to Lucene, I think it is the
responsibility of the caller.

Yeah, I think it's critical too.
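
For illustration only, here is a minimal sketch of what caller-side normalization could look like using the JDK's java.text.Normalizer. The class and method names (NfcInput, toNfc) are hypothetical and are not part of Lucene or of the attached patch; the sketch assumes the caller normalizes both the text to index and the query text before handing them to an analyzer.

```java
import java.text.Normalizer;

// Hypothetical helper, not a Lucene class: normalizes caller input to NFC
// so that indexing and searching see the same code point sequences.
final class NfcInput {
    private NfcInput() {}

    static String toNfc(String s) {
        // Normalizer.isNormalized avoids allocating when the input is already NFC.
        if (Normalizer.isNormalized(s, Normalizer.Form.NFC)) {
            return s;
        }
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'I' followed by COMBINING DOT ABOVE: a decomposed capital dotted I.
        String decomposed = "I\u0307stanbul";
        // After NFC the first two code units compose into the single U+0130.
        String composed = toNfc(decomposed);
        System.out.println(decomposed.length() + " -> " + composed.length());
    }
}
```

With input normalized this way, a filter that inspects one character at a time sees the precomposed capital dotted I rather than a plain 'I' followed by a combining mark.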

bq. It might be good to note the NFC (NFKC?) requirement in the JavaDoc. 

Yeah, or maybe just a hint in the comments (because this is an exceptionally tricky case).

The same problem also applies to ASCIIFoldingFilter, pretty much all of the analyzers, etc.

> LowerCaseFilter for Turkish language
> ------------------------------------
>                 Key: LUCENE-2102
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-2102.patch
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish alphabet the lowercase
> of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I (U+0131).
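
The behavior described in the issue can be reproduced with the JDK alone. The sketch below is only an illustration of the locale difference, not the filter from the attached patch; the class name TurkishLowerCase and the helper lower are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class: contrasts locale-insensitive and Turkish lowercasing.
final class TurkishLowerCase {
    private TurkishLowerCase() {}

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Lowercases using Turkish casing rules via String.toLowerCase(Locale).
    static String lower(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Character.toLowerCase(char) takes no locale, so 'I' maps to 'i'.
        System.out.println(Character.toLowerCase('I') == 'i');
        // Under Turkish rules, 'I' maps to LATIN SMALL LETTER DOTLESS I (U+0131).
        System.out.println(lower("I").equals("\u0131"));
        // The capital dotted I (U+0130) maps to a plain 'i' under Turkish rules.
        System.out.println(lower("\u0130").equals("i"));
    }
}
```

All three lines print true. The second and third checks are the two mappings a Turkish-aware lowercase step has to get right.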

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
