lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-2102) LowerCaseFilter for Turkish language
Date Tue, 01 Dec 2009 22:55:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784496#action_12784496 ]

Uwe Schindler edited comment on LUCENE-2102 at 12/1/09 10:53 PM:
-----------------------------------------------------------------

Robert: I understand your problem, but it affects LowerCaseFilter in general and is not specific
to the Turkish lowercase filter. If you have decomposed characters, even LowerCaseFilter would
fail for *all* languages (even German, if you compose ä out of a and two dots). In Germany
really nobody uses decomposed characters. I do not know how it is in Turkey, but the last time
I was there, they just used the simplest precomposed characters (like Germans); they even use
the same umlauts (ö and ü), taken from the basic Latin-1 range. For such text this filter works
and is a quick fix.

But I give up now.
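A minimal Java sketch of the decomposed-character case, assuming a simple filter that lowercases one char at a time with a special rule for 'I'. The class and the methods `naiveTurkishLower` and `nfcThenLower` are hypothetical and are not Lucene API. A precomposed İ (U+0130) lowercases correctly, but the same letter written as I followed by COMBINING DOT ABOVE (U+0307) becomes dotless ı plus a stray dot. Normalizing to NFC first with `java.text.Normalizer` is shown as one possible mitigation; it is an assumption for illustration, not what the thread decided.

```java
import java.text.Normalizer;

public class DecomposedTurkishDemo {

    // Hypothetical char-by-char Turkish lowercasing, for illustration only:
    // 'I' -> dotless 'ı' (U+0131), 'İ' (U+0130) -> 'i', everything else via
    // Character.toLowerCase. It sees one char at a time, so it cannot react
    // to a combining mark that follows the 'I'.
    static String naiveTurkishLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'I') {
                sb.append('\u0131');
            } else if (c == '\u0130') {
                sb.append('i');
            } else {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Compose first (NFC turns "I" + U+0307 into U+0130), then lowercase.
    static String nfcThenLower(String s) {
        return naiveTurkishLower(Normalizer.normalize(s, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u0130stanbul"; // İ as a single code point
        String decomposed = "I\u0307stanbul"; // I followed by COMBINING DOT ABOVE

        System.out.println(naiveTurkishLower(precomposed).equals("istanbul")); // true
        System.out.println(naiveTurkishLower(decomposed).equals("istanbul"));  // false: "ı" + U+0307 + "stanbul"
        System.out.println(nfcThenLower(decomposed).equals("istanbul"));       // true
    }
}
```

The NFC step only helps for sequences that have a precomposed form, which is why precomposed input (the common case described in the comment) already works without it.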

> LowerCaseFilter for Turkish language
> ------------------------------------
>
>                 Key: LUCENE-2102
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2102
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 3.0
>            Reporter: Ahmet Arslan
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2102.patch, LUCENE-2102.patch, LUCENE-2102.patch
>
>
> java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish alphabet the
> lowercase of 'I' is not 'i' but LATIN SMALL LETTER DOTLESS I (U+0131).
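The difference described in the issue can be reproduced with the JDK alone. This sketch compares `Character.toLowerCase(char)`, applied char by char, with `String.toLowerCase(Locale)` using a Turkish locale. The class and helper names are illustrative only and are not part of Lucene or the attached patches.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {

    static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Locale-independent, char-by-char lowercasing (the behaviour the issue describes).
    static String charByCharLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(Character.toLowerCase(s.charAt(i)));
        }
        return sb.toString();
    }

    // Locale-sensitive lowercasing using the JDK's Turkish casing rules.
    static String turkishLower(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        String word = "IRMAK"; // Turkish for "river"; correct lowercase is "ırmak"
        System.out.println(charByCharLower(word).equals("irmak"));    // true (dotted i, wrong for Turkish)
        System.out.println(turkishLower(word).equals("\u0131rmak"));  // true (dotless ı)
        System.out.println(turkishLower("\u0130").equals("i"));       // true: İ -> i
    }
}
```

`String.toLowerCase(Locale)` works on whole strings, whereas a char-oriented filter has to apply the Turkish mapping for 'I' and 'İ' itself.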

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



