commons-dev mailing list archives

From "Aldrin Leal (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LANG-285) Wish : method unaccent
Date Sun, 15 Oct 2006 07:29:36 GMT
    [ http://issues.apache.org/jira/browse/LANG-285?page=comments#action_12442372 ] 
            
Aldrin Leal commented on LANG-285:
----------------------------------

A while ago I wrote something for ISO-8859-1, but I think it could handle UTF-8 as well.

Sorry about the comments being in Brazilian Portuguese. Overall, it works! :)

	/**
	 * Rewrites the string without accents, to make searching easier.
	 * 
	 * @param str
	 *            string to be normalized
	 * @return the string without accents
	 */
	public static String normalizar(String str) {
		// normalize0() and the log field are defined elsewhere
		// (not included in this comment).
		char[] chArr = normalize0(str);

		// Pairs of {accented characters, unaccented replacement}.
		// Uppercase groups are written out literally: String.toUpperCase()
		// without a Locale turns "i" into a dotted capital I under a
		// Turkish default locale.
		String[] xlatTab = new String[] { "ÁÂÃÀ", "A", "ÉÊÈ", "E",
				"ÍÎÌ", "I", "ÓÔÒÕ", "O", "ÚÛÙ", "U", "Ç", "C",
				"áâãà", "a", "éêè", "e", "íîì", "i", "óôòõ", "o",
				"úûù", "u", "ç", "c", };

		// Replace each character found in an accented group with that
		// group's unaccented letter.
		for (int k = 0; k < chArr.length; k++) {
			for (int i0 = 0; i0 < xlatTab.length; i0 += 2) {
				if (xlatTab[i0].indexOf(chArr[k]) != -1) {
					chArr[k] = xlatTab[i0 + 1].charAt(0);
				}
			}
		}

		String retval = new String(chArr);

		log.debug("data0=" + str + "; data=" + retval);

		return retval;
	}
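For anyone who wants to try the lookup-table idea without the rest of my class, here is a minimal self-contained sketch. It assumes normalize0() simply returns str.toCharArray() and drops the log call; the class name AccentTable is hypothetical. The accented letters are written as \u escapes so the source compiles regardless of the compiler's file encoding, which sidesteps the ISO-8859-1 versus UTF-8 question for the table itself.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical standalone version of normalizar(): assumes normalize0()
 * just returns str.toCharArray(), and omits the debug logging.
 */
public final class AccentTable {

    // Lowercase pairs of {accented characters, replacement}, written as
    // Unicode escapes so the source compiles under any file encoding.
    private static final String[] LOWER_PAIRS = {
        "\u00E1\u00E2\u00E3\u00E0", "a", // a: acute, circumflex, tilde, grave
        "\u00E9\u00EA\u00E8", "e",       // e: acute, circumflex, grave
        "\u00ED\u00EE\u00EC", "i",       // i: acute, circumflex, grave
        "\u00F3\u00F4\u00F2\u00F5", "o", // o: acute, circumflex, grave, tilde
        "\u00FA\u00FB\u00F9", "u",       // u: acute, circumflex, grave
        "\u00E7", "c",                   // c: cedilla
    };

    // Accented char -> replacement, including the uppercase forms.
    private static final Map<Character, Character> MAP = new HashMap<>();

    static {
        for (int i = 0; i < LOWER_PAIRS.length; i += 2) {
            char target = LOWER_PAIRS[i + 1].charAt(0);
            for (char c : LOWER_PAIRS[i].toCharArray()) {
                MAP.put(c, target);
                // Character.toUpperCase is locale-independent.
                MAP.put(Character.toUpperCase(c), Character.toUpperCase(target));
            }
        }
    }

    private AccentTable() {
    }

    /** Returns str with the mapped accented letters replaced; null stays null. */
    public static String normalize(String str) {
        if (str == null) {
            return null;
        }
        char[] chars = str.toCharArray();
        for (int k = 0; k < chars.length; k++) {
            Character replacement = MAP.get(chars[k]);
            if (replacement != null) {
                chars[k] = replacement;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Prints "Acao e emocao no onibus"
        System.out.println(normalize("A\u00E7\u00E3o e emo\u00E7\u00E3o no \u00F4nibus"));
    }
}
```

The map is built once, so each character costs one hash lookup instead of a scan over every group of the table. Like the original, it only covers the letters listed in the table.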


> Wish : method unaccent
> ----------------------
>
>                 Key: LANG-285
>                 URL: http://issues.apache.org/jira/browse/LANG-285
>             Project: Commons Lang
>          Issue Type: New Feature
>            Reporter: Guillaume Coté
>            Priority: Minor
>
> I would like to add a method that replaces accented characters with unaccented ones. For example,
> with the input String "L'été où j'ai dû aller à l'île d'Anticosti commenca tôt", the
> method would return "L'ete ou j'ai du aller a l'ile d'Anticosti commenca tot".
> I suggest calling that method unaccent and adding it to StringUtils.
> If we cannot convert all cases, the first version could convert only ISO-8859-1.
> If you are willing to go forward with that idea, I am willing to contribute a patch.
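One possible way to go beyond ISO-8859-1 is canonical decomposition: java.text.Normalizer (Java 6 or later) splits a letter such as é into e plus a combining accent, and the combining marks can then be removed with a regular expression. The sketch below assumes that approach; the Unaccent class and its unaccent method are hypothetical names mirroring the proposal, not an existing StringUtils method.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/**
 * Hypothetical unaccent() sketch based on canonical decomposition (NFD).
 * Requires Java 6 or later for java.text.Normalizer.
 */
public final class Unaccent {

    // Characters in the Unicode "Combining Diacritical Marks" block (U+0300-U+036F).
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    private Unaccent() {
    }

    public static String unaccent(String str) {
        if (str == null) {
            return null;
        }
        // NFD splits a precomposed letter such as U+00E9 (e with acute)
        // into 'e' followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Removing the combining marks leaves only the base letters.
        return COMBINING_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "L'\u00E9t\u00E9 o\u00F9 j'ai d\u00FB aller \u00E0 l'\u00EEle d'Anticosti commenca t\u00F4t";
        // Prints "L'ete ou j'ai du aller a l'ile d'Anticosti commenca tot"
        System.out.println(unaccent(input));
    }
}
```

Letters that have no canonical decomposition, such as ø or æ, pass through unchanged, so a lookup table would still be needed for those.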

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

