nutch-dev mailing list archives

From Sami Siren
Subject Re: [jira] Commented: (NUTCH-496) ConcurrentModificationException can be thrown when getSorted() is called.
Date Mon, 04 Jun 2007 18:38:02 GMT
Briggs wrote:
> Yeah, you are correct there.  How does this thing actually even
> remotely begin to work on a  predictable level?

One crucial aspect of language identification is that the input is properly
encoded. There was a patch that added icu4j character set encoding
detection into Nutch. I believe icu4j also offers language
identification in addition to character set detection. Has anyone
checked how usable the language identification from icu4j would be?
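
To illustrate why encoding matters here, below is a minimal sketch (not
Nutch code, and not the icu4j API). It assumes an identifier that works on
character n-grams, and the helper name char_trigrams is made up for the
example. It decodes the same bytes once with the right charset and once
with the wrong one, then compares the trigrams each version produces.

```python
from collections import Counter


def char_trigrams(text):
    """Count overlapping character trigrams (hypothetical helper for this sketch)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


original = "日本語のテキスト"        # short Japanese sample
raw = original.encode("utf-8")       # the bytes as they would arrive from a fetched page

good = raw.decode("utf-8")           # decoded with the correct charset
bad = raw.decode("iso-8859-1")       # wrong charset: every byte becomes one Latin-1 character

# Trigrams that the correctly and incorrectly decoded text have in common.
shared = set(char_trigrams(good)) & set(char_trigrams(bad))

print(len(good), len(bad), len(shared))
```

With the wrong charset the text no longer shares any trigram with the
correctly decoded version, so an n-gram profile for the real language has
nothing to match against.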

There are severe problems with the current language identification for CJK
text, for example.

 Sami Siren
