lucene-dev mailing list archives

From Karel Tejnora <ka...@tejnora.cz>
Subject Analyzers, perfect hash, ICU
Date Wed, 11 Jan 2006 16:50:58 GMT
Hi all,
    I'm working on an analyzer for the Slavic languages written in Latin
script (cs, sk), without stemming at first.
I would like to ask you:
Stop-word analyzers often use a HashSet implementation, but the stop
words rarely (if ever) change from the ones shipped in the Java code.
Do you think there would be any benefit in using a perfect hash algorithm?
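
To make the perfect-hash question concrete, here is a minimal sketch of one
way a fixed stop-word list could be stored without collisions. It is not
Lucene code: the class name PerfectStopSet, the FNV-1a-style seeded hash, the
quadratic table size, and the sample Czech words are all assumptions chosen
for illustration. It simply tries seeds until every word lands in its own
slot, so a lookup costs one hash and one string comparison.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: collision-free lookup table for a fixed word list. */
final class PerfectStopSet {
    private static final int MAX_SEEDS = 1000; // assumed search budget

    private final String[] slots; // each slot holds at most one word
    private final int seed;       // seed that produced no collisions

    private PerfectStopSet(String[] slots, int seed) {
        this.slots = slots;
        this.seed = seed;
    }

    /** Seeded FNV-1a-style hash, reduced to an index in [0, size). */
    private static int slot(CharSequence word, int seed, int size) {
        int h = 0x811c9dc5 ^ seed;
        for (int i = 0; i < word.length(); i++) {
            h = (h ^ word.charAt(i)) * 0x01000193;
        }
        h ^= h >>> 15;
        return (h & 0x7fffffff) % size;
    }

    /** Tries to place every word in its own slot; false on any collision. */
    private static boolean place(Set<String> words, String[] slots, int seed) {
        for (String w : words) {
            int i = slot(w, seed, slots.length);
            if (slots[i] != null) {
                return false;
            }
            slots[i] = w;
        }
        return true;
    }

    /** Builds the table once, e.g. at class-load time for a shipped list. */
    static PerfectStopSet build(String... words) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(words));
        // n*n slots keeps the seed search short; it trades memory for simplicity.
        int size = Math.max(1, unique.size() * unique.size());
        for (int seed = 0; seed < MAX_SEEDS; seed++) {
            String[] slots = new String[size];
            if (place(unique, slots, seed)) {
                return new PerfectStopSet(slots, seed);
            }
        }
        throw new IllegalStateException("no collision-free seed found");
    }

    /** One hash plus one comparison; no bucket chain to walk. */
    boolean contains(CharSequence word) {
        String candidate = slots[slot(word, seed, slots.length)];
        return candidate != null && candidate.contentEquals(word);
    }

    public static void main(String[] args) {
        PerfectStopSet stops = PerfectStopSet.build("a", "ale", "nebo", "i", "v");
        System.out.println(stops.contains("nebo"));
        System.out.println(stops.contains("lucene"));
    }
}
```

Whether this beats HashSet in practice would have to be measured: a HashSet
lookup on a small set is already close to one hash and one comparison, so any
gain probably comes from avoiding String allocation (the lookup above accepts
any CharSequence) rather than from the collision-free property itself.
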
I will also write an ICU-based analyzer for Latin characters (decomposing
each character and returning its base character). Do you have any experience
with ICU (icu.sf.net), for example known problems or bottlenecks?
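
The "decompose and return the base character" step can be sketched as
follows. This is only an illustration under stated assumptions: it uses the
JDK's java.text.Normalizer as a stand-in for ICU, and the class name
BaseCharFolder and method fold are hypothetical. The text is normalized to
NFD, which splits a letter such as U+010D into the base letter plus a
combining mark, and the combining marks are then dropped.

```java
import java.text.Normalizer;

/** Hypothetical sketch: strip diacritics via canonical decomposition. */
final class BaseCharFolder {

    /** Returns the input with non-spacing combining marks removed. */
    static String fold(String input) {
        // NFD: canonical decomposition into base character + combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); ) {
            int cp = decomposed.codePointAt(i);
            i += Character.charCount(cp);
            // Keep everything except non-spacing marks (caron, acute, ring, ...).
            if (Character.getType(cp) != Character.NON_SPACING_MARK) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escapes are used so the example does not depend on source encoding.
        System.out.println(fold("\u010de\u0161tina"));
    }
}
```

Letters that have no canonical decomposition (for example a stroked letter
such as U+0142) pass through unchanged, so a real analyzer might need an
extra mapping table for those.
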

Thx,
Karel

P.S.: I would also like to contribute this work to lucene-contrib if
it is considered useful. Is there a how-to for setting up Eclipse for a
Lucene/Apache-related project?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

