nutch-dev mailing list archives

From Aaron Binns <aa...@archive.org>
Subject Language plugin tokenizers in Indexer?
Date Thu, 18 Jun 2009 21:28:10 GMT

I've been working on bringing the NutchWAX project in line with the
Nutch 1.0 release.

One of the Nutch 1.0 features I'm interested in using is the language
analysis plugin so that I can start playing with tokenizers for Chinese,
Japanese, etc.

After looking at Indexer.java and SolrIndexer.java, I couldn't see how
the language plugins are used.  I did see their use in the "new" scoring
and indexing stuff: FieldIndexer.java and related classes.

Are the language-specific tokenizer plugins used only by the new
FieldIndexer system?  Or are they also used by the traditional Lucene
indexer and I just overlooked it?


Thanks!

Aaron

-- 
Aaron Binns
Senior Software Engineer, Web Group
Internet Archive
aaron@archive.org
