nutch-dev mailing list archives

From "Jérôme Charron"
Subject Re: Status of language plugin
Date Wed, 07 Jun 2006 08:58:08 GMT
> Is there an API doc or design doc that I can read to
> understand where you are? Is the language plugin architecture
> already in the main trunk?

The only available document is
and sometimes I maintain this page

> Here are some issues that I've been worried about:
> * Support of multilingual plugin?
> ** If one plugin can support more than one language,
>    the language needs to be passed at each analysis.

I don't understand your need.
But if you have an analysis plugin that can handle many languages, you
can simply define several implementations in your plugin.xml, e.g.

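<extension id="org.apache.nutch.analysis.cjk">

      <implementation id=""
                      class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
        <parameter name="lang" value="cn"/>
      </implementation>

      <implementation id=""
                      class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
        <parameter name="lang" value="kr"/>
      </implementation>

      <implementation id=""
                      class="org.apache.nutch.analysis.cjk.CJKAnalyzer">
        <parameter name="lang" value="jp"/>
      </implementation>

</extension>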

> ** This assumes language identification is done before
>    analysis.  Is it the case ?


> * Support of a different analyzer for query than index
> ** Analyzer for query may need to behave differently from the
>    analyzer for indexing.  Can your architecture
>    specify different analyzers for indexing and query?

In fact, to avoid adding a QueryAnalyser extension point,
the Query uses the same Analyzer implementation as the one
used for document analysis.
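
As a toy illustration of why sharing one analyzer between the two sides is convenient, the sketch below runs the same `analyze` method over both the document and the query, so query terms always have the same form as the indexed terms. Everything in it (`SharedAnalyzerSketch`, `analyze`, `index`, `matches`, and the lowercase-and-split tokenization) is a hypothetical stand-in and not Nutch or Lucene code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SharedAnalyzerSketch {

    /** Hypothetical analyzer: lowercases, then splits on runs of non-letter, non-digit characters. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Index side: the set of terms stored for a document. */
    public static Set<String> index(String document) {
        return new HashSet<>(analyze(document));
    }

    /** Query side: reuses analyze(), so every query term is compared in its indexed form. */
    public static boolean matches(Set<String> indexedTerms, String query) {
        return indexedTerms.containsAll(analyze(query));
    }

    public static void main(String[] args) {
        Set<String> indexed = index("Status of the Language Plugin");
        System.out.println(matches(indexed, "language PLUGIN")); // prints "true"
        System.out.println(matches(indexed, "query analyzer"));  // prints "false"
    }
}
```

If the query side used a different analyzer (for example, one that did not lowercase), the first lookup would miss even though the document contains both words.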


