nutch-dev mailing list archives

From "Matthias Agethle (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping
Date Mon, 25 Oct 2010 08:13:18 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924491#action_12924491
] 

Matthias Agethle commented on NUTCH-923:
----------------------------------------

Perhaps something like the Solr DataImportHandler (DIH) could be a solution. Adding scriptable transformers would
allow custom logic to be written and would be much more flexible. This way one could also add default
field values if no value is provided, etc.
E.g.:
{code:xml}
<script><![CDATA[
    function addLanguage(row) {
        // Implementation
    }
]]></script>
<fields transformer="script:addLanguage">
    <field dest="lang" source="lang"/>
    <field dest="title" source="title"/>
</fields>
{code}

In the addLanguage script one could do all kinds of validation to restrict the explosion of field names.
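
A minimal sketch of what such an addLanguage function might look like. Nothing here exists in Nutch today: the language whitelist, the default language and the title_<lang> naming are assumptions for illustration, and the row is modelled as a plain JavaScript Map so that the snippet runs on its own (Solr DIH hands the script a java.util.Map, where one would call put() instead of set()).

```javascript
// Hypothetical sketch: the whitelist, default and field names are assumptions.
const ALLOWED_LANGUAGES = new Set(["en", "fr", "de"]);
const DEFAULT_LANGUAGE = "en";

// `row` is modelled as a JavaScript Map of field name -> value.
// (Solr DIH passes a java.util.Map, where set() would be put().)
function addLanguage(row) {
  const detected = String(row.get("lang") || "").toLowerCase();

  // Fall back to a default so unknown codes cannot create new field names.
  const lang = ALLOWED_LANGUAGES.has(detected) ? detected : DEFAULT_LANGUAGE;
  row.set("lang", lang);

  // Copy the title into a language-specific field such as title_en.
  if (row.has("title")) {
    row.set("title_" + lang, row.get("title"));
  }
  return row;
}

// Example: a French page ends up with a title_fr field.
const row = addLanguage(new Map([["lang", "FR"], ["title", "Bonjour"]]));
console.log(row.get("title_fr"));
```

Because every language code is checked against the whitelist before it is used in a field name, a misdetected language falls back to the default field instead of creating a new one.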

> Multilingual support for Solr-index-mapping
> -------------------------------------------
>
>                 Key: NUTCH-923
>                 URL: https://issues.apache.org/jira/browse/NUTCH-923
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.2
>            Reporter: Matthias Agethle
>            Assignee: Markus Jelsma
>            Priority: Minor
>
> It would be useful to extend the mapping possibilities when indexing to Solr.
> One useful feature would be to use the detected language of the HTML page (for example, via the language-identifier plugin) and send the content to the corresponding language-aware Solr fields.
> The mapping file could be as follows:
> <field dest="lang" source="lang"/>
> <field dest="title_${lang}" source="title" />
> so that the title field gets mapped to title_en for English pages and title_fr for French pages.
> What do you think? Could this be useful also to others?
> Or are there already other solutions out there?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

