nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] Closed: (NUTCH-898) Multi valued subcollection is not multi valued
Date Tue, 07 Sep 2010 11:16:32 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-898.
-------------------------------

    Resolution: Won't Fix

The old (only) nightly build I was using did allow multiple values but concatenated them.
The current branch-1.2 already stores the values as a multi-valued field.

It was already fixed! 

> Multi valued subcollection is not multi valued
> ----------------------------------------------
>
>                 Key: NUTCH-898
>                 URL: https://issues.apache.org/jira/browse/NUTCH-898
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>         Environment: nutch-2010-07-07_04-49-04
>            Reporter: Markus Jelsma
>             Fix For: 1.2
>
>
> NUTCH-716 concatenates multiple values into a single string instead of adding each value separately to a multi-valued field. For a test crawl I have defined the following two subcollections:
> <subcollection>
> <name>asdf</name>
> <id>asdf-site</id>
> <whitelist>http://asdf/</whitelist>
> <blacklist/>
> </subcollection>
> <subcollection>
> <name>news</name>
> <id>asdf-news</id>
> <whitelist>http://asdf/news/</whitelist>
> <blacklist/>
> </subcollection>
> Reindexing the segments by sending them to Solr yields the following results (the second document is for a news URL):
> <doc>
> <arr name="subcollection">
> <str>asdf</str>
> </arr>
> <str name="url">http://asdf/home/</str>
> </doc>
> <doc>
> <arr name="subcollection">
> <str>asdf news</str>
> </arr>
> <str name="url">http://asdf/news/</str>
> </doc>
> Instead, I expected the following result for the second document:
> <doc>
> <arr name="subcollection">
> <str>asdf</str>
> <str>news</str>
> </arr>
> <str name="url">http://asdf/news/</str>
> </doc>
> My Solr schema.xml has the following declaration for the subcollection field:
> <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true" />
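
For illustration only, the difference between the reported and the expected
field contents can be sketched as below. This is not Nutch code: the
SUBCOLLECTIONS list and the function names are hypothetical, and matching a
URL by simple whitelist prefix is an assumption made to keep the sketch short.

```python
# Hypothetical stand-ins for the two subcollection definitions in the report.
SUBCOLLECTIONS = [
    {"name": "asdf", "whitelist": "http://asdf/"},
    {"name": "news", "whitelist": "http://asdf/news/"},
]


def matching_names(url):
    """Return the names of all subcollections whose whitelist prefix matches url.

    Assumption: plain prefix matching, which may differ from the real plugin.
    """
    return [s["name"] for s in SUBCOLLECTIONS if url.startswith(s["whitelist"])]


def concatenated_field(url):
    """Reported behaviour: all names joined into one space-separated value."""
    names = matching_names(url)
    return [" ".join(names)] if names else []


def multi_valued_field(url):
    """Expected behaviour: one field value per matching subcollection."""
    return matching_names(url)


if __name__ == "__main__":
    for url in ("http://asdf/home/", "http://asdf/news/"):
        print(url, concatenated_field(url), multi_valued_field(url))
```

With a string field declared multiValued="true", only the second form lets a
query match the value "news" on its own; the concatenated form stores the
single value "asdf news".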

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

