[ https://issues.apache.org/jira/browse/NUTCH-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-898.
-------------------------------
Resolution: Won't Fix
The old (and only) nightly build I was using did allow multiple values but concatenated them.
The current branch-1.2 already stores the values in a multi-valued field.
It was already fixed!
> Multi valued subcollection is not multi valued
> ----------------------------------------------
>
> Key: NUTCH-898
> URL: https://issues.apache.org/jira/browse/NUTCH-898
> Project: Nutch
> Issue Type: Bug
> Components: indexer
> Environment: nutch-2010-07-07_04-49-04
> Reporter: Markus Jelsma
> Fix For: 1.2
>
>
> NUTCH-716 concatenates multiple values into a single string instead of adding them as separate values to a multi-valued field. For a test crawl I defined the following two subcollections:
> <subcollection>
> <name>asdf</name>
> <id>asdf-site</id>
> <whitelist>http://asdf/</whitelist>
> <blacklist/>
> </subcollection>
> <subcollection>
> <name>news</name>
> <id>asdf-news</id>
> <whitelist>http://asdf/news/</whitelist>
> <blacklist/>
> </subcollection>
> Reindexing the segments by sending them to Solr yields the following results (the second document is a news URL):
> <doc>
> <arr name="subcollection">
> <str>asdf</str>
> </arr>
> <str name="url">http://asdf/home/</str>
> </doc>
> <doc>
> <arr name="subcollection">
> <str>asdf news</str>
> </arr>
> <str name="url">http://asdf/news/</str>
> </doc>
> Instead, I expected the following result for the second document:
> <doc>
> <arr name="subcollection">
> <str>asdf</str>
> <str>news</str>
> </arr>
> <str name="url">http://asdf/news/</str>
> </doc>
> My Solr schema.xml has the following declaration for the subcollection field:
> <field name="subcollection" type="string" stored="true" indexed="true" multiValued="true"/>
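
The difference between the reported and the expected output can be illustrated with a small sketch. This is not Nutch code: the function names are hypothetical, and matching a URL against a whitelist by simple prefix is an assumption made only to keep the example short. It takes the two subcollection definitions from the report and builds the field value both ways.

```python
# Illustrative only: hypothetical helpers, not the Nutch subcollection plugin.
# Assumption: a URL belongs to a subcollection if it starts with the whitelist entry.

# (name, whitelist) pairs taken from the two definitions in the report
DEFINITIONS = [
    ("asdf", "http://asdf/"),
    ("news", "http://asdf/news/"),
]


def matching_subcollections(url, definitions):
    """Return the names of all subcollections whose whitelist matches the URL."""
    return [name for name, whitelist in definitions if url.startswith(whitelist)]


def concatenated_field(names):
    """Reported behaviour: all names joined into one string, stored as one value."""
    return [" ".join(names)] if names else []


def multi_valued_field(names):
    """Expected behaviour: every name stored as its own value."""
    return list(names)


if __name__ == "__main__":
    for url in ("http://asdf/home/", "http://asdf/news/"):
        names = matching_subcollections(url, DEFINITIONS)
        print(url, concatenated_field(names), multi_valued_field(names))
```

For the home URL both variants hold the single value asdf. For the news URL the concatenated variant holds one value, "asdf news", while the multi-valued variant holds two separate values, which is what a field declared with multiValued="true" is meant to receive.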