nutch-dev mailing list archives

From "Anton (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series
Date Tue, 04 Mar 2014 09:52:23 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919198#comment-13919198 ]

Anton  edited comment on NUTCH-1478 at 3/4/14 9:51 AM:
-------------------------------------------------------

Yes, Nutch with patch v5 works fine, without errors.
Thanks!

In $SOLR_HOME$/conf/schema.xml I use the following field names, which differ from the current wiki suggestion
(http://wiki.apache.org/nutch/IndexMetatags):

{code:xml}
     <!-- fields for metatags -->
     <field name="meta_description" type="string" stored="true" indexed="true"/>
     <field name="meta_keywords" type="string" stored="true" indexed="true"/> 
{code}
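
The issue description asks that every field listed in 'index.parse.md' in nutch-site.xml also be defined in schema.xml in Solr. Below is a minimal sketch of how that could be checked. It is not part of the patch. It assumes the property value is a comma-separated list of field names, and the sample values are illustrative only (meta_author is a hypothetical extra field added to show a miss).

```python
import xml.etree.ElementTree as ET

# Illustrative nutch-site.xml content. The field names in <value> are an
# assumption for this sketch, not values confirmed by the patch.
NUTCH_SITE = """
<configuration>
  <property>
    <name>index.parse.md</name>
    <value>meta_description, meta_keywords, meta_author</value>
  </property>
</configuration>
"""

# The two Solr fields from the comment above, wrapped in a minimal schema.
SCHEMA = """
<schema name="nutch" version="1.5">
  <fields>
    <field name="meta_description" type="string" stored="true" indexed="true"/>
    <field name="meta_keywords" type="string" stored="true" indexed="true"/>
  </fields>
</schema>
"""


def configured_fields(nutch_site_xml, prop="index.parse.md"):
    """Return the comma-separated names stored in the given property."""
    root = ET.fromstring(nutch_site_xml)
    for p in root.iter("property"):
        if (p.findtext("name") or "").strip() == prop:
            value = p.findtext("value") or ""
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def schema_fields(schema_xml):
    """Return the set of <field name="..."> values declared in the schema."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(nutch_site_xml, schema_xml):
    """List configured fields that the Solr schema does not declare."""
    defined = schema_fields(schema_xml)
    return [name for name in configured_fields(nutch_site_xml) if name not in defined]


if __name__ == "__main__":
    print(missing_fields(NUTCH_SITE, SCHEMA))
```

To run it against a real setup, read the two files into strings (for example with pathlib.Path.read_text) and pass them to missing_fields. An empty list means every configured field has a matching Solr field.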
 






was (Author: popalka):
Yes, Nutch with patch v5 works fine, without errors.
Thanks!

In $SOLR_HOME$/conf/schema.xml I use the following field names. These differ from the current wiki
suggestion (http://wiki.apache.org/nutch/IndexMetatags):

{code:xml}
     <!-- fields for metatags -->
     <field name="meta_description" type="string" stored="true" indexed="true"/>
     <field name="meta_keywords" type="string" stored="true" indexed="true"/> 
{code}
 





> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>
>                 Key: NUTCH-1478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1478
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, NUTCH-1478v4.patch, NUTCH-1478v5.patch, Nutch1478.patch, Nutch1478.zip, metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. This will take multiple values of the same tag and index them in Solr, as I patched before (https://issues.apache.org/jira/browse/NUTCH-1467).
> The usage is the same as described here (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no need to put the 'metatag' keyword before metatag names. For example, my configuration looks like this (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).
> This is only the first version and does not include the JUnit test. I will upload an updated version soon.
> This will parse the tags and index them in Solr. Make sure that the fields listed in 'index.parse.md' in nutch-site.xml are also defined in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
