nutch-dev mailing list archives

From "Anton (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series
Date Tue, 04 Feb 2014 11:50:13 GMT


Anton commented on NUTCH-1478:

Hi [~lewismc], a snippet of the source code with the NPE is below.
I added a comment to mark line 95, where the NPE is thrown.

    // add the fields from contentmd
    if (contentFieldnames != null) {
      for (String metatag : contentFieldnames) {
        // String[] value = parse.getData().getContentMeta().getValues(metatag);
        ByteBuffer bvalues = page.getFromMetadata(new Utf8(metatag));
        String value = new String(bvalues.array());                       //line 95 with NPE
        if (value != null)
          doc.add("meta_" + metatag, value);
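
One plausible cause of the NPE on line 95 is that page.getFromMetadata(...) returns null when the page has no value stored for that metatag, so bvalues.array() is called on a null reference. Below is a minimal sketch of a null-safe decode under that assumption. The class and method names (MetaValueDecoder, decode) are hypothetical and are not part of Nutch or of the attached patches. UTF-8 is also an assumption about how the values were written.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: decodes a metadata value defensively.
class MetaValueDecoder {

  /** Returns the buffer's remaining bytes as a UTF-8 string, or null if the buffer is null. */
  static String decode(ByteBuffer buffer) {
    if (buffer == null) {
      return null; // assumed meaning: no value stored under this metadata key
    }
    // Work on a duplicate so the caller's position and limit are left untouched.
    ByteBuffer copy = buffer.duplicate();
    byte[] bytes = new byte[copy.remaining()];
    copy.get(bytes);
    // Assumption: the value was stored as UTF-8 text.
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ByteBuffer present = ByteBuffer.wrap("a description".getBytes(StandardCharsets.UTF_8));
    System.out.println(decode(present)); // value is present
    System.out.println(decode(null));    // value is missing, no exception thrown
  }
}
```

In the loop above, the result of decode(bvalues) would be null-checked before calling doc.add, so pages without the metatag are skipped instead of failing. Copying only the remaining bytes also avoids picking up unrelated bytes when the buffer is a view over a larger backing array, which bvalues.array() would include.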


Hi [~talat], do you mean that I need to define another field name in schema.xml?
This is my current field definition:
 <field name="metatag.description" type="string" stored="true" indexed="true"/>

It has the same name as in the wiki, but a different field type
('string' rather than the wiki's 'text').

> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>                 Key: NUTCH-1478
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, NUTCH-1478v4.patch, Nutch1478.patch,, metadata_parseChecker_sites.png
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. This will take multiple values of the same tag and index them in Solr as I patched before (
> The usage is the same as described here ( but one change is that there is no need to give the 'metatag' keyword before metatag names. For example, my configuration looks like this (

> This is only the first version and does not include the junit test. I will update the new version soon.
> This will parse the tags and index the tags in Solr. Make sure you create the fields in '' in nutch-site.xml in schema.xml in Solr.
> Please let me know if you have any suggestions
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.

This message was sent by Atlassian JIRA
