nutch-dev mailing list archives

From "Shanaka Jayasundera (JIRA)"
Subject [jira] [Commented] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series
Date Tue, 18 Mar 2014 17:59:44 GMT


Shanaka Jayasundera commented on NUTCH-1478:

Hi All,

I've downloaded the latest code from the 2.x branch and tried to index metadata into Solr, but the Solr query
results are not showing the metadata.

However, parsechecker is working fine. Do I need any additional configuration to get the metadata
into Solr query results?

$ ./bin/nutch parsechecker
contentType: text/html
signature: b2bb805dcd51f12784190d58d619f0bc
meta_forrest-version : 	0.10-dev
meta_generator : 	Apache Forrest
meta_forrest-skin-name : 	nutch
_rs_ : �
meta_content-type : 	text/html; charset=UTF-8
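
The parsechecker output above shows the metatags arriving under meta_-prefixed keys, so parsing itself seems to work. As a first diagnostic step, it can help to list exactly which meta_* keys the parser produced, so they can be compared with what the indexer and Solr are set up to receive. A minimal Python sketch, assuming the plain "key : value" line layout shown above; extract_meta_fields is a hypothetical helper, not part of Nutch:

```python
from typing import Dict

# Sample copied from the parsechecker output above (the garbled line is left out).
SAMPLE = """contentType: text/html
signature: b2bb805dcd51f12784190d58d619f0bc
meta_forrest-version : \t0.10-dev
meta_generator : \tApache Forrest
meta_content-type : \ttext/html; charset=UTF-8"""


def extract_meta_fields(parsechecker_output: str) -> Dict[str, str]:
    """Hypothetical helper: collect the meta_* entries from 'key : value' lines."""
    fields: Dict[str, str] = {}
    for line in parsechecker_output.splitlines():
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only lines that have a separator and a meta_ key.
        if sep and key.startswith("meta_"):
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    for name, value in sorted(extract_meta_fields(SAMPLE).items()):
        print(f"{name} = {value}")
```

If the index field names follow the same meta_ naming (an assumption, since that depends on the indexing plugin's configuration), these are the names to look for in nutch-site.xml and in Solr's schema.xml.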

The command I'm using to crawl and index is:
bin/crawl urls/seed.txt TestCrawl3.1 http://localhost:8983/solr/ 2
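
After bin/crawl finishes, one way to see what actually reached Solr is to ask for every stored field of a few documents (fl=*). Solr only returns fields that are stored, so a meta field that is indexed but not declared stored="true" in schema.xml would still be absent from query results. The sketch below only builds the request URL; it assumes the Solr base URL from the crawl command above and a /select handler on the default core, which may differ on other Solr setups. solr_select_url is a hypothetical helper:

```python
from urllib.parse import urlencode


def solr_select_url(solr_base: str, query: str = "*:*", rows: int = 5) -> str:
    """Hypothetical helper: URL asking Solr for all stored fields (fl=*) of a few docs."""
    params = {"q": query, "fl": "*", "rows": rows, "wt": "json"}
    # Normalise the trailing slash so both ".../solr" and ".../solr/" work.
    return solr_base.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Base URL taken from the bin/crawl command above.
    url = solr_select_url("http://localhost:8983/solr/")
    print(url)
    # To actually fetch it (requires a running Solr):
    # import json, urllib.request
    # with urllib.request.urlopen(url) as resp:
    #     docs = json.load(resp)["response"]["docs"]
    #     print(sorted({k for d in docs for k in d if k.startswith("meta_")}))
```

If no meta_* keys come back at all, the problem is more likely on the indexing or schema side than in the parser.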

I haven't made many configuration changes; I've configured nutch-site.xml and  to use
HBase and Gora.

I'd appreciate it if anyone could help me identify the missing configuration.
Thanks in advance.

> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>                 Key: NUTCH-1478
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, NUTCH-1478v4.patch,
>                      NUTCH-1478v5.1.patch, NUTCH-1478v5.patch, NUTCH-1478v6.patch, Nutch1478.patch,,
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. This takes
> multiple values of the same tag and indexes them in Solr, as I patched before (
> The usage is the same as described here ( but with one change: there is no need to give the
> 'metatag' keyword before metatag names. For example, my configuration looks like this (

> This is only the first version and does not include the JUnit test. I will upload an updated
> version soon.
> This will parse the tags and index them in Solr. Make sure that the fields you list in '' in
> nutch-site.xml are also created in Solr's schema.xml.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.
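
The description above says the port keeps multiple values of the same tag and no longer needs the 'metatag' keyword in front of the tag names, while the parsechecker output earlier shows the extracted values under a meta_ prefix. The sketch below illustrates that behaviour with Python's standard-library HTMLParser. It is not the plugin's code: the function name extract_metatags, the lowercasing of names, the use of http-equiv, and the '*' wildcard are assumptions made for the illustration.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List


class MetaTagCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs, keeping every value per name."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: http-equiv tags (e.g. Content-Type) are treated like named tags.
        name = attr.get("name") or attr.get("http-equiv")
        content = attr.get("content")
        if name and content is not None:
            # Append rather than overwrite, so repeated tags keep all their values.
            self.values.setdefault(name.lower(), []).append(content)


def extract_metatags(html: str, names: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: meta_<name> -> [values] for the configured names ('*' = all)."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    wanted = {n.strip().lower() for n in names}
    keep_all = "*" in wanted
    return {
        "meta_" + name: vals
        for name, vals in collector.values.items()
        if keep_all or name in wanted
    }


PAGE = """<html><head>
<meta name="keywords" content="nutch">
<meta name="keywords" content="solr">
<meta name="generator" content="Apache Forrest">
<meta name="description" content="Crawler docs">
</head><body></body></html>"""

if __name__ == "__main__":
    # Plain tag names, no 'metatag' keyword in front of them.
    print(extract_metatags(PAGE, ["keywords", "description"]))
```

Keeping a list per name is what lets two keywords tags on one page both survive, which matches the multi-value behaviour the description mentions.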
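
The description also asks for the fields to be created in Solr's schema.xml. One way to double-check that is to compare the expected field names against the <field> and <dynamicField> declarations. A minimal sketch with Python's xml.etree; missing_schema_fields is hypothetical, and it assumes the index field names are the meta_* names seen in the parsechecker output, which depends on how the indexing plugin is configured.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import Iterable, List


def missing_schema_fields(schema_xml: str, field_names: Iterable[str]) -> List[str]:
    """Hypothetical helper: names matching neither a <field> nor a <dynamicField>."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so fields nested inside <fields> are found too.
    static = {f.get("name") for f in root.iter("field")}
    dynamic = [f.get("name") for f in root.iter("dynamicField")]
    missing = []
    for name in field_names:
        if name in static:
            continue
        # Solr dynamic field patterns use a leading or trailing '*', which fnmatch covers.
        if any(p and fnmatch.fnmatchcase(name, p) for p in dynamic):
            continue
        missing.append(name)
    return missing


SCHEMA = """<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="meta_generator" type="string" stored="true" indexed="true"/>
    <dynamicField name="*_txt" type="text_general" stored="true" indexed="true"/>
  </fields>
</schema>"""

if __name__ == "__main__":
    print(missing_schema_fields(SCHEMA, ["meta_generator", "meta_forrest-version", "notes_txt"]))
```

A catch-all dynamic field such as name="*" would make every name look present here even if its field type discards the value, so an empty result is a first filter rather than proof.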

This message was sent by Atlassian JIRA
