nutch-dev mailing list archives

From "Yogendra Kumar Soni (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2079) Tika Parsing plugin issue
Date Wed, 12 Aug 2015 11:54:46 GMT


Yogendra Kumar Soni commented on NUTCH-2079:

You can add the information you have parsed to the parse metadata (parsemeta). If you are
writing your own parser plugin, you need to add it to parse-plugins.xml.
After that, you need to write an index plugin to index the field you are adding. Finally, add
your plugins to nutch-site.xml.

To summarize, you need to:
1. Change the parse plugin's getParse(Content) to add the new <key,value> pair to the Metadata object.
2. Change the index plugin to add your new field to the Nutch Document or WebPage.
3. Change gora-mongodb-mapping to add the new field.
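
The flow in steps 1 and 2 can be sketched roughly as below. This is only an illustration of how a value travels from the parse step to the index step: it does not use the Nutch API. Plain Java maps stand in for the Metadata object and the Nutch Document, and the class name FieldFlowSketch, the field name "price", and the extraction pattern are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch only: plain maps replace Nutch's Metadata and document types.
final class FieldFlowSketch {

    // Hypothetical field name shared by the parse step and the index step.
    static final String FIELD = "price";

    // Hypothetical pattern for the value to pick out of the page text.
    private static final Pattern PRICE = Pattern.compile("price:\\s*(\\d+)");

    // Step 1 (parse side): extract the value and store it as a key/value pair.
    static Map<String, String> parseStep(String pageText) {
        Map<String, String> parseMeta = new LinkedHashMap<>();
        Matcher m = PRICE.matcher(pageText);
        if (m.find()) {
            parseMeta.put(FIELD, m.group(1));
        }
        return parseMeta;
    }

    // Step 2 (index side): copy the parsed value into the document fields.
    static Map<String, String> indexStep(Map<String, String> parseMeta) {
        Map<String, String> doc = new LinkedHashMap<>();
        String value = parseMeta.get(FIELD);
        if (value != null) {
            doc.put(FIELD, value);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> meta = parseStep("Widget, price: 42 USD");
        System.out.println(indexStep(meta));
    }
}
```

In a real plugin the same key would have to be used on both sides, and the storage mapping from step 3 would need a matching entry for the field to reach MongoDB.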


> Tika Parsing plugin issue
> -------------------------
>                 Key: NUTCH-2079
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: deployment
>    Affects Versions: 2.3
>         Environment: Ubuntu 14.04
>            Reporter: Pradumna Panditrao
>             Fix For: 2.3
> Hi,
> I am trying to parse particular data and post it to MongoDB. However, when I try to
> modify the parse-tika plugin, it has too much interconnectivity with other classes
> and it misses the data. I want to pick up particular data from a website using the
> same plugin and put it into MongoDB.
> Please suggest how to do this.

