nutch-dev mailing list archives

From kiran chitturi
Subject Nutch 2.x architecture Supporting multivalues
Date Wed, 10 Oct 2012 19:41:52 GMT

I am working on porting the parse-metatags plugin to the Nutch 2.x series. I
previously worked on patches to the same plugin for Nutch 1.5 so that
multivalued tags are saved in an array and then sent to Solr. It all worked
well in 1.5.

I have now ported the plugin to Nutch 2.x, but it works only when a tag has a
single value. It does not work when a tag has multiple values.

I had problems working with the Nutch architecture and the API, since some
functions do not accept multiple values, for example the add function in
NutchDocument. In 1.5 it accepted an Object as its second argument, but in
the 2.x versions it accepts only a String.

I have tried changing the metadata type to 'Map<Utf8, List<ByteBuffer>>' in
WebPage and in all other functions that use it. This worked in some places
but failed in others, so I am not sure it is the best way to proceed.
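
For illustration, the multivalued metadata shape described above can be
sketched in plain Java as follows. This is only a minimal sketch under
assumptions: String keys stand in for the Avro Utf8 keys, and the class
MultiValueMetadata with its methods addValue and getValues are hypothetical
names, not part of WebPage or any Nutch API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Nutch class: holds several values per metadata key.
class MultiValueMetadata {
    // String keys are a stand-in for the Avro Utf8 keys mentioned above.
    private final Map<String, List<ByteBuffer>> metadata = new HashMap<>();

    // Append one value to the list stored under the key, creating the list if needed.
    void addValue(String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>())
                .add(ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8)));
    }

    // Decode every value stored under the key; an unknown key yields an empty list.
    List<String> getValues(String key) {
        List<String> values = new ArrayList<>();
        for (ByteBuffer buffer : metadata.getOrDefault(key, new ArrayList<>())) {
            // duplicate() so that decoding does not move the stored buffer's position
            values.add(StandardCharsets.UTF_8.decode(buffer.duplicate()).toString());
        }
        return values;
    }

    public static void main(String[] args) {
        MultiValueMetadata meta = new MultiValueMetadata();
        meta.addValue("keywords", "nutch");
        meta.addValue("keywords", "solr");
        System.out.println(meta.getValues("keywords"));
    }
}
```

Every reader and writer of the map has to agree on the list-valued type,
which may be why changing it in only some places fails.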

Can someone point me to the best way to do this?

I want the value of a metadata key to accept multiple values, so we should
store it as an array type. NutchDocument.add should accept an array type as
its second parameter so that the index values can be passed as an array.
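
The requested signature could look roughly like the sketch below. It is a
self-contained stand-in and not the actual NutchDocument: the class
MultiValueDocument and the method getFieldValues are hypothetical names, and
flattening collections and arrays into one list per field is an assumption
about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index document; not the real NutchDocument.
class MultiValueDocument {
    private final Map<String, List<Object>> fields = new HashMap<>();

    // Accepts a single value, a Collection, or an Object[] as the second argument.
    void add(String name, Object value) {
        List<Object> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
        if (value instanceof Collection) {
            values.addAll((Collection<?>) value);
        } else if (value instanceof Object[]) {
            for (Object element : (Object[]) value) {
                values.add(element);
            }
        } else {
            values.add(value);
        }
    }

    // Returns all values added under the field name, or an empty list if none.
    List<Object> getFieldValues(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        MultiValueDocument doc = new MultiValueDocument();
        doc.add("title", "Example page");
        doc.add("keywords", new String[] {"nutch", "solr"});
        System.out.println(doc.getFieldValues("keywords"));
    }
}
```

With this shape, a single-valued tag and a multivalued tag go through the
same call, and the indexer receives one list per field.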

I am also interested in the opinion of the Nutch developers regarding these
changes.

Many Thanks,

Kiran Chitturi
