nutch-dev mailing list archives

From Sami Siren <ssi...@gmail.com>
Subject Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section
Date Wed, 19 Jul 2006 20:12:59 GMT
Nutch 0.8 has a subcollection plugin. It can add a subcollection id to a set of URLs, and you can then limit searching to those subcollections. Is that what you're after?
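A minimal Java sketch of the two steps described above. It assumes a hypothetical in-memory table that maps URL prefixes to subcollection ids. At index time, each URL gets the id of every prefix it starts with. At search time, hits are kept only if they carry one of the requested ids. `SubcollectionSketch`, `idsFor` and `restrict` are illustrative names and are not the plugin's API. The real plugin reads its rules from its own configuration file, and a real deployment would more likely restrict through the search index than by post-filtering a hit list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not the API of Nutch's subcollection plugin.
final class SubcollectionSketch {
    // URL prefix -> subcollection id (stand-in for the plugin's own configuration).
    private final Map<String, String> prefixToId;

    SubcollectionSketch(Map<String, String> prefixToId) {
        this.prefixToId = prefixToId;
    }

    // Index time: collect the id of every prefix the URL starts with.
    Set<String> idsFor(String url) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map.Entry<String, String> rule : prefixToId.entrySet()) {
            if (url.startsWith(rule.getKey())) {
                ids.add(rule.getValue());
            }
        }
        return ids;
    }

    // Search time: keep only URLs tagged with at least one wanted id.
    List<String> restrict(List<String> urls, Set<String> wanted) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            Set<String> ids = idsFor(url);
            ids.retainAll(wanted);
            if (!ids.isEmpty()) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("http://www.example1.com/something1/", "companybranch1");
        rules.put("http://www.example3.com/something3/", "companybranch1");
        rules.put("http://www.example4.com/something4/", "companybranch3");
        SubcollectionSketch sketch = new SubcollectionSketch(rules);

        List<String> hits = List.of(
            "http://www.example1.com/something1/page.html",
            "http://www.example4.com/something4/index.html",
            "http://www.example2.com/other/");
        // Restrict the hits to companybranch1 only.
        System.out.println(sketch.restrict(hits, Set.of("companybranch1")));
    }
}
```

Passing `Set.of("companybranch1", "companybranch3")` instead would widen the search across both branches.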

--
 Sami Siren

Stefan Neufeind (JIRA) wrote:

>    [ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_12422226 ] 
>            
>Stefan Neufeind commented on NUTCH-271:
>---------------------------------------
>
>Does somebody have an existing demo plugin for that, one that would read URL prefixes from a file and add certain tags when a URL matches? I don't yet fully understand how to do it "the elegant way" :-)
>
>  
>
>>Meta-data per URL/site/section
>>------------------------------
>>
>>                Key: NUTCH-271
>>                URL: http://issues.apache.org/jira/browse/NUTCH-271
>>            Project: Nutch
>>         Issue Type: New Feature
>>   Affects Versions: 0.7.2
>>           Reporter: Stefan Neufeind
>>
>>We need to index sites and attach additional meta-data tags to them. AFAIK this is not yet possible, or is there a "workaround" I don't see? What I have in mind is using meta-tags per start URL, indexing only content below that URL, and being able to limit searches to those meta-tags. E.g.:
>>http://www.example1.com/something1/   -> meta-tag "companybranch1"
>>http://www.example2.com/something2/   -> meta-tag "companybranch2"
>>http://www.example3.com/something3/   -> meta-tag "companybranch1"
>>http://www.example4.com/something4/   -> meta-tag "companybranch3"
>>Then search for everything in companybranch1, or across 1 and 3, or similar.
>>    
>>
>
>  
>

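Stefan's question about reading URL prefixes from a file could be approached along these lines. This sketch assumes a hypothetical plain-text rules file with one `prefix -> tag` line per rule, modelled on the mapping in the issue. It skips blank lines and `#` comments and groups prefixes by tag, so a search "across 1 and 3" becomes the union of two groups. `PrefixRulesFile` and `prefixesByTag` are made-up names and are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: parses "<url-prefix> -> <tag>" rules.
final class PrefixRulesFile {
    // Groups URL prefixes by tag, keeping file order within each group.
    static Map<String, List<String>> prefixesByTag(List<String> lines) {
        Map<String, List<String>> byTag = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s*->\\s*", 2);
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException("Malformed rule: " + raw);
            }
            byTag.computeIfAbsent(parts[1], tag -> new ArrayList<>()).add(parts[0]);
        }
        return byTag;
    }

    public static void main(String[] args) {
        List<String> rules = List.of(
            "# url-prefix -> tag",
            "http://www.example1.com/something1/ -> companybranch1",
            "http://www.example2.com/something2/ -> companybranch2",
            "http://www.example3.com/something3/ -> companybranch1",
            "http://www.example4.com/something4/ -> companybranch3");
        Map<String, List<String>> byTag = prefixesByTag(rules);

        // "Across 1 and 3": the union of the prefixes carrying either tag.
        List<String> scope = new ArrayList<>(byTag.getOrDefault("companybranch1", List.of()));
        scope.addAll(byTag.getOrDefault("companybranch3", List.of()));
        System.out.println(scope);
    }
}
```

To load an actual file, the `lines` argument can come from `java.nio.file.Files.readAllLines(path)`.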
