nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Closed: (NUTCH-271) Meta-data per URL/site/section
Date Wed, 19 Jul 2006 18:22:14 GMT
     [ http://issues.apache.org/jira/browse/NUTCH-271?page=all ]

Andrzej Bialecki  closed NUTCH-271.
-----------------------------------

    Resolution: Fixed

I'm closing this issue because this functionality can be achieved by combining
CrawlDatum.metaData with URL and scoring filters.
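
Below is a minimal sketch of that combination. It is in Java because Nutch is written in Java, but it is not Nutch code. Plain Map<String, String> objects stand in for CrawlDatum.metaData. The class SectionTagger, its methods, and the metadata key "section" are all hypothetical. The filter method plays the URL-filter role, following the convention of Nutch's URLFilter: return the URL to keep it, or null to drop it. The metadataFor method plays the role a scoring filter could take, assuming it gets the chance to set metadata on a newly discovered outlink.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch API: Maps stand in for CrawlDatum.metaData.
class SectionTagger {
    // Hypothetical metadata key under which the section tag is stored.
    static final String TAG_KEY = "section";

    // Start URL prefix -> tag, e.g. "http://www.example1.com/something1/" -> "companybranch1".
    private final Map<String, String> prefixToTag = new HashMap<>();

    void addStartUrl(String prefix, String tag) {
        prefixToTag.put(prefix, tag);
    }

    // Returns the tag of the longest start-URL prefix the URL falls under, or null if none.
    String tagFor(String url) {
        String bestPrefix = null;
        for (String prefix : prefixToTag.keySet()) {
            if (url.startsWith(prefix)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
            }
        }
        return bestPrefix == null ? null : prefixToTag.get(bestPrefix);
    }

    // URL-filter role: keep only URLs at or below a start URL (return URL to keep, null to drop).
    String filter(String url) {
        return tagFor(url) != null ? url : null;
    }

    // Scoring-filter role (assumed): metadata to attach to the CrawlDatum of a new URL.
    Map<String, String> metadataFor(String url) {
        Map<String, String> meta = new HashMap<>();
        String tag = tagFor(url);
        if (tag != null) {
            meta.put(TAG_KEY, tag);
        }
        return meta;
    }
}
```

With the start URLs from the example in the issue registered, http://www.example3.com/something3/page.html would be kept and tagged companybranch1, while http://www.example3.com/elsewhere/ would be dropped. Longest-prefix matching is a design choice here, so that a start URL nested inside another can override the enclosing tag.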

> Meta-data per URL/site/section
> ------------------------------
>
>                 Key: NUTCH-271
>                 URL: http://issues.apache.org/jira/browse/NUTCH-271
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 0.7.2
>            Reporter: Stefan Neufeind
>
> We need to index sites and attach additional metadata tags to them. AFAIK this
> is not yet possible, or is there a "workaround" I'm not seeing? What I have in mind is
> assigning meta-tags per start URL, indexing only content below that URL, and being
> able to restrict searches to those meta-tags. E.g.:
> http://www.example1.com/something1/   -> meta-tag "companybranch1"
> http://www.example2.com/something2/   -> meta-tag "companybranch2"
> http://www.example3.com/something3/   -> meta-tag "companybranch1"
> http://www.example4.com/something4/   -> meta-tag "companybranch3"
> Then search for everything in companybranch1, or across companybranch1 and companybranch3, or similar.
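
The requested search restriction can be sketched as follows. A query limited to one tag, or to several tags with OR semantics, reduces to a set-membership test on each page's tag. This is an in-memory stand-in and not Nutch's search API. TagSearch, its nested Doc type, and restrictToTags are hypothetical. In a real deployment the tag would have to be stored as an index field so that the search engine itself can filter on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: filters tagged pages in memory, not through a real index.
class TagSearch {
    // A hypothetical indexed page: its URL plus the tag it inherited from its start URL.
    static final class Doc {
        final String url;
        final String tag;

        Doc(String url, String tag) {
            this.url = url;
            this.tag = tag;
        }
    }

    // Returns, in input order, the URLs of pages whose tag is any of the given tags.
    static List<String> restrictToTags(List<Doc> docs, String... tags) {
        Set<String> allowed = new HashSet<>(Arrays.asList(tags));
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (allowed.contains(d.tag)) {
                hits.add(d.url);
            }
        }
        return hits;
    }
}
```

With the four URLs above as documents, restrictToTags(docs, "companybranch1") returns the example1 and example3 URLs. restrictToTags(docs, "companybranch1", "companybranch3") also returns the example4 URL.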

-- 
This message is automatically generated by JIRA.
