nutch-dev mailing list archives

From "Markus Jelsma (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (NUTCH-1024) Dynamically set fetchInterval by MIME-type
Date Fri, 02 Mar 2012 09:05:59 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220787#comment-13220787 ]

Markus Jelsma edited comment on NUTCH-1024 at 3/2/12 9:05 AM:
--------------------------------------------------------------

New patch for trunk! It also changes the injector so that the injected fetchInterval
is stored in the CrawlDatum metadata. In AdaptiveFetchSchedule, this injected interval
overrides any other value. This is useful for sites where you want to use
AdaptiveFetchSchedule but still want the generator to select an injected homepage
every N hours.
                
      was (Author: markus17):
    New patch for trunk! This also includes a change to the injector where injected fetchInterval
is added to CrawlDatum MD. In AdaptiveFetchSchedule this injected interval overrides anything
else.
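
The override described in this comment can be sketched roughly as below. This is a
standalone illustration, not the code in the attached patch: the class name, the
metadata key "injected.fetchInterval" and the plain Map standing in for the CrawlDatum
metadata are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Nutch or from the patch.
class InjectedIntervalSketch {

  // Assumed metadata key. The key actually used by the patch is not stated in this message.
  static final String INJECTED_INTERVAL_KEY = "injected.fetchInterval";

  /**
   * Returns the injected interval (in seconds) when the metadata carries one,
   * otherwise falls back to the interval computed by the adaptive schedule.
   */
  static int nextInterval(Map<String, String> metadata, int adaptiveInterval) {
    String injected = metadata.get(INJECTED_INTERVAL_KEY);
    if (injected != null) {
      return Integer.parseInt(injected.trim());
    }
    return adaptiveInterval;
  }

  public static void main(String[] args) {
    // A homepage injected with a fixed six-hour interval.
    Map<String, String> homepage = new HashMap<>();
    homepage.put(INJECTED_INTERVAL_KEY, "21600");

    // A page discovered during the crawl, with no injected interval.
    Map<String, String> discovered = new HashMap<>();

    System.out.println(nextInterval(homepage, 604800));
    System.out.println(nextInterval(discovered, 604800));
  }
}
```

With this precedence, a seed URL keeps its injected interval no matter what the
adaptive calculation suggests, while every other URL still follows the adaptive value.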
                  
> Dynamically set fetchInterval by MIME-type
> ------------------------------------------
>
>                 Key: NUTCH-1024
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1024
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: AdaptiveFetchSchedule.patch, MimeAdaptiveFetchSchedule.java, NUTCH-1024-1.5-1.patch, Nutch.patch, adaptive-mimetypes.txt
>
>
> Add a facility to configure default or fixed fetchInterval values by MIME-type. This
> helps conserve resources for file types that are known to change frequently, never,
> or anything in between.
> * simple key\tvalue\n configuration file
> * only set fetchInterval for new documents
> * keep max fetchInterval fixed by current config
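
The three points above could look something like the following sketch. It is a
standalone illustration and not the attached MimeAdaptiveFetchSchedule.java: the class
and method names are hypothetical, values are assumed to be seconds, skipping '#'
comment lines is an assumption, and "keep max fetchInterval fixed" is read here as
capping the per-type value at the configured maximum.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and file conventions are assumptions, not the patch.
class MimeIntervalSketch {

  /**
   * Parses "mime-type<TAB>seconds" lines. Blank lines, '#' comments (assumed)
   * and lines without exactly two tab-separated fields are skipped.
   */
  static Map<String, Integer> parse(String config) {
    Map<String, Integer> intervals = new HashMap<>();
    for (String rawLine : config.split("\n")) {
      String line = rawLine.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue;
      }
      String[] parts = line.split("\t");
      if (parts.length != 2) {
        continue; // malformed line, ignored in this sketch
      }
      intervals.put(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
    return intervals;
  }

  /**
   * Picks the interval for a new document: the per-type value if one is
   * configured, else the default, never exceeding the configured maximum.
   */
  static int initialInterval(Map<String, Integer> intervals, String mimeType,
                             int defaultInterval, int maxInterval) {
    int interval = intervals.getOrDefault(mimeType, defaultInterval);
    return Math.min(interval, maxInterval);
  }

  public static void main(String[] args) {
    String config = "text/html\t3600\napplication/pdf\t2592000\n";
    Map<String, Integer> intervals = parse(config);
    // Default of one day and maximum of 90 days are example values only.
    System.out.println(initialInterval(intervals, "text/html", 86400, 7776000));
    System.out.println(initialInterval(intervals, "image/png", 86400, 7776000));
  }
}
```

A schedule would call something like initialInterval only when a document is first
seen, leaving later adjustments to the regular adaptive logic.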

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
