From: Jérôme Charron
To: nutch-dev@lucene.apache.org
Date: Thu, 1 Dec 2005 22:04:13 +0100
Subject: Re: [Nutch-dev] incremental crawling
In-Reply-To: <438F62BA.8030800@nutch.org>
References: <438F4BE5.5050603@nutch.org> <438F62BA.8030800@nutch.org>

Sounds really good (and it is requested by a lot of nutch users!).
+1

Jérôme

On 12/1/05, Doug Cutting wrote:
>
> Matt Kangas wrote:
> > #2 should be a pluggable/hookable parameter. "high-scoring" sounds like
> > a reasonable default basis for choosing recrawl intervals, but I'm sure
> > that nearly everyone will think of a way to improve upon that for their
> > particular system.
> >
> > e.g. "high-scoring" ain't gonna cut it for my needs. (0.5 wink ;)
>
> In NUTCH-61, Andrzej has a pluggable FetchSchedule. That looks like a
> good idea.
>
> http://issues.apache.org/jira/browse/NUTCH-61
>
> Doug
>

--
http://motrech.free.fr/
http://www.frutch.org/