nutch-dev mailing list archives

From Jay Lorenzo <jay.lore...@gmail.com>
Subject Re: how to reuse webDB with new urls
Date Wed, 14 Sep 2005 16:49:46 GMT
What about the issue of maintaining some semblance of ACIDity? Don't you 
have to make sure that fetchlist generation and the updates are serialized, 
i.e., that only one update or generate runs against the webDB at a time?
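
To make the one-at-a-time concern concrete, here is a minimal sketch of a wrapper that refuses to start a generate or update step while another one is running. It assumes all steps run on one machine against a local webDB directory; the lock-file name and the helper names (webdb_lock, run_step, WebDbBusy) are hypothetical and not part of Nutch.

```python
import os
from contextlib import contextmanager


class WebDbBusy(Exception):
    """Raised when another generate/update step already holds the lock."""


@contextmanager
def webdb_lock(db_dir):
    # Hypothetical lock-file name; not something Nutch itself creates.
    lock_path = os.path.join(db_dir, "webdb.lock")
    try:
        # O_EXCL makes the creation atomic: only one process can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise WebDbBusy(lock_path)
    try:
        # Record the owner's PID to help diagnose stale locks by hand.
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)


def run_step(db_dir, step):
    """Run one generate or update step (any callable) while holding the lock."""
    with webdb_lock(db_dir):
        return step()
```

This only gives mutual exclusion, not real atomicity: a step that dies halfway still leaves the webDB in whatever state it reached, and a killed process can leave a stale lock file behind.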

On 9/13/05, Michael Ji <fji_00@yahoo.com> wrote:
> 
> I think this scenario will work.
> 
> I'm just a bit worried about the filter performance if the
> number of domain sites is on the scale of hundreds of
> thousands.
> 
> Michael Ji
> 
> --- AJ Chen <canovaj@gmail.com> wrote:
> 
> > Once I create a webDB, can I inject new root urls into
> > the same webDB
> > repeatedly? After each injection, I would run as many cycles
> > of
> > generate/fetch/updatedb as needed to fetch all web pages from
> > the new sites. I think
> > this will allow me to gradually build a
> > comprehensive vertical site. Any
> > comments or suggestions?
> > -AJ
> >
