nutch-dev mailing list archives

From discoversk <salim.k...@focusinfomatics.com>
Subject nutch parsers
Date Thu, 13 Nov 2008 04:58:46 GMT

Hello,

1.    How do the parsers extract URLs from documents such as
HTML, DOC, and PDF files?

2.    If we have 1000 URLs at depth 0 and have set topN to 100, what
algorithm does Nutch use to select 100 URLs out of the 1000?



Thanks.
Salim
-- 
View this message in context: http://www.nabble.com/nutch-parsers-tp20474841p20474841.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.

