nutch-dev mailing list archives

From Fabrice Estiévenart
Subject getting a list of matching URLs from a start URL
Date Tue, 08 Mar 2005 12:47:04 GMT

Given a list of start URLs (each associated with a regular expression),
I'd like to get, for each start URL, all URLs that come from the same
domain and match the expression. I don't want to analyse or index
the URLs, just write them to a flat file.

Example :
start URL :
regular expression : /files/*.html

gives :
- ...
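
To make the filtering step concrete, here is a minimal sketch in plain Python. It is not Nutch code and uses no Nutch API. It assumes the candidate URLs have already been collected somehow, treats "same domain" as "same host name", and reads the example pattern as the regular expression /files/.*\.html applied to the URL path. The names matching_urls and write_flat_file, the example.org URLs, and the output file name are all hypothetical.

```python
import os
import re
import tempfile
from urllib.parse import urlsplit


def matching_urls(start_url, path_regex, candidate_urls):
    """Return candidates on the same host as start_url whose path matches path_regex."""
    start_host = urlsplit(start_url).hostname
    pattern = re.compile(path_regex)
    matches = []
    for url in candidate_urls:
        parts = urlsplit(url)
        # Assumption: "same domain" means an identical host name.
        # The whole path must match the expression, not just a prefix.
        if parts.hostname == start_host and pattern.fullmatch(parts.path):
            matches.append(url)
    return matches


def write_flat_file(urls, out_path):
    """Write one URL per line to a plain text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")


# Hypothetical data: example.org stands in for the real start URL.
candidates = [
    "http://www.example.org/files/a.html",
    "http://www.example.org/files/sub/b.html",
    "http://www.example.org/about.html",
    "http://other.example.net/files/c.html",
]
found = matching_urls("http://www.example.org/", r"/files/.*\.html", candidates)

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "matching_urls.txt")
write_flat_file(found, out_path)
```

The open question is how to get Nutch to produce the candidate list and apply this kind of filter during the crawl.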

How can I do that simply with Nutch, without "reinventing the wheel"?
Should I extend an existing class, or develop a plugin? Could you give me
some tips, please?

Thanks a lot for this useful forum!

