lucene-java-user mailing list archives

From "spamsucks" <>
Subject Write my own crawler VS use nutch?
Date Fri, 26 Jan 2007 21:01:22 GMT
I am successfully using Lucene in our application to index 12 different 
types of objects stored in a database, along with their relationships to each 
other, to provide some nice search functionality for our website.  We 
build lots of Lucene queries programmatically to filter on 
categories, regions, zip codes, scoring, latitude/longitude...

My problem is that we also have a lot of content that is not in the 
database (3000+ pages) and that needs to be included in the 
search results.  It's a whole lot of JSPs.

As I see it, I can either:
a) Migrate this application to Nutch
b) Write a web crawler to crawl our site and inject the crawl results into 
our Lucene index.

I am leaning towards option B (write our own crawler), since I think it 
would only take me a couple of days to write a simple crawler and I wouldn't 
have to change much else.
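
To make option B concrete, here is a rough sketch of the kind of simple 
crawler I have in mind. It is only an illustration under assumptions: the 
class name SimpleSiteCrawler, the fetcher function and the maxPages limit 
are all made up for this example, links are found with a naive regex rather 
than a real HTML parser, and the Lucene step is left as a comment (each 
fetched page would presumably become a Document passed to 
IndexWriter.addDocument).

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical same-host crawler; every name here is illustrative. */
final class SimpleSiteCrawler {
    // Naive href matcher (stops at a '#' fragment); a real crawler would use an HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    private final Function<URI, String> fetcher; // returns page HTML, or null if the fetch failed
    private final int maxPages;                  // safety limit on the number of pages stored

    SimpleSiteCrawler(Function<URI, String> fetcher, int maxPages) {
        this.fetcher = fetcher;
        this.maxPages = maxPages;
    }

    /** Breadth-first crawl from the seed; returns URL -> HTML for pages on the seed's host. */
    Map<URI, String> crawl(URI seed) {
        Map<URI, String> pages = new LinkedHashMap<>();
        Set<URI> seen = new HashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        seen.add(seed);
        queue.add(seed);

        while (!queue.isEmpty() && pages.size() < maxPages) {
            URI url = queue.poll();
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // fetch failed; skip this page
            }
            // Assumption: this is where the page would be converted to a
            // Lucene Document and added to the existing index.
            pages.put(url, html);

            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = url.resolve(m.group(1).trim()).normalize();
                } catch (IllegalArgumentException e) {
                    continue; // malformed href; ignore it
                }
                // Follow only links on the seed's host, and each URL only once.
                boolean sameHost = seed.getHost() != null
                    && seed.getHost().equalsIgnoreCase(link.getHost());
                if (sameHost && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return pages;
    }
}
```

The fetcher is passed in so that the crawl logic can be tried against a map 
of canned pages before pointing it at the live site; an HTTP-based fetcher, 
robots.txt handling and stripping of HTML tags before indexing are all left 
out of this sketch.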

Can anyone think of any points/counterpoints for using Nutch vs. writing a 
crawler that feeds our existing Lucene setup?

