nutch-dev mailing list archives

From "Epo Jemba" <taha...@gmail.com>
Subject URL Injection with another source than text files
Date Wed, 04 Jul 2007 10:44:32 GMT
Hello,

I'm new to Nutch and I have a question regarding the URL injection mechanism.

If I understood correctly, the source for the current URL injection mechanism
is a text file.

I would like to be able to change this source from the current text file to
another type (database, XML, etc.).

I identified two classes, org.apache.nutch.crawl.Injector and
org.apache.nutch.crawl.Crawl, that are related to this need.

- What is the best way to modify the current code so that the Injector can
read URLs from another source?
- Does the current design allow this kind of modification easily (by
subclassing Injector, etc.)?
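
To make the idea concrete, here is a minimal sketch in plain Java of the kind of abstraction I have in mind. Everything in it is hypothetical: UrlSource, TextUrlSource, InMemoryUrlSource and collectSeeds are names made up for illustration, they are not part of Nutch, and the sketch does not call the real Injector. It only shows an injection step that depends on an abstract source of URLs rather than on a text file directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in Nutch.
class UrlSourceSketch {

    /** Anything that can supply seed URLs (text file, database, XML, ...). */
    interface UrlSource {
        List<String> readUrls();
    }

    /** Text case: one URL per line, surrounding whitespace trimmed, blank lines skipped. */
    static class TextUrlSource implements UrlSource {
        private final Reader reader;

        TextUrlSource(Reader reader) {
            this.reader = reader;
        }

        @Override
        public List<String> readUrls() {
            List<String> urls = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(reader)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String url = line.trim();
                    if (!url.isEmpty()) {
                        urls.add(url);
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return urls;
        }
    }

    /** Stand-in for a database or XML source; a real one would run a query or parse a document. */
    static class InMemoryUrlSource implements UrlSource {
        private final List<String> urls;

        InMemoryUrlSource(List<String> urls) {
            this.urls = new ArrayList<>(urls);
        }

        @Override
        public List<String> readUrls() {
            return new ArrayList<>(urls);
        }
    }

    /** The injection step would depend only on UrlSource, not on the storage format. */
    static List<String> collectSeeds(UrlSource source) {
        return source.readUrls();
    }

    public static void main(String[] args) {
        UrlSource text = new TextUrlSource(
                new StringReader("http://example.org/\n\nhttp://example.com/\n"));
        UrlSource other = new InMemoryUrlSource(Arrays.asList("http://example.net/"));
        System.out.println(collectSeeds(text));
        System.out.println(collectSeeds(other));
    }
}
```

If the design allowed something like this, a database-backed or XML-backed source would simply be one more implementation of the same interface.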

Thank you for your response.

Best Regards

Epo
