nutch-dev mailing list archives

From "Gal Nitzan" <gnit...@usa.net>
Subject RE: 'RegexIndexingFilter'
Date Mon, 29 Jan 2007 21:18:54 GMT
Hi,

Since the plug-in you are about to write is just one filter in a chain of
filters, all you need to do is throw an exception from your implementation
of the filter interface, like so:
throw new IndexingException("Doesn't comply to regex blah. Do not index");
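
The suggestion above can be sketched roughly as follows. This is a
stand-alone illustration only, not Nutch code: the class name
RegexIndexingFilterSketch, the nested stand-in IndexingException, and the
simplified filter(String url) signature are all assumptions made so the
example compiles without Nutch on the classpath. A real plug-in would
implement Nutch's IndexingFilter interface and use Nutch's own
IndexingException.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: rejects URLs that do not match a configured regex
// by throwing an exception, as suggested in the message above.
public class RegexIndexingFilterSketch {

    // Stand-in for Nutch's IndexingException (assumption, for self-containment).
    public static class IndexingException extends Exception {
        public IndexingException(String message) {
            super(message);
        }
    }

    private final Pattern allowed;

    public RegexIndexingFilterSketch(String regex) {
        this.allowed = Pattern.compile(regex);
    }

    // Simplified signature (assumption): returns the URL unchanged when the
    // regex is found in it, otherwise throws to signal "do not index".
    public String filter(String url) throws IndexingException {
        if (!allowed.matcher(url).find()) {
            throw new IndexingException(
                "Doesn't comply to regex " + allowed.pattern() + ". Do not index: " + url);
        }
        return url;
    }

    public static void main(String[] args) {
        // Example pattern (assumption): only index PDF and HTML documents.
        RegexIndexingFilterSketch filter = new RegexIndexingFilterSketch("\\.(pdf|html)$");
        String[] urls = { "http://example.com/report.pdf", "http://example.com/photo.jpg" };
        for (String url : urls) {
            try {
                System.out.println("index: " + filter.filter(url));
            } catch (IndexingException e) {
                System.out.println("skip: " + e.getMessage());
            }
        }
    }
}
```

Whether the indexer skips only the offending document or reacts differently
to the exception depends on the Nutch version, so it is worth checking how
the indexer handles IndexingException before relying on this.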

HTH

Gal.

-----Original Message-----
From: Tobias Zahn [mailto:Tobias-Zahn@arcor.de] 
Sent: Monday, January 29, 2007 8:58 PM
To: nutch-dev@lucene.apache.org
Subject: 'RegexIndexingFilter'

Good evening!
I have found out that it is not possible to index only specific file
types with Nutch. Since I need this feature, I thought of implementing a
'RegexIndexingFilter', if that would be the right way to do it.
I have read some of the source code, but I couldn't find out how to tell
the indexer that it shouldn't index a file.

I hope I am on the right track, and I would appreciate your opinions,
ideas, and help.

TIA,
Tobias Zahn


