lucene-dev mailing list archives

From Woolly Mammoth <>
Subject lucene webcrawler/dbms indexing framework
Date Thu, 01 Apr 2004 22:59:33 GMT
Hi All,
	I have seen some discussion in the past around LARM & other web
crawler indexing code, but not much output. I have started a project on
SF, and have committed some initial framework code to CVS (despite the
front page saying there are no commits...). I haven't done a release yet,
mainly because I need to check licencing & am also having some trouble
getting PDFBox to extract all fields in docs. If anyone has time to
help/review, that would be great. I wanted to try & licence it Apache-style
for contributors & GPL for others. Does anyone know about this?

The real goal of this is an easy-to-deploy Lucene implementation, but
also one that is scalable & flexible for customisation.
I will be putting all the currently hardcoded indexing rules into
config files asap, then hopefully getting a mgmt interface over the
files & indexing process.

