lucene-dev mailing list archives

From "Andrew C. Oliver" <>
Subject Re: Proposal for Lucene
Date Sun, 24 Feb 2002 16:42:08 GMT
On Fri, 2002-02-08 at 05:26, Manfred Schäfer wrote:
> Hi,
> i would suggest two sub-projects:

I think "packages" would be a more appropriate description; I
wouldn't call them "subprojects", so to speak.

> 1. Crawler - retrieving docs, wherever they are.....
> 2. DocumentHandler - extract text, create appropriate fields etc..

+1 that's what I was getting at in the proposal about DocumentFactory
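
To make the two-package split concrete, here is a minimal sketch in plain
Java. Every name in it (Crawler, DocumentHandler, RawContent and the two
toy implementations) is hypothetical and chosen for illustration only; none
of it is Lucene API or the proposal's actual design. It assumes the crawler
only fetches raw content and the handler only turns that content into named
fields, so either side could be swapped out independently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; none of these types exist in Lucene.
public class CrawlPipelineSketch {

    /** Raw content as fetched, before any text extraction. */
    static final class RawContent {
        final String location;
        final String mimeType;
        final String body;

        RawContent(String location, String mimeType, String body) {
            this.location = location;
            this.mimeType = mimeType;
            this.body = body;
        }
    }

    /** Package 1: retrieves docs and knows nothing about indexing. */
    interface Crawler {
        List<RawContent> crawl();
    }

    /** Package 2: extracts text and builds named fields for an indexer. */
    interface DocumentHandler {
        Map<String, String> toFields(RawContent raw);
    }

    /** In-memory stand-in for an HTTP or filesystem crawler. */
    static final class InMemoryCrawler implements Crawler {
        private final List<RawContent> docs;

        InMemoryCrawler(List<RawContent> docs) {
            this.docs = docs;
        }

        public List<RawContent> crawl() {
            return docs;
        }
    }

    /** Naive handler: strips tags with a regex, which is not a real HTML parser. */
    static final class SimpleHtmlHandler implements DocumentHandler {
        public Map<String, String> toFields(RawContent raw) {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("url", raw.location);
            String text = raw.body.replaceAll("<[^>]*>", " ")
                                  .replaceAll("\\s+", " ")
                                  .trim();
            fields.put("contents", text);
            return fields;
        }
    }

    /** Glue code: the only place where the two packages meet. */
    static List<Map<String, String>> run(Crawler crawler, DocumentHandler handler) {
        List<Map<String, String>> out = new ArrayList<>();
        for (RawContent raw : crawler.crawl()) {
            out.add(handler.toFields(raw));
        }
        return out;
    }

    public static void main(String[] args) {
        List<RawContent> docs = new ArrayList<>();
        docs.add(new RawContent("http://example.org/a.html", "text/html",
                "<html><body><h1>Hello</h1> world</body></html>"));
        for (Map<String, String> fields : run(new InMemoryCrawler(docs), new SimpleHtmlHandler())) {
            System.out.println(fields);
        }
    }
}
```

In a real setup the field map would presumably be converted into index
documents by a thin adapter, which keeps the crawler free of any indexing
dependency.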

> The second is a layer on top of lucene. The first is an autonomous package, which
> should be nicely integrated with lucene/Document-Handler, but should also be
> usable for other projects.

Hmm... I'm not entirely sure I'd go that far.  Well encapsulated, for
sure, but how usable it is by other projects is up to them, not us...

> I've included my code, to show you, what i've done. It isn't too useful yet,
> because it is integrated in our product, but you can get the idea. Actually i've
> written two things:
> 1: A robot for crawling a remote server via http and writing all the data to
> local filesystem, then importing it into our db and
> (at the same time) replacing all links with internal links. So we could emulate
> a web-Site from this crawled Data!
> [com.synformation.script.utilities.importtool]

I looked through this!  Great stuff!  Do you own this code?  Are you
able to donate it to Lucene (APL and all)?  It looks like a great
starting point.  We'd have to do some refactoring but it looks pretty
dern good to me.  I haven't tried running it, just skimmed through.
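
For readers who have not seen the code, the link-replacement step described
in point 1 can be illustrated roughly as follows. This is a sketch under
stated assumptions, not the code from
com.synformation.script.utilities.importtool: the class name, method name
and base-URL parameters are all hypothetical, and it only handles lowercase
href attributes in double quotes. A real crawler would want a proper HTML
parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not taken from the donated code.
public class LinkRewriteSketch {

    // Matches lowercase href="..." with double quotes only.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    /** Rewrites absolute links under remoteBase so they point at internalBase. */
    static String rewriteLinks(String html, String remoteBase, String internalBase) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(1);
            if (target.startsWith(remoteBase)) {
                target = internalBase + target.substring(remoteBase.length());
            }
            // quoteReplacement keeps '$' and '\' in URLs from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement("href=\"" + target + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://remote.example/docs/a.html\">A</a> "
                    + "<a href=\"http://other.example/b.html\">B</a>";
        // Only the first link lives under the crawled site, so only it is rewritten.
        System.out.println(rewriteLinks(page, "http://remote.example/", "/mirror/"));
    }
}
```

Links that point outside the crawled site are left untouched, which is one
plausible reading of "replacing all links with internal links"; the
original tool may behave differently.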

> 2: (I've rewritten some of the code from 1 for that, so this is much cleaner) A
> customer needs a tool for importing local mini-Websites on the file-system via
> an applet, send it to the Web-Server and import it as described in point 1. I've
> tried to write it in a way, that it could include the functionality of point 1
> (retrieving via http), but that is mostly untested.
> [com.synformation.script.utilities.fileimport]

My brain didn't parse that..

> I'm not saying that you (we) should use this. But I think it's time to come up with
> more concrete plans. I'm interested in helping with the crawler.

If you're able to donate it (legally) I kinda think there is a lot
here.  It of course needs to be refactored to meet some of the
objectives we've outlined, but a darn good starting point IMHO!

> Regards,
> manfred
> ----

-- 
- port of Excel/Word/OLE 2 Compound Document format to java
- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh
