lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: Indexing help needed
Date Fri, 25 May 2007 20:42:08 GMT
jim shirreffs wrote:
> Thanks for the advice, I just don't see where in the Lucene code I
> should plug OOParser into Lucene.
> I've walked the code in LIUS and Nutch (moving on to Solr) trying to 
> find common objects. If I can find common objects in Lucene and Nutch 
> I'll know where to plug in.

You seem to be somewhat confused about what Lucene really is. It's just 
a library, and not an application. It's up to you to provide the logic 
and glue, or to extend any existing demo application to accommodate your 
needs. It's also a _plain_ _text_ search library. So if you want to 
index anything else you need to first convert it to a plain text format.

That's essentially what OOParser does in Nutch. It extracts data from OO 
documents and converts it to plain text. Disregard other stuff in that 
plugin - it has to do with how Nutch passes this data to storage (and 
indexing takes place in a completely different step, so you won't find 
it here). Just use the parts that extract plain text data - and then use 
this plain text data to add fields to Lucene documents.
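
To make the "convert to plain text first" step concrete, here is a minimal sketch using only the JDK. It is not the Nutch OOParser code. It assumes an OpenDocument (.odt) file, which is a ZIP archive whose content.xml entry holds the document body, and it assumes that entry has no DOCTYPE declaration. The class and method names (OdfTextExtractor, extractText, textFromContentXml) are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: turns an OpenDocument file into plain text.
class OdfTextExtractor {

    /** Scans the ZIP archive for content.xml and returns its text content. */
    static String extractText(InputStream odf) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    return textFromContentXml(zip);
                }
            }
        }
        throw new IOException("no content.xml entry found");
    }

    /** Parses the XML and concatenates all text nodes, one line per paragraph. */
    static String textFromContentXml(InputStream xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document dom = factory.newDocumentBuilder().parse(xml);
            StringBuilder out = new StringBuilder();
            collectText(dom.getDocumentElement(), out);
            return out.toString().trim();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("could not parse content.xml", e);
        }
    }

    private static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), out);
        }
        // text:p and text:h are paragraph-level elements; end each with a newline.
        String name = node.getNodeName();
        if ("text:p".equals(name) || "text:h".equals(name)) {
            out.append('\n');
        }
    }
}
```

The returned string is the plain text that goes into a Lucene field. With the Lucene 2.x API current at the time of this message, that would be roughly doc.add(new Field("contents", text, Field.Store.NO, Field.Index.TOKENIZED)), where the field name "contents" is only an example. The sketch ignores details such as text:s and text:tab whitespace elements, so treat it as a starting point rather than a complete parser.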

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
Contact: info at sigram dot com
