lucene-java-user mailing list archives

From "Chong, Herb" <>
Subject RE: Exotic format indexing?
Date Thu, 30 Oct 2003 20:05:33 GMT
Word documents saved with FastSave enabled contain the original document followed by deltas
to it. Once the deltas exceed a certain size, they are merged back into the document. That
means that unless you apply the deltas, you won't know what the actual final contents are.
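
To illustrate why the deltas matter for indexing, here is a toy sketch in Python. It is
not the real Word binary layout: the (offset, removed, inserted) delta tuples and the
apply_deltas helper are made up for illustration only.

```python
# Toy model only: the real FastSave format is a binary structure,
# not a list of tuples. All names here are hypothetical.

def apply_deltas(base, deltas):
    """Apply (offset, chars_removed, inserted_text) edits in order."""
    text = base
    for offset, removed, inserted in deltas:
        # Cut out `removed` characters at `offset` and splice in the new text.
        text = text[:offset] + inserted + text[offset + removed:]
    return text


base = "The quick brown fox"
deltas = [
    (4, 5, "slow"),  # replace "quick" with "slow"
    (0, 3, "A"),     # replace "The" with "A"
]

final = apply_deltas(base, deltas)

# An indexer that only reads the stored base text would index "quick",
# which is no longer part of the document the user sees.
stale_term_in_base = "quick" in base
stale_term_in_final = "quick" in final
```

The point is only that the stored base text and the visible text can differ until the
deltas are applied, so fishing for strings in the file can index stale words.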


-----Original Message-----
From: Ben Litchfield []
Sent: Thursday, October 30, 2003 2:49 PM
To: Lucene Users List
Subject: Re: Exotic format indexing?

Unfortunately, it is not quite so easy.  I am not sure about Word
documents but PDFs usually have their contents compressed so a raw
"fishing" around for text would be pointless.  Your best bet is to use a
package like the one from that handles various formats for
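
A small Python sketch of the compression point, assuming the stream is deflate
compressed (PDF's FlateDecode filter uses the same zlib/deflate format). The sample text
and variable names are invented, and this is a bare zlib stream, not a real PDF file.

```python
import zlib

# Stand-in for the text content of a page (made-up sample data).
text = b"Lucene indexes text extracted from documents. " * 3

# Compress it the way a deflate-based stream filter would.
stream = zlib.compress(text)

# Searching the raw compressed bytes for a word from the text.
raw_hit = b"Lucene" in stream

# Searching after decompression, which is what a proper parser does first.
decoded = zlib.decompress(stream)
decoded_hit = b"Lucene" in decoded
```

Scanning the compressed bytes does not find the word, while scanning the decoded stream
does, which is why a format-aware parser is needed before handing text to the indexer.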

