lucene-java-user mailing list archives

From Antony Bowesman
Subject Email and attachments
Date Fri, 13 Oct 2006 07:47:00 GMT

I am a newbie with Lucene and I am working out the best way to index email data.

An earlier poster discussed two alternatives for indexing attachments.
However, there is a third alternative:

Each message/attachment is indexed as a separate Document with the email header 
data included in all Documents.  The drawback of this approach seems to be that 
it is not possible to make AND searches between two body parts in different 
documents directly in Lucene (or is it?).
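One possible workaround for the AND problem is to run each term as its own
search and intersect the results on a shared message id. The sketch below
illustrates the idea in plain Java only. PartIndex, addPart and searchAllTerms
are hypothetical names, and the in-memory map merely stands in for an index;
none of this is Lucene API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for an index: one entry per message part, each
// tagged with the id of the email it belongs to. Not Lucene API.
class PartIndex {
    // term -> ids of the emails that have at least one part containing it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one body part (message text or attachment) as its own entry.
    void addPart(String messageId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(messageId);
            }
        }
    }

    // AND across parts: look up each term on its own, then intersect the
    // message ids, so the terms may match in different parts of one email.
    Set<String> searchAllTerms(List<String> terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> ids = postings.getOrDefault(
                    term.toLowerCase(Locale.ROOT), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);   // copy so postings stay intact
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

With a real index, the equivalent would presumably be one query per term
followed by a merge on a stored message id field in application code, which
costs extra searches compared with a single boolean query.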

One advantage of this approach is that it is then possible to use a different 
Analyzer for each Document, which is useful when the attachments contain data in 
different languages.
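To make the per-Document analysis idea concrete, here is a minimal sketch in
plain Java. It assumes each part's language is already known. PartAnalyzers,
analyze and the tiny stop-word lists are hypothetical and only stand in for
choosing a different Analyzer per Document; they are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical per-language analysis step. Not Lucene API.
class PartAnalyzers {
    // Illustrative stop-word lists only; real lists would be much longer.
    private static final Map<String, Set<String>> STOP_WORDS = new HashMap<>();
    static {
        STOP_WORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "of")));
        STOP_WORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "das", "und")));
    }

    // Tokenize one part using the rules for that part's language.
    // Unknown languages fall back to no stop-word removal.
    static List<String> analyze(String language, String text) {
        Set<String> stop = STOP_WORDS.getOrDefault(language, new HashSet<>());
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or a decimal digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !stop.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Because every part is analyzed separately, a German attachment and an English
message body can each get suitable stop-word handling before being indexed.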

If all attachments are combined into a single body field, only one Analyzer can
be applied to that text: the one configured for the index or the one supplied
with the Document.

Has anyone used this type of approach and does it work?

