lucene-java-user mailing list archives

From <>
Subject Re: Lucene Index Encryption
Date Wed, 06 May 2009 18:30:23 GMT
Thanks for the comments; they are very insightful.
I hadn't thought about the Random Access issues until you brought it up.

This makes the project a little tougher, but not impossible. 
I was searching last night and there have been a couple of papers
written on the topic of Encrypted Random Access files at MIT.
I haven't finished reading all of them yet, but they suggest ways of
solving the Encryption problems for Random Access Files.
I am going to spend a few days looking at the various papers before I
waste your time discussing this any further.
>(Presumably performance will suffer perhaps substantially since every
>search will need to decrypt on the fly...).
Yes, I imagine that there will be a performance hit, since this will add
significant overhead to every byte that Lucene accesses.
However, in some applications the price of having unsecured data is
unacceptable, for example when secure data is published to a laptop for
offline use. In this case, the additional time needed to access the index
would be acceptable.
Examples: Military, Medical, and Financial information.
Re: Lucene Index Encryption
Michael McCandless
May 5, 2009 1:22:00 am
Would you encrypt at the file level?  I.e., the encryption would live
"under" a RandomAccessFile (RAF) and otherwise feel "normal" to Lucene?
(I think I remember others exploring encryption at the individual term
level, which is interesting but does leak information in that you can
see individual terms & their frequencies).
Lucene needs to be able to ask a RAF opened for writing what its
current "position" is during indexing, which it then stores away, and
later during searching it needs to ask a RAF opened for reading to
seek back to that position so it can read bytes from there.  Would the
encryption APIs allow this?
If this is possible then couldn't one make a Directory impl that hides
all encryption/decryption "under the hood"?
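
One way the position/seek requirement above might be met is a counter-mode (CTR) cipher, where the keystream for any byte offset can be computed directly from the offset. The following is only a minimal sketch under that assumption, not something proposed in this thread: the class name CtrRandomAccess and its methods are hypothetical, and it assumes the JCE provider increments the counter as a 128-bit big-endian integer, as the default JDK provider does.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: AES in counter (CTR) mode.
final class CtrRandomAccess {
    private static final int BLOCK = 16; // AES block size in bytes

    // Counter value for a given 16-byte block: baseIv + blockIndex (mod 2^128).
    // Assumption: the provider treats the counter as a 128-bit big-endian integer.
    static byte[] counterFor(byte[] baseIv, long blockIndex) {
        byte[] raw = new BigInteger(1, baseIv)
                .add(BigInteger.valueOf(blockIndex))
                .toByteArray();
        byte[] iv = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        // Keep the low-order 16 bytes, right-aligned.
        System.arraycopy(raw, raw.length - n, iv, BLOCK - n, n);
        return iv;
    }

    // Encrypts or decrypts `data` as if it started at byte `position` of the file.
    // CTR mode XORs the data with a keystream, so both directions are the same operation.
    static byte[] crypt(byte[] key, byte[] baseIv, long position, byte[] data) {
        try {
            int skip = (int) (position % BLOCK); // offset inside the first block
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(counterFor(baseIv, position / BLOCK)));
            // Prepend `skip` dummy bytes so the keystream lines up with `position`.
            byte[] padded = new byte[skip + data.length];
            System.arraycopy(data, 0, padded, skip, data.length);
            byte[] out = c.doFinal(padded);
            return Arrays.copyOfRange(out, skip, out.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A Directory implementation could, in principle, call something like crypt(key, iv, filePointer, bytes) on every read and write, so that the positions Lucene stores stay valid. Note that CTR mode alone gives no integrity protection, and the same key and IV pair must never be reused for different file contents.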
(Presumably performance will suffer perhaps substantially since every
search will need to decrypt on the fly...).
On Mon, May 4, 2009 at 6:29 PM,  <> wrote:
I hope to make this a discussion rather than a request for a feature.
In the database world, secure data is always encrypted in the database.
Since I am interested in storing data from a database in the index, at
times I want to encrypt the index when the file is on disk.
Currently data stored in the Lucene Index is easily accessible to any
program that wants to access it. You cannot store sensitive data in the
index without the fear that it will be readable by all the people that
have access to the system.
There are two other posts in the mailing list that ask a question about
Lucene Index Encryption. In both cases, I think that the conversation
was dropped or the feature put off.
Basically, I am asking for comments on the topic. I might consider
coding the feature, but I would only do it if I am sure that the feature
would be useful and accepted back into the core codebase of Lucene.
The Sun javax.crypto package is available in JDK 1.4, so using that
package could be a possible way of providing an encrypted file.
The other option is Bouncy Castle, which is now being used in the PDFBox
and Tika projects.
In any case, because the normal Lucene Index implementation would not
use an encrypted index, all references to Security classes should load
dynamically with the "Class.forName()" method if they were not part of a
standard JRE, to guarantee no additional requirements are placed on
people currently using the Lucene libraries.
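
As a rough illustration of the dynamic-loading idea above, an optional provider could be registered by name only if it is present on the classpath. This is a sketch, not existing Lucene code: the class OptionalProviderLoader and its method tryRegister are hypothetical names, and the Bouncy Castle class name mentioned afterwards is just one example of what might be passed in.

```java
import java.security.Provider;
import java.security.Security;

// Hypothetical helper: registers a JCE provider only if its class can be loaded.
final class OptionalProviderLoader {

    // Returns true if the provider class was found and registered,
    // false if it is missing or is not a usable java.security.Provider.
    static boolean tryRegister(String providerClassName) {
        try {
            // Loaded by name, so there is no compile-time dependency on the provider.
            Class<?> cls = Class.forName(providerClassName);
            Provider provider = (Provider) cls.getDeclaredConstructor().newInstance();
            Security.addProvider(provider); // no effect if already installed
            return true;
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Class absent, not instantiable, or not a Provider: run without it.
            return false;
        }
    }
}
```

For example, tryRegister("org.bouncycastle.jce.provider.BouncyCastleProvider") would return false on an installation without the Bouncy Castle jar, and users who do not need encryption would never notice.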
Then there is the issue of what to use as the Encryption Key, and how to
allow access to the Index files from the various programs that may need
to get to the data. The Encryption Key needs to be external to any
program that accesses the Index, because with Java, if the key were
stored in the code, it would be easily found with a simple decompile of
the Java class.
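
One possible way to keep the key out of the code is to derive it at runtime from a passphrase that an operator supplies. The sketch below assumes a Java 8 or later runtime (for "PBKDF2WithHmacSHA256"), and the class name ExternalKey, the iteration count, and the 128-bit key size are illustrative choices rather than anything settled in this discussion.

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical helper: derives an AES key from a passphrase supplied at runtime.
final class ExternalKey {
    private static final int ITERATIONS = 100_000; // illustrative work factor
    private static final int KEY_BITS = 128;       // AES-128 key size

    // The salt is not secret and could be stored next to the index files.
    static SecretKeySpec fromPassphrase(char[] passphrase, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(passphrase, salt, ITERATIONS, KEY_BITS);
            byte[] keyBytes = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
            spec.clearPassword(); // wipe the spec's internal copy of the passphrase
            return new SecretKeySpec(keyBytes, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The passphrase itself could come from a console prompt or from a protected file, so that decompiling the application classes reveals nothing about the key.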
I don't have answers to the questions, but basically I am requesting
comments on the topic.
I imagine that if I put Encryption and Decryption at the I/O level,
immediately before a segment was written or immediately after a segment
was read, that I would minimize the overall impact of the Lucene
Another area to address is Remote Searching. The Remote Interface would
need extensions that allow for Encrypted Remote files as well as
Encrypted communication between the machines.
However, I am not sure of these assumptions. I don't know how many
places the segments are read and written. I really do not know how to do
this currently, but would be willing to give it a try if there was
enough interest shown in the topic.
To unsubscribe, e-mail:
For additional commands, e-mail:
