lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Tue, 05 Dec 2006 01:20:34 GMT

(For the record: I have deliberately avoided looking at your patch so
far, because I didn't want my opinion on the question of "should Lucene
offer encryption services" to be clouded by any specifics of your
implementation.  That said...)

As has already been pointed out, an apples to apples comparison cannot
be made between supporting encryption and supporting compression, but let's
talk about compression a little anyway.

Compression of stored fields is a feature that the Lucene "core" currently
supports out of the box -- but it does so in a very limited manner that
doesn't allow for much configuration.  There is no advantage for users in
using compressed fields over compressing the data themselves before adding
it to the index, only disadvantages: notably the limited control the user
has over the compression, and added complexity for the code path executed
by all users -- even if they don't use compression (a boolean test on
"compressed" in FieldsReader may be fast ... but it's still a bytecode op
for every field that's completely unnecessary for a large portion of the
user base).

If the code were not already in the core, and someone asked about adding
it, I would argue against doing so on the grounds that some helpful utility
methods (possibly in a contrib) would be just as useful, and would have
no performance cost for people who don't care about compression.
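
As a rough illustration of what such utility methods could look like, here
is a minimal sketch built only on java.util.zip.  The class and method
names (StoredFieldCompression, compress, decompress) are hypothetical and
not part of Lucene; the idea is just that the caller compresses a value
before adding it to the index and decompresses it after reading it back,
choosing the compression level itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene class: compression happens outside the
// index, so the caller controls the level and non-users pay nothing.
final class StoredFieldCompression {

    // Compress text with the given Deflater level (0-9, or Deflater.DEFAULT_COMPRESSION).
    static byte[] compress(String text, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of compress(); rejects truncated or corrupt input.
    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The resulting byte[] would then be stored as an ordinary binary stored
field, so the index code itself never needs to know the value was
compressed.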

Now let's talk about encryption again:

First off, if all we are interested in is encrypting *stored* data,
then the issue becomes exactly the same as compression: there is no point
in putting this functionality in the "core" Lucene code base when it can
be done using helper utility methods -- now that that's out of the way,
let's talk about the good stuff...
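
For the stored-data case, a helper along the following lines would be
enough.  This is only a sketch under assumptions: the class name
StoredFieldEncryption is hypothetical, AES-GCM is my choice of cipher (the
thread does not name one), and key management is left entirely to the
caller.  It uses only the standard javax.crypto API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper, not a Lucene class: encrypt a value before storing it
// as a binary field and decrypt it after loading the document.
final class StoredFieldEncryption {
    private static final int IV_BYTES = 12;   // usual nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Output layout: 12-byte random IV followed by ciphertext and tag.
    static byte[] encrypt(SecretKey key, String text) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reads the IV from the front of the data, then decrypts and verifies the rest.
    static String decrypt(SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Because a fresh random IV is used for every value, two documents storing
the same text produce different bytes in the index, which is fine for
stored fields since nothing ever needs to match on them.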

If we want to encrypt the text portion of Terms that are indexed for a
specific set of fields, this is again something that can easily be done
without modifying the "core" Lucene code base -- utility methods can be
used to help people encrypt UN_TOKENIZED Field values, and a simple
AnalyzerWrapper can be made to encrypt the text portion of Tokens produced
by another analyzer, both when indexing Field values and when QueryParser
is analyzing input text, if necessary.
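
The core of such a wrapper is a function that maps token text to encrypted
token text.  The sketch below shows only that mapping and deliberately
leaves out the Lucene Analyzer/TokenStream plumbing; TermTextEncryption and
its methods are hypothetical names, and AES in ECB mode is my assumption,
picked only because it is deterministic (not because it is a recommended
mode).  Determinism is what lets a query term encrypt to the same text as
the indexed term.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hypothetical helper, not a Lucene class: the per-token transformation an
// encrypting analyzer wrapper would apply at both index time and query time.
final class TermTextEncryption {

    // Same key + same term always yields the same output, so exact-match
    // lookups still work against the encrypted terms.
    static String encryptTerm(SecretKey key, String term) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] out = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("term encryption failed", e);
        }
    }

    // Inverse of encryptTerm(), e.g. for displaying a term to an authorized user.
    static String decryptTerm(SecretKey key, String encrypted) {
        try {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, key);
            byte[] plain = cipher.doFinal(Base64.getUrlDecoder().decode(encrypted));
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("term decryption failed", e);
        }
    }

    // Applies encryptTerm() to each token produced by some other analysis step.
    static List<String> encryptTokens(SecretKey key, List<String> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(encryptTerm(key, token));
        }
        return result;
    }
}
```

Note that this only preserves exact matches: prefix, wildcard and range
queries depend on the ordering and structure of the plain term text, which
the encrypted text does not keep.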

As others have already pointed out: encrypting just the Term text doesn't
do much to aid the overall security of your data -- because a bad guy with
access to your index can use the various statistics about your terms
(docFreq, term vectors, term positions, etc...) to aid them in cracking
your encryption -- maybe a user is okay with that risk, in which case my
previous comment about how this can easily be done without modifying any
core Lucene classes still holds.  What about users who don't think this is
an acceptable risk? ... a more robust encryption mechanism is needed.

So exactly what pieces of data about a set of fields in an index need to
be encrypted before you can adequately say that those fields are encrypted?
Off the top of my head I don't know, but I think the only way to play it
safe is to assume that *all* of the data needs to be encrypted.  Now the
question becomes: do we modify all of the index writing/reading code
to add a lot of "if (encrypted) { ... } else { ... }" checks, or is there
an easier way to ensure that all of the data is encrypted without
impacting the majority of the user base?

I would argue that creating an EncryptedDirectory class with an API that
looks something like this...

  public class EncryptedDirectory extends Directory {
    // wraps any other Directory and encrypts everything written through it
    public EncryptedDirectory(Directory wrapped, EncryptionProvider provider);
    // all Directory methods here
  }

...might be the best way to go, as it:
  1) achieves the result (provide encryption)
  2) doesn't affect performance of clients who don't care about the feature
  3) doesn't limit the functionality of users who do use the feature (the
physical index can still be stored in a database, or stored on disk, or
stored purely in RAM).
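
To make the wrapping idea concrete without depending on Lucene's actual
Directory API, here is a self-contained sketch of the same decorator
pattern.  Everything in it is an assumption for illustration: FileStore is
a hypothetical stand-in for Directory that only supports whole-file reads
and writes, EncryptionProvider is a guess at what that interface might
contain, and AES-GCM is my choice of cipher.  A real EncryptedDirectory
would also have to support seeking within files, which this sketch ignores.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical stand-in for Lucene's Directory: whole-file access only.
interface FileStore {
    void writeFile(String name, byte[] contents);
    byte[] readFile(String name);
}

// Hypothetical shape for the EncryptionProvider named in the message.
interface EncryptionProvider {
    byte[] encrypt(byte[] plain);
    byte[] decrypt(byte[] encrypted);
}

// Plays the role of RAMDirectory; could equally be disk or database backed.
final class InMemoryFileStore implements FileStore {
    private final Map<String, byte[]> files = new HashMap<>();
    public void writeFile(String name, byte[] contents) { files.put(name, contents.clone()); }
    public byte[] readFile(String name) {
        byte[] data = files.get(name);
        if (data == null) throw new IllegalArgumentException("no such file: " + name);
        return data.clone();
    }
}

// The decorator: callers see a normal FileStore, the wrapped store sees only
// encrypted bytes. File names are passed through unencrypted in this sketch.
final class EncryptedFileStore implements FileStore {
    private final FileStore wrapped;
    private final EncryptionProvider provider;
    EncryptedFileStore(FileStore wrapped, EncryptionProvider provider) {
        this.wrapped = wrapped;
        this.provider = provider;
    }
    public void writeFile(String name, byte[] contents) {
        wrapped.writeFile(name, provider.encrypt(contents));
    }
    public byte[] readFile(String name) {
        return provider.decrypt(wrapped.readFile(name));
    }
}

// One possible provider: AES-GCM with a random 12-byte IV stored before the data.
final class AesGcmProvider implements EncryptionProvider {
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();
    AesGcmProvider(SecretKey key) { this.key = key; }
    public byte[] encrypt(byte[] plain) {
        try {
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            byte[] body = cipher.doFinal(plain);
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }
    public byte[] decrypt(byte[] encrypted) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, encrypted, 0, 12));
            return cipher.doFinal(encrypted, 12, encrypted.length - 12);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Code that never constructs an EncryptedFileStore runs exactly as before,
which is the "no impact on non-users" property, and the wrapped store can
be swapped freely because the decorator only depends on the interface.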

If users who want to use encryption really care deeply about only
having *some* of their fields encrypted, and don't want to pay the
performance costs of encryption for their other fields, they can use
a ParallelReader spanning two indexes: one using an EncryptedDirectory
wrapped around the sensitive fields and one using a regular directory
containing the non-sensitive fields.

: 1) is it a good idea to have encryption added to Lucene? I think so

: 2) assuming the answer to 1) above is yes, how should one go about including
: encryption in Lucene. My solution is just that, one approach. Others have

I would say that my answer to #1 is "maybe" and my answer to #2 is "in
some way that has no impact at all on people who don't want to use it."
That said, I'm assuming, since this thread's subject mentions
Field.Store.Encrypted, that your approach is a fairly "low level" change
that would impact non-users (slightly, but an impact nonetheless).

 - Do my concerns about that impact make sense to you?
 - Does my (high level) description of how I think encryption might make
   sense as an optional Lucene feature make sense?
 - Are there any advantages you see to your approach that you feel make it
   more worthwhile than a Directory based approach?

: encryption in Lucene. My solution is just that, one approach. Others have
: proposed directory or file system encryption. My view on this is that this
: level of encryption is already provided by all major operating systems, as
: well as by some hardware devices. I would not see a justifiable benefit in
: adding it to Lucene. But that is only my personal opinion, although I am

There is a big difference, however, between a "file system directory" and an
"" -- I agree with you that just adding
the ability to encrypt an FSDirectory would have little advantage over
using a more OS based approach, but it might make a lot of sense to do it
at the Lucene Directory level -- so users can leverage it no matter where
they store their index.

