lucene-dev mailing list archives

From robert engels
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Tue, 05 Dec 2006 21:38:07 GMT
If it is only meant to protect from "prying eyes", a simple field-level
analyzer that does a simple xor/rotation should suffice. It
will be much faster and simpler.

Going beyond that, your solution is not very secure, as has been
pointed out, so you might as well just use the simplest solution.
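
As a rough illustration of the xor/rotation idea above (a sketch only, not code from this thread or from Lucene): the hypothetical helper below XORs each UTF-8 byte of a term with a repeating key, rotates the bits, and hex-encodes the result so the output is always plain ASCII. The class name, method names, key handling, and hex encoding are all assumptions made for the example. Because the transform is deterministic, the same input always yields the same output, so an exact-term lookup on the transformed text would still match. It only obscures the text from casual inspection and is not encryption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene: reversible XOR + bit-rotation obfuscation.
final class XorRotate {
    private XorRotate() {}

    // XOR each UTF-8 byte with the repeating key, rotate left by 3 bits, hex-encode.
    // The key must be non-empty.
    static String obfuscate(String text, byte[] key) {
        byte[] in = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(in.length * 2);
        for (int i = 0; i < in.length; i++) {
            int b = (in[i] ^ key[i % key.length]) & 0xFF;
            int rotated = ((b << 3) | (b >>> 5)) & 0xFF;
            out.append(Character.forDigit(rotated >>> 4, 16));
            out.append(Character.forDigit(rotated & 0xF, 16));
        }
        return out.toString();
    }

    // Inverse of obfuscate: hex-decode, rotate right by 3 bits, XOR with the key.
    static String deobfuscate(String hex, byte[] key) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int rotated = (Character.digit(hex.charAt(2 * i), 16) << 4)
                    | Character.digit(hex.charAt(2 * i + 1), 16);
            int b = ((rotated >>> 3) | (rotated << 5)) & 0xFF;
            out[i] = (byte) (b ^ key[i % key.length]);
        }
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

In an application, the same transform would be applied to field values or analyzer tokens at index time and to query terms at search time, and reversed when stored values are read back.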

On Dec 5, 2006, at 3:28 PM, negrinv wrote:

> Chris Hostetter wrote:
>> Compression of stored fields is a feature that the Lucene "core"  
>> currently
>> supports out of the box -- but it does so in a very limited manner
>> that
>> doesn't allow for much configuration.  There is no advantage for  
>> users in
>> using compressed fields over compressing the data themselves  
>> before adding
>> it to the index, only disadvantages: notably the limited control
>> the user
>> has over the compression, and added complexity for the code path  
>> executed
>> by all users -- even if they don't use compression (a boolean test on
>> "compressed" in FieldsReader may be fast ... but it's still a  
>> bytecode op
>> for every field that's completely unnecessary for a large portion
>> of the
>> user base)
>> If the code was not already in the core, and someone asked about  
>> adding it
>> I would argue against doing so on the grounds that some helpful
>> utility
>> methods (possibly in a contrib) would be just as useful, and
>> would have
>> no performance cost for people who don't care about compression.
> Perhaps, if you look at compression on its own. But once you see
> compression in the context of all the other field options, it makes
> sense to have it in Lucene. Having everything in one place for ease of
> implementation offsets the performance issue, in my opinion.
>> First off, if all we are interested in is encrypting *stored* data,
>> then the issue becomes exactly the same as compression: there is  
>> no point
>> in putting this functionality in the "core" Lucene code base when  
>> it can
>> be done using helper utility methods -- now that that's out of the  
>> way,
>> let's talk about the good stuff...
> As above
>> If we want to encrypt the text portion of Terms that are indexed for a
>> specific set of fields, this is again something that can easily be  
>> done
>> without modifying the "core" Lucene code base -- utility methods  
>> can be
>> used to help people encrypt UN_TOKENIZED Field values, and a simple
>> AnalyzerWrapper can be made to encrypt the text portion of Tokens  
>> produced
>> by another analyzer both when indexing Field values and when  
>> QueryParser
>> is Analyzing input text if necessary.
> I take your word for it, but wouldn't you agree that replacing all  
> the above
> with just one line, "Field.Store.Encrypted" (or  
> Field.Store.Encrypt, for
> compatibility with Field.Store.Compress), would be a lot easier to
> use for
> the average developer?
>> As others have already pointed out: encrypting just the Term text  
>> doesn't
>> do much to aid the overall security of your data -- because a bad  
>> guy with
>> access to your index can use the various statistics about your terms
>> (docFreq, term vectors, term positions, etc...) to aid them in  
>> cracking
>> your encryption -- maybe a user is okay with that risk, in which  
>> case my
>> previous comment about how this can easily be done without  
>> modifying any
>> core lucene classes still holds.  what about users who don't think  
>> this is
>> an acceptable risk? ... a more robust encryption mechanism is
>> necessary...
> Security is a big topic, we cannot hope to discuss it here. I am  
> talking
> about some form of data protection, not security.
> When you say "a bad guy with access to your index", you imply that  
> nothing
> can be done to protect the index. But accessing an index which you are
> determined to protect would not be easy, would require expertise,  
> money, as
> well as the risk of a potential jail sentence. If you have National  
> Security
> in mind, be assured no agency responsible for national security  
> will use
> open source software which is not certified, and that is downloaded  
> from an
> insecure site over the internet, in order to protect the nation (I
> hope!).
> If we are talking about applications which need to protect data  
> from curious
> or even ill-intentioned eyes, then you can provide a deterrent by  
> encrypting
> that sensitive data only. It might be a list of names, or balances, or
> credit card numbers. Lucene alone can only provide some form of data
> protection, not security. If you accept this limitation you will  
> find it
> easier to accept the notion of encryption at field level, just like  
> some
> relational database software encrypts at column level. Just as  
> importantly
> you want to be able to search over that encrypted field, something
> which my
> proposed code provides (within the stated current limitation).
>> So exactly what pieces of data about a set of fields in an index  
>> need to
>> be encrypted before you can adequately say that those fields are
>> encrypted?
>> Off the top of my head i don't know, but I think the only way to  
>> play it
>> safe is to assume that *all* of the data needs to be encrypted.
> Cannot agree here, it's application dependent. And keep in mind  
> that once
> you offer new functionality people will find many original  
> applications for
> it.
>> Now the question becomes: do we modify all of the index
>> writing/reading code
>> to add a lot of "if (encrypted) { ... } else { ... }" checks, or  
>> is there
>> an easier way to ensure that all of the data is encrypted without
>> impacting the majority of the user base?
> A perfectly valid point, only benchmarking will tell by how much  
> the current
> performance of Lucene will be impacted by the addition of encryption.
> Somebody in this discussion suggested a Lucene benchmarking tool  
> which can
> be used. I am not familiar with it, but if it is easy to run then  
> let's do
> it and resolve factually this part of the discussion.
> On a more philosophical level, are you saying that there should not  
> be any
> added functionality to Lucene if it impacts the performance of  
> those who do
> not need the additional functionality? This could be a major
> limitation to
> the future of Lucene. Perhaps one should set some small % limits to  
> the
> level of impact, but zero could be too limiting.
>> I would argue that creating an EncryptedDirectory class with an  
>> API that
>> looks something like this.......
>> .............
>> .............
>>  - Do my concerns about that impact make sense to you?
>>  - Does my (high level) description of how i think encryption  
>> might make
>>    sense as an optional Lucene feature make sense?
>>  - are there any advantages you see to your approach that you feel  
>> make it
>>    more worthwhile than a Directory based approach?
> Points one and two are perfectly valid and make a lot of sense.
> Point three
> is about what is best for the most, given that there is already an  
> OS option
> to encrypt at directory level.
> I like field encryption because it is functionality which cannot be
> implemented at the OS level, and because of its granularity and its
> similarity to existing Lucene functionality, it would be more  
> intuitive and
> easier to implement at the application level. Encrypting everything  
> in a
> directory would have a performance impact on the application.
> I accept your point about the difference between a file system  
> directory and
> a Lucene directory. But in order to overcome the lack of field-level
> encryption and to minimise the performance impact on the  
> application you
> would be forced to create a separate index and directory for each  
> field
> which you want encrypted.  It will work, but is not a solution I  
> would like
> to have to adopt at the application level.
> Finally a point about my code. I was unsuccessful in creating a  
> diff file
> because I was picking up all kinds of formatting differences as
> well. If you
> scan it quickly you will find that it is really very simple and, at
> least in
> its current limited implementation, hardly invasive of Lucene's  
> core. All
> the encryption routines are in a separate class which I placed in the
> utility package.
> Victor