lucene-dev mailing list archives

From negrinv <>
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted
Date Sat, 02 Dec 2006 20:38:37 GMT

On the contrary, Mike, I am beginning to think that there have been a number
of misunderstandings, starting with my original posting.
When I submitted my proposal I was prepared for some discussion on the
merits or otherwise of my proposed solution. I had no idea that the
discussion would drift towards security and performance in absolute terms. I
would now like to steer the debate back in its intended direction.

I have no difficulty agreeing with you on both counts. A non-encrypted swap
file is a security risk, and encryption imposes a performance penalty.
Neither point, I submit, is relevant to my posting, for the following
reasons.
Security is all about knowing where you stand so you can take
counter-measures; it is not about a "false sense of security" provided by
knowing you have an encrypted swap file or a 3000 byte encryption key.
Lucene cannot provide security. It would be a legal nightmare and an absurd
expectation. The underlying operating system within which Lucene runs does
not guarantee security, the encryption software provider does not guarantee
security, and password protection and physical security are also outside of
Lucene's control. What Lucene can do is provide encryption services,
while the application has to provide a given level of security. For
instance, if you run under an operating system which does not provide swap
file encryption, then you must disable the swap file. Does that impose a
performance penalty? Probably, if your memory is limited, but now you know
where you stand, so you can make a decision: performance, or encryption, or
more memory. But one cannot, in my view, shift the responsibility for that
decision to Lucene.
I'll give you another example. You mentioned padding to 128 bits. True,
there are encryption routines which impose that penalty. For my (initial)
implementation I had the choice between an algorithm with padding, or RC4,
which does not pad. A 10 character term remains a 10 character term after
encryption. No padding and no index size implications. I said so in my
posting, and as an application developer you then have a choice to make: use
Lucene RC4 encryption as proposed (for the time being), or use another
product, or write your own. Without knowing the application, any decision
would be totally out of context, and no one piece of software can satisfy
all applications. A possible solution would be for Lucene to offer a choice
of algorithms.
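
To make the length argument concrete, here is a minimal sketch using only
the standard Java crypto API (javax.crypto). It is not the code from my
patch. The class name, the method names and the fixed demo key are
hypothetical, "ARCFOUR" is the name under which the JDK exposes an
RC4-compatible stream cipher, and AES/CBC/PKCS5Padding is assumed here as a
representative padded 128-bit block cipher.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CiphertextLengthDemo {

    // Fixed 16-byte demo key (assumption); a real application manages its own keys.
    private static final byte[] KEY =
            "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /** Ciphertext length in bytes for the stream cipher "ARCFOUR" (no padding). */
    static int streamCipherLength(byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "ARCFOUR"));
            return cipher.doFinal(plaintext).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Ciphertext length in bytes for AES-CBC with PKCS5 padding (16-byte blocks). */
    static int blockCipherLength(byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            // All-zero IV keeps the demo short; it is not suitable for real data.
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"),
                    new IvParameterSpec(new byte[16]));
            return cipher.doFinal(plaintext).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] term = "encryption".getBytes(StandardCharsets.US_ASCII); // 10 bytes
        System.out.println("plaintext bytes:     " + term.length);
        System.out.println("stream cipher bytes: " + streamCipherLength(term)); // 10
        System.out.println("block cipher bytes:  " + blockCipherLength(term));  // 16
    }
}
```

With a stream cipher the ciphertext is exactly as long as the term, while
the padded block cipher rounds every value up to the next multiple of 16
bytes. Which trade-off is acceptable depends on the application.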

The army, I am sure, would like to run its tanks at the speed of a Ferrari,
but it cannot; it hits a wall known as the cost-benefit ratio. It must choose
between security and speed and budget, keeping in mind the application. The
modern tank is the answer. A compromise.
My original posting avoided the notion of security and performance in
absolute terms precisely because of all the above considerations. It simply
addressed a couple of points which need to be resolved before the specifics
of the implementation can be discussed.

1) Is it a good idea to have encryption added to Lucene? I think so,
obviously, but not everyone agrees. As was pointed out in this discussion,
some relational database software provides encryption at the column level, a
functionality equivalent to the one I proposed. Lucene in some ways competes
with relational databases.

2) Assuming the answer to 1) above is yes, how should one go about including
encryption in Lucene? My solution is just that, one approach. Others have
proposed directory or file system encryption. My view is that this level of
encryption is already provided by all major operating systems, as well as by
some hardware devices, so I would not see a justifiable benefit in adding it
to Lucene. But that is only my personal opinion, and I am aware that
directory encryption is in the hands of the system administrator, not the
application end user. Perhaps there are other options which have not been
raised yet.

3) Assuming my proposal is acceptable, can it be implemented better? I am
not a Lucene expert; I learned Lucene on the go. I would be delighted to see
a better solution presented. It would be a learning experience for me.

I hope I have not added to the confusion.

Season's greetings to you and to all who took time to participate in this

Robert Engels wrote:
> I think you misunderstood. If you do not have encrypted swap (like
> OSX provides for) then your encryption is pointless as anyone can
> inspect the data as it is loaded into the heap by Lucene - bypassing
> the encryption.
> I also think you underestimated the impact on the size of the
> indexes, as most secure encryption schemes are going to pad the
> payloads to a minimum of 128 bits, and usually much more.
> This is going to make a HUGE difference in the size of the index.
> On Dec 1, 2006, at 2:00 PM, negrinv wrote:
>> Good news for OSX users! But what about all the others, should I say the
>> majority??
>> One more reason for encrypting at field level.
>> Victor
>> Robert Engels wrote:
>>> Not if running under OSX with encrypted swap turned on ! :)
>>> -----Original Message-----
>>>> From: Nicolas Lalevée <>
>>>> Sent: Dec 1, 2006 4:49 AM
>>>> To:
>>>> Subject: Re: Attached proposed modifications to Lucene 2.0 to support
>>>> Field.Store.Encrypted
>>>> On Friday 1 December 2006 11:10, negrinv wrote:
>>>>> Nicolas Lalevée-2 wrote:
>>>>>> On Friday 1 December 2006 01:33, negrinv wrote:
>>>>>>> Thank you Robert for your comments. I am inclined to agree with
>>>>>>> you, but I would like to establish first of all if simplicity of
>>>>>>> implementation is the overriding consideration. But before I dwell
>>>>>>> on that let me say that I have discovered that I am not a master of
>>>>>>> DIFF file creation with Eclipse. The diff file attachment to my
>>>>>>> original posting is absurdly large and not correct. I have
>>>>>>> therefore attached a zip file containing complete source code of
>>>>>>> the classes I modified. I leave it to others to extract the diffs
>>>>>>> properly.
>>>>>>> Back to the issue. So far the implementation has not been difficult
>>>>>>> considering that I knew nothing about Lucene internals before I
>>>>>>> started. The reason is that Lucene is very well structured and the
>>>>>>> changes just fitted nicely by adding some code in the right place
>>>>>>> with minimal changes to the existing code. But I admit that the
>>>>>>> proposed implementation so far is not complete and more work is
>>>>>>> required to overcome some of its restrictions. While I like your
>>>>>>> idea I believe that it imposes too large a granularity on the
>>>>>>> encrypted data: all fields with all kinds of data will be
>>>>>>> encrypted, including images and others which normally would be left
>>>>>>> alone, thus adding to the performance penalty due to encryption.
>>>>>> I don't agree with you here. In Lucene, you will encrypt the field
>>>>>> data, the field names, and the tokens: I would say that represents
>>>>>> at least 2/3 of the index size. Then, with the implementation you
>>>>>> suggest, I think (sorry, I didn't take time to look at your patch)
>>>>>> that every time Lucene data needs to be read, it is decrypted each
>>>>>> time. With an encrypted FS, your kernel will maintain a cache in RAM
>>>>>> for you, so it won't hurt so much.
>>>>>> It needs some benchmarking to see what is effectively the best, but
>>>>>> I doubt that your solution will be faster.
>>>>>> Nicolas.
>>>>> Nicolas, I am all in favour of some tests to establish which solution
>>>>> is best, but I have to say that I don't believe file system or
>>>>> directory encryption in Lucene is really justified. Most operating
>>>>> systems already provide this feature, although as system-wide or
>>>>> policy-based solutions, hence not always within individual user
>>>>> control.
>>>>> But if the issue is user control, then I believe Lucene should
>>>>> provide maximum granularity when it comes to choice of data to
>>>>> encrypt.
>>>>> The issue I believe is whether some form of encryption should be
>>>>> provided within Lucene to enable application developers to create
>>>>> applications which offer some data protection under user control,
>>>>> with a minimum of impact, where by impact I mean both on performance
>>>>> and workload either in Lucene code or user code.
>>>> In fact you mean a user that has no control of his machine, and that
>>>> cannot encrypt his partition. Here you will have the issue with the
>>>> swap: Lucene will decrypt the data in RAM, which can possibly be
>>>> pushed to the swap... I know this is extreme, but it's a security hole.
>>>> -- 
>>>> Nicolas LALEVÉE
>>>> Solutions & Technologies
>>>> Tel : +33 (0)5 61 00 52 90
>>>> Fax : +33 (0)5 61 00 51 46
