kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: Data encryption in Kudu
Date Fri, 05 May 2017 23:54:17 GMT
On Tue, May 2, 2017 at 8:38 PM, Franco Venturi <fventuri@comcast.net> wrote:

> Dan,
> first of all thanks for reading through my long post and providing your
> comments and advice.
>
>
> You are 100% correct on the TDE column encryption in Oracle; I looked it
> up again in the 'Introduction to Transparent Data Encryption' in the
> 'Database Advanced Security Guide'
> (https://docs.oracle.com/database/121/ASOAG/asotrans.htm#ASOAG10117)
> and Figure 2-1 clearly shows the keys being stored in the database.
> With this piece of information, it doesn't seem to me that Oracle column
> TDE offers much protection in case of an active attacker who has full
> access to the DB server, since there must be a process somewhere where
> the database engine is able to retrieve the decryption key for a given
> column.
>

Yes, but that key could be held in an HSM (hardware security module).


>  Another interesting piece of information in that chapter is this
> sentence:
>

>                 TDE tablespace encryption also allows index range scans on
> data in encrypted tablespaces. This is not possible with TDE column
> encryption.
>
>
> which makes me think that TDE column encryption must encrypt the data
> before placing it into the Btree, and therefore is not able to use the
> Btree for range searches.
>

That's my interpretation as well.


> I think the main reason why an organization would want one or the other
> type of encryption (client-side vs server-side) is what kind of possible
> attack they are trying to prevent (and the criteria are often dictated by
> internal security policies):
>         - with server-side encryption, the encrypted data is protected
> against a disk being lost (the so called 'encryption at rest'), but it is
> not protected against an active attacker on the server with full access
> (they could retrieve the key and then decrypt the data).
>         - with client-side encryption, the server has no way to decrypt
> the data and therefore even the active attacker above wouldn't be able to
> do much with the encrypted data. As I mentioned in my previous post, this
> is similar to what HDFS does for transparent data encryption and I think
> it's one of their selling points ('not even root can decrypt the data on
> HDFS'), and for some IT security groups this may sound attractive.
>

Root privileges on a machine don't necessarily guarantee access to the key;
the key could be stored remotely, or even on an HSM.


> I 100% agree with the performance concerns you raised about client-side
> encryption (no range scans on the encrypted columns, no compression, RLE, etc.),
> to the point that last night I wondered if other people have asked
> themselves similar questions, and I did find a couple of interesting
> approaches:
>         - CryptDB (http://css.csail.mit.edu/cryptdb/ - the main paper is
> here: http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb.pdf)
>         - ZeroDB (https://opensource.zerodb.com/)
>
> In order to be able to do range scans, CryptDB for instance uses
> 'Order Preserving Encryption', which in theory allows data to be encrypted
> in a way that preserves ordering, i.e. Enc(x) < Enc(y) iff x < y; however,
> several research papers since then have shown that Order Preserving
> Encryption leaks a significant amount of information about the encrypted
> data and is susceptible to frequency and other kinds of attacks. As you can
> imagine there's a lot of academic research actively being done in this
> field and, even if it is not ready for prime time, I thought I would share
> these findings.
>

That's really interesting.  Pretty different threat model being assumed by
ZeroDB :).
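
To make the leakage concrete, here is a toy sketch in Python. It is not
CryptDB's actual scheme: the lookup-table construction is an assumption
chosen for brevity, and `make_ope_table` and `encrypt` are hypothetical
names. It builds a random strictly increasing mapping, so Enc(x) < Enc(y)
iff x < y, and shows that someone holding only the ciphertexts still learns
the row ordering and the value frequencies.

```python
import random
from collections import Counter


def make_ope_table(domain_size, seed, max_gap=16):
    """Toy order-preserving 'cipher': a random strictly increasing lookup table."""
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        # A gap of at least 1 keeps the table strictly increasing.
        current += rng.randint(1, max_gap)
        table.append(current)
    return table


def encrypt(table, x):
    """Deterministically map plaintext x (an index into the domain) to its ciphertext."""
    return table[x]


table = make_ope_table(domain_size=100, seed=42)
ages = [34, 34, 61, 18, 34, 61]
ciphertexts = [encrypt(table, a) for a in ages]

# Sorting rows by ciphertext gives the same row order as sorting by plaintext,
# so the ordering leaks without the key.
order_by_plain = sorted(range(len(ages)), key=lambda i: ages[i])
order_by_cipher = sorted(range(len(ages)), key=lambda i: ciphertexts[i])

# Equal plaintexts map to equal ciphertexts, so the frequency histogram leaks too.
plain_freqs = sorted(Counter(ages).values())
cipher_freqs = sorted(Counter(ciphertexts).values())
```

With a known or guessable value distribution (ages, salaries, zip codes),
matching frequencies and ranks is often enough to recover many plaintexts,
which is the kind of attack the papers describe.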


> After this long digression (hopefully not too boring), I agree that the
> way forward would be to start with looking into the encryption of the file
> store (I think they are called 'cfiles'; I also saw mentions of some
> 'delta' files, and I am not sure if they are written the same way and
> should be encrypted too), and after that the WALs.
>

Yah, I think cfiles are a good place to start.  AFAIK delta files reuse the
cfile machinery when writing to disk. I originally considered recommending
looking at the filesystem block manager, but we often do offset lookups
into the FS blocks, which I don't think could be supported with encryption.
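
For a sense of scale on the per-block overhead discussed in the quoted
message below, here is a back-of-the-envelope sketch in Python. It assumes a
16-byte salt plus a 16-byte MAC per encrypted unit (the MAC size is an
assumption, and `framing_overhead` is a hypothetical helper), and compares
framing a whole block with framing an individual cell.

```python
MIB = 1024 * 1024


def framing_overhead(unit_size, salt_size=16, mac_size=16):
    """Return the salt+MAC overhead as a fraction of the encrypted unit's size."""
    return (salt_size + mac_size) / unit_size


# Overhead when the encrypted unit is a ~1 MiB block versus a single 8-byte cell.
per_block = framing_overhead(MIB)
per_cell = framing_overhead(8)

print(f"1 MiB block: {per_block:.4%} overhead")
print(f"8-byte cell: {per_cell:.0%} overhead")
```

Under these assumptions the framing cost is a few thousandths of a percent
per block but several times the payload per cell, which is why encrypting at
the block level avoids the space tradeoff that cell-level encryption forces.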

- Dan

------------------------------
> *From: *"Dan Burkert" <danburkert@apache.org>
> *To: *user@kudu.apache.org
> *Sent: *Tuesday, May 2, 2017 2:54:26 PM
>
> *Subject: *Re: Data encryption in Kudu
>
> Hi Franco,
>
> Thanks for the writeup!  I'm not an Oracle expert, but my interpretation
> of the TDE column level encryption documentation/implementation is very
> different than yours.  As far as I can tell, in both the per-column and
> table-space encryption modes, encryption/decryption is handled entirely on
> the Oracle server.  The difference is that column-level encryption will
> encrypt individual cells on disk (leaving the overall tree/index structure
> unencrypted), while table-space level encryption will encrypt at the block
> or file level.
>
> I agree with everything you wrote about the tradeoffs involved with client
> vs server encryption, but I think you are underestimating both the
> complexity involved with client-side encryption, as well as the performance
> hit that it would impose.  The loss of encoding, compression, and range
> predicate pushdown would absolutely kill performance for many important
> use cases.  The implementation would also be significantly _more_ difficult
> than server side encryption, because the client would need to manage the
> encryption keys, encrypt/decrypt data, and the solution would need to be
> implemented for every client library (of which there are currently two).
>
> For those reasons, I think server side encryption is the way to go with
> Kudu.  I think you're right that it would slot in as an additional step in
> the encode -> compress -> encrypt pipeline for blocks.  Because blocks are
> relatively large (typically > 1 MiB), the overhead of a 16 byte salt and
> additional MAC is negligible, so we wouldn't need to force the user to
> make that tradeoff.  Basically, we could get all of the advantages that
> Oracle's tablespace level encryption provides, but on a per-column basis.
> There are a couple of additional complications - we also have a WAL that
> lives outside of our file block abstraction, and we would almost certainly
> need to provide encryption for that as well (but perhaps it could be a
> second step in the process).
>
> In-line responses to some other comments below.
>
> On Sat, Apr 29, 2017 at 8:35 PM, Franco Venturi <fventuri@comcast.net>
> wrote:
>
>>
>> - also from the security point of view, since the encryption happens at
>> the client side, the data that is transferred on the network between the
>> client and the server is already encrypted and there's no need (at least
>> from this point of view) to add a layer of encryption between client and
>> server
>>
>
> I'm skeptical of this.  For instance, every scan request includes the
> names and types of the columns that the client wishes to scan, and that
> would be in plaintext without wire encryption.  That would be an issue for
> some use cases.
>
>
>> - from the security point of view, an attacker with full access to the
>> server would probably be able to decrypt the encrypted data
>>
>
> Could you elaborate on this?  As long as we use an external keystore and
> intermediate keys, I don't know how an attacker with access to the on-disk
> files could decrypt them.
>
>
>> - also from a security point of view the server returns the data back in
>> plaintext format; if the data transferred over the network contains
>> sensitive information, it would need an extra encryption layer like TLS or
>> something like that
>>
>
> Correct, and Kudu 1.3 includes TLS wire encryption for exactly this reason.
>
>
>> - as per performance implications, if the encryption on the server side
>> uses something like AES192 or AES256, there are libraries like libcrypto
>> that take advantage of the hardware acceleration for AES encryption on many
>> modern CPUs and therefore I suspect the performance overhead would be
>> limited; this is also indicated by what the Oracle documentation says
>> regarding processing overhead in the case of tablespace encryption in TDE
>>
>
> I agree, I think the overhead of per-block encryption would be pretty
> minimal.
>
>
>> - it would also require a way to have the server manage these column
>> encryption keys (possibly through additional client APIs); I haven't looked
>> yet at the way Oracle handles encryption/decryption keys for the tablespace
>> encryption TDE, but it's on my 'to-do' list
>>
>
> Yah, the normal thing to do here is call out to an external keystore that
> holds a master encryption key.
>
> - Dan
>
> ------------------------------
>> *From: *fventuri@comcast.net
>> *To: *user@kudu.apache.org
>> *Sent: *Wednesday, April 26, 2017 9:48:07 PM
>>
>> *Subject: *Re: Data encryption in Kudu
>>
>> David, Dan, Todd,
>> thanks for your prompt replies.
>>
>> At this stage I am just exploring what it would take to implement some
>> sort of data encryption in Kudu.
>>
>> After reading your comments here are some further thoughts:
>>
>> - according to the first sentence in this paragraph in the Kudu docs (
>> https://kudu.apache.org/docs/schema_design.html#compression):
>>
>>          Kudu allows per-column compression using the LZ4, Snappy, or
>> zlib compression codecs.
>>
>> it should be possible to perform per-column encryption by adding
>> 'encryption codecs' right after the compression codecs. I browsed through
>> the code quickly and I think this is done when reading/writing a 'cfile'
>> (please correct me if I am wrong). If this is correct, this change could be
>> 'minimally invasive' (at least for the 'cfile' part) and would not require
>> a major overhaul of the Kudu architecture.
>>
>> - as per the key management aspect, I am not a security expert at all, so
>> I am not sure what would be the best approach here - my thought here is
>> that in most places Kudu is deployed together with HDFS, so it would be
>> 'desirable' if the key management were consistent between the two services;
>> on the other hand, I also realize that the basic premises are fundamentally
>> different: HDFS encrypts everything at the client level and therefore the
>> HDFS engine itself is almost completely unaware that the data it stores is
>> actually encrypted (except for a special file hidden attribute, if I
>> understand correctly), while in Kudu the storage engine must have both the
>> 'public' key (when encrypting) and the 'private' key (when decrypting)
>> otherwise it can't take advantage of knowing the 'structure' of the data
>> (for instance the Bloom filters probably wouldn't work with the key being
>> encrypted). This means for instance that an attacker who is able to gain
>> access to the Kudu tablet servers would probably be able to decrypt the
>> data. Also one way to achieve something similar to what HDFS does (i.e.
>> client-based encryption and data encrypted in-flight) could be perhaps
>> using a one-time client certificate generated by the KMS server, but this
>> would also require changes to the client code.
>>
>> Franco
>>
>>
>> ------------------------------
>> *From: *"Todd Lipcon" <todd@cloudera.com>
>> *To: *user@kudu.apache.org
>> *Sent: *Tuesday, April 25, 2017 3:49:50 PM
>> *Subject: *Re: Data encryption in Kudu
>>
>> Agreed with what Dan said.
>>
>> I think there are a number of interesting design alternatives to be
>> considered, so before coding it would be great to work through a design
>> document to explore the alternatives. For example, we could try to apply
>> encryption at the 'fs/' layer, which would cover all non-WAL data, but then
>> we would lose the ability to specify encryption on a per-column basis.
>> There are other requirements that need to be ironed out about whether we'd
>> need to support separate encryption keys per column/table/server/etc,
>> whether metadata also needs to be encrypted, etc.
>>
>> -Todd
>>
>> On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert <danburkert@apache.org>
>> wrote:
>>
>>> Hi Franco,
>>>
>>> I think you are right that a client-based approach wouldn't work,
>>> because we wouldn't want to encrypt at the level of individual cell
>>> values.  That would get in the way of encoding, compression, predicate
>>> evaluation, etc.  As you note, adding encryption at the block layer is
>>> probably the way to go.  Key management is definitely the tricky issue.  We
>>> do have one advantage over HDFS - because Kudu does logical replication,
>>> the encryption key can be scoped to a particular tablet server or tablet
>>> replica, it wouldn't need to be shared among all replicas.  I haven't done
>>> enough research to know if this makes it fundamentally easier to do key
>>> management.  I would assume at a minimum we would want to integrate with
>>> key providers such as an HSM.  It would be good to have a thorough review of
>>> existing solutions in the space, such as TDE
>>> <https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the
>>> Hadoop KMS.  Is this something you are interested in working on?
>>>
>>> - Dan
>>>
>>> On Tue, Apr 25, 2017 at 8:30 AM, David Alves <davidralves@gmail.com>
>>> wrote:
>>>
>>>> Hi Franco
>>>>
>>>>   Dan, Alexey, Todd are our security experts.
>>>>   Folks, thoughts on this?
>>>>
>>>> Best
>>>> David
>>>>
>>>> On Mon, Apr 24, 2017 at 7:08 PM, <fventuri@comcast.net> wrote:
>>>>
>>>>> Over the weekend I started looking at what it would take to add data
>>>>> encryption to Kudu (besides using filesystem encryption via dm-crypt or
>>>>> something like that).
>>>>>
>>>>> Here are a few notes - please feel free to comment on them and add
>>>>> suggestions:
>>>>>
>>>>> - reading through this mailing list, it looks like this feature was
>>>>> asked about a couple of times last year, but from what I can tell, no one
>>>>> is currently working on it.
>>>>> - a client-based approach to encryption like the one used by HDFS
>>>>> wouldn't work (at least out of the box) because for instance encrypting the
>>>>> primary key at the client would prevent being able to have range filters
>>>>> for scans; it might work for the columns that are not part of the primary
>>>>> key
>>>>> - there's already code in Kudu for several compression codecs (LZ4,
>>>>> gzip, etc); I thought it would be possible to add similar code for
>>>>> encryption codecs (to be applied after the compression, of course)
>>>>> - the WAL log files and delta files should be similarly encrypted too
>>>>> - not sure what would be the best way to manage the key - I see that
>>>>> in HDFS they use a double key mechanism, where the encryption key for the
>>>>> data file is itself encrypted with the allowed user key and this whole
>>>>> process is managed by an external Key Management Service
>>>>>
>>>>> Thanks in advance for your ideas and suggestions,
>>>>> Franco
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>>
>>
>
>
