kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: Data encryption in Kudu
Date Tue, 02 May 2017 18:54:26 GMT
Hi Franco,

Thanks for the writeup!  I'm not an Oracle expert, but my interpretation of
the TDE column level encryption documentation/implementation is very
different than yours.  As far as I can tell, in both the per-column and
table-space encryption modes, encryption/decryption is handled entirely on
the Oracle server.  The difference is that column-level encryption will
encrypt individual cells on disk (leaving the overall tree/index structure
unencrypted), while table-space level encryption will encrypt at the block
or file level.

I agree with everything you wrote about the tradeoffs involved with client
vs server encryption, but I think you are underestimating both the
complexity involved with client-side encryption and the performance
hit that it would impose.  The loss of encoding, compression, and range
predicate pushdown would absolutely kill performance for many important
use cases.  The implementation would also be significantly _more_ difficult
than server-side encryption, because the client would need to manage the
encryption keys and encrypt/decrypt data, and the solution would need to be
implemented for every client library (of which there are currently two).
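
To make the encoding and compression point concrete, here is a minimal Python sketch of client-side, per-cell encryption. It assumes a randomized cipher with a fresh 16-byte nonce per cell; the HMAC-SHA256 counter-mode keystream is a stdlib-only stand-in for AES-CTR, chosen for illustration and not for production use. Because every ciphertext is distinct, dictionary encoding has nothing to deduplicate, and because the bytes are pseudorandom, zlib cannot compress them:

```python
import hashlib
import hmac
import os
import zlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; an illustrative stand-in for AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def encrypt_cell(key: bytes, value: bytes) -> bytes:
    """Client-side encryption of a single cell with a fresh random 16-byte nonce."""
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(value))
    return nonce + bytes(a ^ b for a, b in zip(value, ks))


key = os.urandom(32)
# A low-cardinality column: 10,000 cells drawn from two status strings.
cells = [b"ACTIVE", b"ACTIVE", b"INACTIVE", b"ACTIVE"] * 2500
encrypted_cells = [encrypt_cell(key, c) for c in cells]

# Dictionary encoding relies on equal values producing equal bytes.
print("distinct plaintext values :", len(set(cells)))
print("distinct ciphertext values:", len(set(encrypted_cells)))

# Compression relies on redundancy, which the ciphertext no longer has.
plain_compressed = len(zlib.compress(b"".join(cells)))
cipher_compressed = len(zlib.compress(b"".join(encrypted_cells)))
print("compressed plaintext bytes :", plain_compressed)
print("compressed ciphertext bytes:", cipher_compressed)
```

The same randomization is what breaks range predicate pushdown: the server cannot evaluate a predicate such as key >= 100 against bytes whose order bears no relationship to the plaintext order.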

For those reasons, I think server side encryption is the way to go with
Kudu.  I think you're right that it would slot in as an additional step in
the encode -> compress -> encrypt pipeline for blocks.  Because blocks are
relatively large (typically > 1 MiB), the overhead of a 16-byte salt and
an additional MAC is negligible, so we wouldn't need to force the user to
make that tradeoff.  Basically, we could get all of the advantages that
Oracle's tablespace level encryption provides, but on a per-column basis.
There are a couple of additional complications: we also have a WAL that
lives outside of our file block abstraction, and we would almost certainly
need to provide encryption for that as well (but perhaps it could be a
second step in the process).
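
A minimal Python sketch of that encode -> compress -> encrypt block pipeline follows. It assumes a single per-column key, a plain little-endian int64 encoding as a stand-in for Kudu's real encodings, zlib as the codec, and an HMAC-SHA256 counter-mode keystream with encrypt-then-MAC as a stdlib stand-in for AES-GCM. The names write_block and read_block are hypothetical and do not correspond to anything in Kudu's cfile code:

```python
import hashlib
import hmac
import os
import struct
import zlib

SALT_LEN = 16  # per-block random salt
MAC_LEN = 32   # HMAC-SHA256 tag (AES-GCM would use a 16-byte tag instead)


def _keystream(key: bytes, salt: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; an illustrative stand-in for AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, salt + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, ks: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return (int.from_bytes(data, "big") ^ int.from_bytes(ks, "big")).to_bytes(len(data), "big")


def _subkeys(column_key: bytes) -> tuple[bytes, bytes]:
    """Derive separate encryption and MAC keys from one per-column key."""
    enc = hmac.new(column_key, b"enc", hashlib.sha256).digest()
    mac = hmac.new(column_key, b"mac", hashlib.sha256).digest()
    return enc, mac


def write_block(column_key: bytes, values: list[int]) -> bytes:
    # 1. encode: plain little-endian int64s (stand-in for Kudu's encodings)
    encoded = struct.pack(f"<{len(values)}q", *values)
    # 2. compress
    compressed = zlib.compress(encoded)
    # 3. encrypt with a fresh salt, then MAC salt + ciphertext (encrypt-then-MAC)
    enc_key, mac_key = _subkeys(column_key)
    salt = os.urandom(SALT_LEN)
    ciphertext = _xor(compressed, _keystream(enc_key, salt, len(compressed)))
    tag = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    return salt + ciphertext + tag


def read_block(column_key: bytes, block: bytes) -> list[int]:
    enc_key, mac_key = _subkeys(column_key)
    salt, ciphertext, tag = block[:SALT_LEN], block[SALT_LEN:-MAC_LEN], block[-MAC_LEN:]
    # Verify the MAC before decrypting anything.
    expected = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("block failed authentication")
    compressed = _xor(ciphertext, _keystream(enc_key, salt, len(ciphertext)))
    encoded = zlib.decompress(compressed)
    return list(struct.unpack(f"<{len(encoded) // 8}q", encoded))


column_key = os.urandom(32)
values = list(range(128 * 1024))  # 1 MiB of encoded int64s
block = write_block(column_key, values)
assert read_block(column_key, block) == values

overhead = SALT_LEN + MAC_LEN
print(f"fixed per-block overhead: {overhead} bytes "
      f"({overhead / (1 << 20):.4%} of a 1 MiB block)")
```

The fixed cost is one salt and one tag per block (48 bytes in this sketch), well under 0.01% of a 1 MiB block, which is why there is no need to expose it to users as a tradeoff.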

In-line responses to some other comments below.

On Sat, Apr 29, 2017 at 8:35 PM, Franco Venturi <fventuri@comcast.net>
wrote:

>
> - also from the security point of view, since the encryption happens at
> the client side, the data that is transferred on the network between the
> client and the server is already encrypted and there's no need (at least
> from this point of view) to add a layer of encryption between client and
> server
>

I'm skeptical of this.  For instance, every scan request includes the
names and types of the columns that the client wishes to scan, and that
would be in plaintext without wire encryption.  That would be an issue for
some use cases.


> - from the security point of view, an attacker with full access to the
> server would probably be able to decrypt the encrypted data
>

Could you elaborate on this?  As long as we use an external keystore and
intermediate keys, I don't know how an attacker with access to the on-disk
files could decrypt them.


> - also from a security point of view the server returns the data back in
> plaintext format; if the data transferred over the network contains
> sensitive information, it would need an extra encryption layer like TLS or
> something like that
>

Correct, and Kudu 1.3 includes TLS wire encryption for exactly this reason.


> - as per performance implications, if the encryption on the server side
> uses something like AES192 or AES256, there are libraries like libcrypto
> that take advantage of the hardware acceleration for AES encryption on many
> modern CPUs and therefore I suspect the performance overhead would be
> limited; this is also indicated by what the Oracle documentation says
> regarding processing overhead in the case of tablespace encryption in TDE
>

I agree, I think the overhead of per-block encryption would be pretty
minimal.


> - it would also require a way to have the server manage these column
> encryption keys (possibly through additional client APIs); I haven't looked
> yet at the way Oracle handles encryption/decryption keys for the tablespace
> encryption TDE, but it's on my 'to-do' list
>

Yah, the normal thing to do here is call out to an external keystore that
holds a master encryption key.
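
The usual pattern is envelope encryption: each tablet server or tablet replica gets its own intermediate data key, only a wrapped (encrypted) copy of that key is stored on disk, and the keystore unwraps it on demand. A minimal Python sketch follows; ToyKeystore is a hypothetical in-process stand-in for an HSM or the Hadoop KMS, and its HMAC-based wrapping stands in for a real key-wrap algorithm such as AES Key Wrap (RFC 3394):

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; an illustrative stand-in for AES."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, ks: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return (int.from_bytes(data, "big") ^ int.from_bytes(ks, "big")).to_bytes(len(data), "big")


class ToyKeystore:
    """Hypothetical stand-in for an external keystore (an HSM, the Hadoop KMS, ...).

    Key material derived from the master key never leaves this object;
    tablet servers only ever hold wrapped (encrypted) data keys.
    """

    def __init__(self) -> None:
        master = os.urandom(32)
        self._enc_key = hmac.new(master, b"wrap-enc", hashlib.sha256).digest()
        self._mac_key = hmac.new(master, b"wrap-mac", hashlib.sha256).digest()

    def wrap(self, data_key: bytes) -> bytes:
        nonce = os.urandom(16)
        ct = _xor(data_key, _keystream(self._enc_key, nonce, len(data_key)))
        tag = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        return nonce + ct + tag

    def unwrap(self, wrapped: bytes) -> bytes:
        nonce, ct, tag = wrapped[:16], wrapped[16:-32], wrapped[-32:]
        expected = hmac.new(self._mac_key, nonce + ct, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("wrapped key rejected by keystore")
        return _xor(ct, _keystream(self._enc_key, nonce, len(ct)))


kms = ToyKeystore()
data_key = os.urandom(32)     # e.g. one intermediate key per tablet replica
wrapped = kms.wrap(data_key)  # only this form is persisted next to the data
assert kms.unwrap(wrapped) == data_key  # done by the keystore at startup
```

An attacker who copies a tablet server's files gets only the wrapped key, which is useless without access to the keystore. Rotating the master key also only requires re-wrapping the small data keys, not re-encrypting the data blocks.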

- Dan

------------------------------
> *From: *fventuri@comcast.net
> *To: *user@kudu.apache.org
> *Sent: *Wednesday, April 26, 2017 9:48:07 PM
>
> *Subject: *Re: Data encryption in Kudu
>
> David, Dan, Todd,
> thanks for your prompt replies.
>
> At this stage I am just exploring what it would take to implement some
> sort of data encryption in Kudu.
>
> After reading your comments here are some further thoughts:
>
> - according to the first sentence in this paragraph in the Kudu docs (
> https://kudu.apache.org/docs/schema_design.html#compression):
>
>          Kudu allows per-column compression using the LZ4, Snappy, or zlib compression
> codecs.
>
> it should be possible to perform per-column encryption by adding
> 'encryption codecs' right after the compression codecs. I browsed through
> the code quickly and I think this is done when reading/writing a 'cfile'
> (please correct me if I am wrong). If this is correct, this change could be
> 'minimally invasive' (at least for the 'cfile' part) and would not require
> a major overhaul of the Kudu architecture.
>
> - as per the key management aspect, I am not a security expert at all, so
> I am not sure what would be the best approach here - my thought here is
> that in most places Kudu is deployed together with HDFS, so it would be
> 'desirable' if the key management were consistent between the two services;
> on the other hand, I also realize that the basic premises are fundamentally
> different: HDFS encrypts everything at the client level and therefore the
> HDFS engine itself is almost completely unaware that the data it stores is
> actually encrypted (except for a special file hidden attribute, if I
> understand correctly), while in Kudu the storage engine must have both the
> 'public' key (when encrypting) and the 'private' key (when decrypting)
> otherwise it can't take advantage of knowing the 'structure' of the data
> (for instance the Bloom filters probably wouldn't work with the key being
> encrypted). This means for instance that an attacker who is able to gain
> access to the Kudu tablet servers would probably be able to decrypt the
> data. Also one way to achieve something similar to what HDFS does (i.e.
> client-based encryption and data encrypted in-flight) could perhaps be
> to use a one-time client certificate generated by the KMS server, but this
> would also require changes to the client code.
>
> Franco
>
>
> ------------------------------
> *From: *"Todd Lipcon" <todd@cloudera.com>
> *To: *user@kudu.apache.org
> *Sent: *Tuesday, April 25, 2017 3:49:50 PM
> *Subject: *Re: Data encryption in Kudu
>
> Agreed with what Dan said.
>
> I think there are a number of interesting design alternatives to be
> considered, so before coding it would be great to work through a design
> document to explore the alternatives. For example, we could try to apply
> encryption at the 'fs/' layer, which would cover all non-WAL data, but then
> we would lose the ability to specify encryption on a per-column basis.
> There are other requirements that need to be ironed out about whether we'd
> need to support separate encryption keys per column/table/server/etc,
> whether metadata also needs to be encrypted, etc.
>
> -Todd
>
> On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert <danburkert@apache.org>
> wrote:
>
>> Hi Franco,
>>
>> I think you are right that a client-based approach wouldn't work, because
>> we wouldn't want to encrypt at the level of individual cell values.  That
>> would get in the way of encoding, compression, predicate evaluation, etc.
>> As you note, adding encryption at the block layer is probably the way to
>> go.  Key management is definitely the tricky issue.  We do have one
>> advantage over HDFS - because Kudu does logical replication, the encryption
>> key can be scoped to a particular tablet server or tablet replica and
>> wouldn't need to be shared among all replicas.  I haven't done enough
>> research to know if this makes it fundamentally easier to do key
>> management.  I would assume at a minimum we would want to integrate with
>> key providers such as an HSM.  It would be good to have a thorough review of
>> existing solutions in the space, such as TDE
>> <https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the
>> Hadoop KMS.  Is this something you are interested in working on?
>>
>> - Dan
>>
>> On Tue, Apr 25, 2017 at 8:30 AM, David Alves <davidralves@gmail.com>
>> wrote:
>>
>>> Hi Franco
>>>
>>>   Dan, Alexey, Todd are our security experts.
>>>   Folks, thoughts on this?
>>>
>>> Best
>>> David
>>>
>>> On Mon, Apr 24, 2017 at 7:08 PM, <fventuri@comcast.net> wrote:
>>>
>>>> Over the weekend I started looking at what it would take to add data
>>>> encryption to Kudu (besides using filesystem encryption via dm-crypt or
>>>> something like that).
>>>>
>>>> Here are a few notes - please feel free to comment on them and add
>>>> suggestions:
>>>>
>>>> - reading through this mailing list, it looks like this feature was
>>>> asked about a couple of times last year, but from what I can tell, no one
>>>> is currently working on it.
>>>> - a client-based approach to encryption like the one used by HDFS
>>>> wouldn't work (at least out of the box) because for instance encrypting the
>>>> primary key at the client would prevent being able to have range filters
>>>> for scans; it might work for the columns that are not part of the primary
>>>> key
>>>> - there's already code in Kudu for several compression codecs (LZ4,
>>>> gzip, etc); I thought it would be possible to add similar code for
>>>> encryption codecs (to be applied after the compression, of course)
>>>> - the WAL log files and delta files should be similarly encrypted too
>>>> - not sure what would be the best way to manage the key - I see that in
>>>> HDFS they use a double key mechanism, where the encryption key for the data
>>>> file is itself encrypted with the allowed user key and this whole process
>>>> is managed by an external Key Management Service
>>>>
>>>> Thanks in advance for your ideas and suggestions,
>>>> Franco
>>>>
>>>
>>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>
