spark-issues mailing list archives

From "Genmao Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19556) Broadcast data is not encrypted when I/O encryption is on
Date Wed, 15 Feb 2017 03:33:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867152#comment-15867152 ]

Genmao Yu commented on SPARK-19556:
-----------------------------------

[~vanzin] I am working on this. Could you please assign it to me?


> Broadcast data is not encrypted when I/O encryption is on
> ---------------------------------------------------------
>
>                 Key: SPARK-19556
>                 URL: https://issues.apache.org/jira/browse/SPARK-19556
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Marcelo Vanzin
>
> {{TorrentBroadcast}} uses a couple of "back doors" into the block manager to write and read data:
> {code}
>       if (!blockManager.putBytes(pieceId, bytes, MEMORY_AND_DISK_SER, tellMaster = true)) {
>         throw new SparkException(s"Failed to store $pieceId of $broadcastId in local BlockManager")
>       }
> {code}
> {code}
>       bm.getLocalBytes(pieceId) match {
>         case Some(block) =>
>           blocks(pid) = block
>           releaseLock(pieceId)
>         case None =>
>           bm.getRemoteBytes(pieceId) match {
>             case Some(b) =>
>               if (checksumEnabled) {
>                 val sum = calcChecksum(b.chunks(0))
>                 if (sum != checksums(pid)) {
>                   throw new SparkException(s"corrupt remote block $pieceId of $broadcastId:" +
>                     s" $sum != ${checksums(pid)}")
>                 }
>               }
>               // We found the block from remote executors/driver's BlockManager, so put the block
>               // in this executor's BlockManager.
>               if (!bm.putBytes(pieceId, b, StorageLevel.MEMORY_AND_DISK_SER, tellMaster = true)) {
>                 throw new SparkException(
>                   s"Failed to store $pieceId of $broadcastId in local BlockManager")
>               }
>               blocks(pid) = b
>             case None =>
>               throw new SparkException(s"Failed to get $pieceId of $broadcastId")
>           }
>       }
> {code}
> The thing these block manager methods have in common is that they bypass the encryption code; so broadcast data is stored unencrypted in the block manager, causing unencrypted data to be written to disk if those blocks need to be evicted from memory.
> The correct fix here is actually not to change {{TorrentBroadcast}}, but to fix the block manager so that:
> - data stored in memory is not encrypted
> - data written to disk is encrypted
> This would simplify the code paths that use BlockManager / SerializerManager APIs (e.g. see SPARK-19520), but requires some tricky changes inside the BlockManager to still be able to use file channels to avoid reading whole blocks back into memory so they can be decrypted.
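
The split proposed in the issue (plaintext in memory, ciphertext on disk) can be illustrated with a small standalone sketch. It is not Spark code: `ToyBlockStore`, `evictToDisk`, `openStream`, and `readAll` are hypothetical names, the AES/CTR cipher with a per-file IV prefix is an assumption chosen for brevity, and the sketch uses plain JDK streams instead of the file channels the issue mentions. It only shows that encryption can live at the disk boundary and that reads can decrypt as a stream rather than loading the whole block first.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, InputStream}
import java.nio.file.{Files, Path}
import java.security.SecureRandom
import javax.crypto.{Cipher, CipherInputStream, CipherOutputStream, KeyGenerator, SecretKey}
import javax.crypto.spec.IvParameterSpec
import scala.collection.mutable

// Hypothetical toy store (NOT Spark's BlockManager): blocks are kept as
// plaintext in memory and are encrypted only when they are written to disk.
class ToyBlockStore(key: SecretKey, dir: Path) {
  private val IvLength = 16
  private val memory = mutable.Map.empty[String, Array[Byte]]
  private val random = new SecureRandom()

  private def fileFor(id: String): Path = dir.resolve(id)

  // In-memory path: no encryption involved.
  def putBytes(id: String, bytes: Array[Byte]): Unit = {
    memory(id) = bytes.clone()
  }

  // Eviction path: write a random IV followed by the AES/CTR ciphertext.
  def evictToDisk(id: String): Unit = {
    memory.remove(id).foreach { bytes =>
      val iv = new Array[Byte](IvLength)
      random.nextBytes(iv)
      val cipher = Cipher.getInstance("AES/CTR/NoPadding")
      cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv))
      val raw = Files.newOutputStream(fileFor(id))
      try {
        raw.write(iv)
        val encrypted = new CipherOutputStream(raw, cipher)
        encrypted.write(bytes)
        encrypted.close()
      } finally raw.close()
    }
  }

  // Read path: memory hits are returned directly; disk hits are decrypted
  // lazily as the caller consumes the stream.
  def openStream(id: String): InputStream = memory.get(id) match {
    case Some(bytes) => new ByteArrayInputStream(bytes)
    case None =>
      val in = new DataInputStream(Files.newInputStream(fileFor(id)))
      val iv = new Array[Byte](IvLength)
      in.readFully(iv)
      val cipher = Cipher.getInstance("AES/CTR/NoPadding")
      cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv))
      new CipherInputStream(in, cipher)
  }
}

object ToyBlockStore {
  // Helper for small examples only: drains a stream into a byte array.
  def readAll(in: InputStream): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    } finally in.close()
    out.toByteArray
  }
}
```

With this shape, callers such as a broadcast implementation would not need to know whether encryption is enabled: they store and read plain bytes, and the store decides what reaches the disk.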


