This is a duplicate of my Stack Overflow question here:

https://stackoverflow.com/questions/57881044/verifying-in-transit-encryption-for-spark-shuffle

I'm running Spark over YARN on AWS EMR 5.20.

I followed this guide to set up in-transit encryption for Spark shuffle:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/configuring-spark/content/configuring_spark_for_wire_encryption.html

First off, this doc only refers to self-signed certs, and we're using a CA-signed cert. No big deal, I put the CA cert in the truststore.

Unfortunately, I can't use the built-in Amazon in-transit encryption, nor can I use Spark Defaults, because we ship our own Spark assembly with each job so that multiple Spark versions can be used.

The configuration I'm tacking on to our Spark jobs looks like this:

spark.shuffle.encryption.enabled=true
spark.ssl.enabled=true
spark.ssl.keyPassword=*****
spark.ssl.keyStore="/opt/my-cluster/keystore.jks"
spark.ssl.keyStorePassword=*****
spark.ssl.protocol=TLS
spark.ssl.trustStore="/opt/my-cluster/truststore.jks"
spark.ssl.trustStorePassword=*****
spark.authenticate=true
spark.network.crypto.enabled=true
spark.enableSaslEncryption=true
spark.ui.https.enabled=true
spark.io.encryption.enabled=true
spark.network.sasl.serverAlwaysEncrypt=true
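
To double-check that these settings actually reach the application, a minimal sketch like the one below could dump the security-related entries of the effective configuration. `EffectiveSecurityConf` and `securitySettings` are hypothetical names, and the sample values are made up so the sketch runs without a cluster; in a real job the input would presumably come from `spark.sparkContext.getConf.getAll`. This only shows what the driver sees, not whether traffic is actually encrypted.

```scala
object EffectiveSecurityConf {
  // Prefixes of the settings appended to the job (taken from the list above)
  private val prefixes = Seq(
    "spark.ssl.", "spark.authenticate", "spark.network.crypto.",
    "spark.network.sasl.", "spark.io.encryption.", "spark.shuffle.encryption.",
    "spark.enableSaslEncryption", "spark.ui.https.")

  // True for keys whose values should not be printed in clear text
  private def isSensitive(key: String): Boolean = {
    val k = key.toLowerCase
    k.contains("password") || k.contains("secret")
  }

  // Keep only the security-related entries, mask sensitive values, sort by key
  def securitySettings(all: Seq[(String, String)]): Seq[(String, String)] =
    all.filter { case (k, _) => prefixes.exists(p => k.startsWith(p)) }
      .map { case (k, v) => if (isSensitive(k)) (k, "*****") else (k, v) }
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Assumption: in the real job this would be spark.sparkContext.getConf.getAll.
    // The sample below is invented for illustration.
    val sample = Seq(
      "spark.app.name" -> "Simple Application",
      "spark.ssl.enabled" -> "true",
      "spark.ssl.trustStorePassword" -> "secret",
      "spark.authenticate" -> "true")
    securitySettings(sample).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

If a key from the list is missing in the output, the setting never made it into the job's configuration.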

Jobs are running fine. To test, I'm running a simple job that I assume will force a shuffle (through reduceByKey). Here's the code:

import org.apache.spark.sql.SparkSession

import scala.util.Random

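object SparkShuffleTest {

  def main(args: Array[String]): Unit = {
    // 100,000 random printable characters as input data
    val randomText = for (i <- Range(0, 100000)) yield Random.nextPrintableChar()
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.sparkContext.parallelize(randomText)
    val pairs = logData.map(c => (c, 1))
    // Runs on the executors, so this output lands in the executor logs
    pairs.foreach(println(_))
    // reduceByKey combines values per key across partitions, which needs a shuffle
    val outputs = pairs.reduceByKey(_ + _).collect()
    outputs.foreach({ case (a, b) => println(s"$a:$b") })
    println("Outputs collected...")
    // mkString prints the contents rather than the array reference
    println(outputs.mkString(", "))
    spark.stop()
  }
}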

So, here's the tough part:

If I change the location of the keystore to a bogus name, my jobs fail, as they should, because they can't find a valid keystore. However, if I do the same to the truststore, nothing fails. It's as if the truststore is never read.

How can I actually get this to encrypt, or what am I configuring wrong? I'd expect a bogus truststore to make shuffle encryption fail. Does that just not throw an error at all?
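
To rule out the truststore file itself, independent of Spark, a small check along these lines could be run on a cluster node. It is only a sketch: `TrustStoreCheck` and `entryCount` are hypothetical names, and it assumes the store is in JKS format. It uses the plain JDK `java.security.KeyStore` API and says nothing about whether Spark ever opens the file.

```scala
import java.io.FileInputStream
import java.security.KeyStore
import scala.util.Try

object TrustStoreCheck {
  // Try to open a JKS file with the given password and return the number of
  // entries in it; a missing file or a wrong password shows up as a Failure.
  def entryCount(path: String, password: String): Try[Int] = Try {
    val in = new FileInputStream(path)
    try {
      val ks = KeyStore.getInstance("JKS")
      ks.load(in, password.toCharArray)
      ks.size()
    } finally in.close()
  }

  def main(args: Array[String]): Unit = args match {
    // Hypothetical usage: TrustStoreCheck <path> <password>
    case Array(path, password) => println(entryCount(path, password))
    case _ => println("usage: TrustStoreCheck <path> <password>")
  }
}
```

A `Failure` for the real truststore path would point at the file or password; a `Success` with the bogus path still being accepted by the job would suggest the truststore is not being consulted at all.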

Thanks!