I see. As far as I can tell, there should not be a significant algorithmic difference between those two cases, although there is a good bit of "local-mode-only" logic in Spark.
One typical problem we see on large-heap, many-core JVMs, though, is much more time spent in garbage collection. I'm not sure how oprofile gathers its statistics, but it's possible that stop-the-world pauses simply show up as time spent inside regular methods. You can check whether this is happening by adding "-XX:+PrintGCDetails" to spark.executor.extraJavaOptions (in spark-defaults.conf) and to --driver-java-options (as a command-line argument), and then examining the stdout logs.
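
For concreteness, here is a rough sketch of what those two settings might look like. The class name and jar name are placeholders I made up, and I'm assuming you launch with spark-submit; adjust for however you actually start the job.

```
# conf/spark-defaults.conf (applies to the executors)
spark.executor.extraJavaOptions  -XX:+PrintGCDetails

# command line (applies to the driver); com.example.MyApp and my-app.jar are hypothetical
spark-submit --driver-java-options "-XX:+PrintGCDetails" --class com.example.MyApp my-app.jar
```

If the GC entries in the stdout logs show long or frequent pauses, that would point to garbage collection rather than the methods oprofile is blaming.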