I'm running a Spark 1.6.0 on YARN on a Hadoop 2.6.0 cluster. 
I observe a very strange issue. 
I run a simple job that reads about 1TB of json logs from a remote HDFS cluster and converts them to parquet, then saves them to the local HDFS of the Hadoop cluster.

I run it with 25 executors with sufficient resources. However the strange thing is that the job only uses 2 executors to do most of the read work.

For example when I go to the Executors' tab in the Spark UI and look at the "Input" column, the difference between the nodes is huge, sometimes 20G vs 120G.

Any ideas how to achieve a more balanced performance?