We are trying to measure the sorting performance of Spark. We have a 16 node cluster with 48 cores and 256GB of ram in each machine and 10Gbps network.
Let's say we are running with 128 parallel tasks and each partition generates about 1GB of data (total 128GB).
We are using the method repartitionAndSortWithinPartitions
A standalone cluster is used with the following configuration.
--executor-memory 16G --executor-cores 1 --num-executors 128
I believe this sets 128 executors to run the job each having 16GB of memory and spread across 16 nodes with 8 threads in each node. This configuration runs very slow. The program doesn't use disks to read or write data (data generated in-memory and we don't write to file after sorting).
It seems even though the data size is small, it uses disk for the shuffle. We are not sure our configurations are optimal to achieve the best performance.