What application are you running? Here's a few things:

- You will hit bottleneck on CPU if you are doing some complex computation (like parsing a json etc.)
- You will hit bottleneck on Memory if your data/objects used in the program is large (like defining playing with HashMaps etc inside your map* operations), Here you can set spark.executor.memory to a higher number and also you can change the spark.storage.memoryFraction whose default value is 0.6 of your executor memory.
- Network will be a bottleneck if data is not available locally on one of the worker and hence it has to collect it from others, which is a lot of Serialization and data transfer across your cluster.

Best Regards

On Tue, Feb 17, 2015 at 11:20 AM, Julaiti Alafate <jalafate@eng.ucsd.edu> wrote:
Hi there,

I am trying to scale up the data size that my application is handling. This application is running on a cluster with 16 slave nodes. Each slave node has 60GB memory. It is running in standalone mode. The data is coming from HDFS that also in same local network.

In order to have an understanding on how my program is running, I also had a Ganglia installed on the cluster. From previous run, I know the stage that taking longest time to run is counting word pairs (my RDD consists of sentences from a corpus). My goal is to identify the bottleneck of my application, then modify my program or hardware configurations according to that.

Unfortunately, I didn't find too much information on Spark monitoring and optimization topics. Reynold Xin gave a great talk on Spark Summit 2014 for application tuning from tasks perspective. Basically, his focus is on tasks that oddly slower than the average. However, it didn't solve my problem because there is no such tasks that run way slow than others in my case.

So I tried to identify the bottleneck from hardware prospective. I want to know what the limitation of the cluster is. I think if the executers are running hard, either CPU, memory or network bandwidth (or maybe the combinations) is hitting the roof. But Ganglia reports the CPU utilization of cluster is no more than 50%, network utilization is high for several seconds at the beginning, then drop close to 0. From Spark UI, I can see the nodes with maximum memory usage is consuming around 6GB, while "spark.executor.memory" is set to be 20GB. 

I am very confused that the program is not running fast enough, while hardware resources are not in shortage. Could you please give me some hints about what decides the performance of a Spark application from hardware perspective?