Hey Tobias,

As you suspect, the reason why it's slow is because the resource manager in YARN takes a while to grant resources. This is because YARN needs to first set up the application master container, and then this AM needs to request more containers for Spark executors. I think this accounts for most of the overhead. The remaining source probably comes from how our own YARN integration code polls application (every second) and cluster resource states (every 5 seconds IIRC). I haven't explored in detail whether there are optimizations there that can speed this up, but I believe most of the overhead comes from YARN itself.

In other words, no I don't know of any quick fix on your end that you can do to speed this up.


2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <tgp@preferred.jp>:

I am using spark-submit to submit my application to YARN in "yarn-cluster" mode. I have both the Spark assembly jar file as well as my application jar file put in HDFS and can see from the logging output that both files are used from there. However, it still takes about 10 seconds for my application's yarnAppState to switch from ACCEPTED to RUNNING.

I am aware that this is probably not a Spark issue, but some YARN configuration setting (or YARN-inherent slowness), I was just wondering if anyone has an advice for how to speed this up.