spark-user mailing list archives

From Swapnil Shinde <swapnilushi...@gmail.com>
Subject Spark driver locality
Date Thu, 27 Aug 2015 16:30:12 GMT
Hello,
I am new to the Spark world and recently started exploring it in standalone
mode. It would be great if I could get clarification on the doubts below:

1. Driver locality - The documentation mentions that "client" deploy mode
is not a good fit if the machine running "spark-submit" is not co-located
with the worker machines. Cluster mode is not available with standalone
clusters. Therefore, do we have to submit all applications from the master
machine? (Assuming we don't have a separate co-located gateway machine.)

2. How does the above driver locality work with the Spark shell running on
a local machine?

3. I am a little confused about the role of the driver program. Does the
driver do any computation during the Spark application life cycle? For
instance, in a simple row count app, each worker node calculates its local
row counts. Does the driver sum up the local row counts? In short, where
does the reduce phase run in this case?

4. When accessing HDFS data over the network, do the worker nodes read the
data in parallel? How is HDFS data accessed over the network in a Spark
application?

Sorry if these questions have already been discussed.

Thanks
Swapnil
