spark-user mailing list archives

From Giuseppe Sarno <>
Subject Scaling spark jobs returning large amount of data
Date Thu, 04 Jun 2015 14:30:09 GMT
I am relatively new to Spark and am currently trying to understand how to scale a large number
of jobs with it.
I understand that the Spark architecture is split into "Driver", "Master" and "Workers". The Master
has a standby node in case of failure, and Workers can scale out.
All the examples I have seen show Spark distributing the load to the Workers and
returning a small amount of data to the Driver. In my case I would like to explore the scenario
where I need to generate a large report on data stored in Cassandra, and to understand how the Spark
architecture will handle this case when multiple report jobs are running in parallel.
According to this presentation,
responses from Workers go through the Master and finally to the Driver. Does this mean that
the Driver and/or Master is a single point through which all the responses coming back from the Workers must pass?
Is it possible to start multiple concurrent Drivers?
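
To make the concern concrete, here is a toy sketch in plain Python. It is not Spark code, and every name in it (build_partition, collect_to_driver, write_from_workers, the part-NNNNN.txt file names) is made up for illustration. It contrasts two patterns: workers sending all of their rows back to one caller, which loosely resembles calling collect() on an RDD, and workers writing their own output and returning only a small count, which loosely resembles saveAsTextFile(). The thread pool only stands in for the Workers; it says nothing about how Spark actually routes results.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a report job; none of these names are Spark APIs.
PARTITIONS = 4
ROWS_PER_PARTITION = 1000


def build_partition(index):
    """Pretend worker task: produce the report rows for one partition."""
    start = index * ROWS_PER_PARTITION
    return [f"row-{n}" for n in range(start, start + ROWS_PER_PARTITION)]


def collect_to_driver(pool):
    """Pattern A: every worker ships all of its rows back to the caller."""
    rows = []
    for part in pool.map(build_partition, range(PARTITIONS)):
        rows.extend(part)  # the "driver" ends up holding the whole report
    return rows


def write_from_workers(pool, out_dir):
    """Pattern B: each worker writes its own file and returns only a count."""
    def task(index):
        rows = build_partition(index)
        path = os.path.join(out_dir, f"part-{index:05d}.txt")
        with open(path, "w") as handle:
            handle.write("\n".join(rows))
        return len(rows)  # a small value is all the "driver" receives
    return list(pool.map(task, range(PARTITIONS)))


with ThreadPoolExecutor(max_workers=PARTITIONS) as pool:
    collected = collect_to_driver(pool)
    with tempfile.TemporaryDirectory() as out_dir:
        counts = write_from_workers(pool, out_dir)
        written_files = sorted(os.listdir(out_dir))

print(len(collected), counts, written_files)
```

In pattern A the memory held by the single caller grows with the size of the report, which is the bottleneck I am asking about. In pattern B the caller only ever sees one small number per partition.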


