spark-user mailing list archives

From Mich Talebzadeh <>
Subject Re: Understanding what happens when a job is submitted to a cluster
Date Thu, 13 May 2021 14:54:10 GMT

At the risk of stating the obvious, with Spark you are dealing with
what is known as a parallel architecture. A parallel architecture comes into
play when the data is too large to be handled on a single machine, and that
is where the use of Spark becomes meaningful. When the (generated) data size
is going to be very large (which is the norm rather than the exception these
days), the data cannot be processed and stored in Python tools like Pandas
dataframes, because these dataframes hold their data in the RAM of one
machine. For the same reason, the whole dataset cannot simply be collected
from storage such as HDFS or cloud storage onto one node: that would take
significant time and space, and the data probably would not fit in a single
machine's RAM. So the key with Spark is *distributed parallel processing*.
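
To make the idea concrete, here is a minimal sketch in plain Python (not Spark code) of the pattern described above: the data is split into partitions, each partition is reduced independently to a small summary, and only those summaries are combined at the end. The function names and the use of a thread pool as a stand-in for cluster workers are my own assumptions for illustration; Spark does the equivalent across executors on different machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple


def split_into_partitions(values: List[float], num_partitions: int) -> List[List[float]]:
    """Split the data into roughly equal chunks, one per (hypothetical) worker."""
    return [values[i::num_partitions] for i in range(num_partitions)]


def partial_sum_count(partition: Iterable[float]) -> Tuple[float, int]:
    """Work done independently on one partition: reduce it to a (sum, count) pair."""
    total, count = 0.0, 0
    for value in partition:
        total += value
        count += 1
    return total, count


def distributed_mean(values: List[float], num_partitions: int = 4) -> float:
    """Compute a mean by combining per-partition summaries, never the raw rows."""
    if not values:
        raise ValueError("cannot compute the mean of an empty dataset")
    partitions = split_into_partitions(values, num_partitions)
    # The thread pool stands in for executors; each task sees only its own partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    # Only the small (sum, count) pairs reach the "driver" side.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

The point of the sketch is that the combining step only ever handles one small pair per partition, which is why the full dataset never has to fit in one machine's memory.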

Therefore, I suggest you start from here, assuming that you are somewhat
familiar with the fundamentals of Spark.

Running Spark on YARN - Spark 3.1.1 Documentation
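
On the specific question of how much memory Spark asks YARN for per executor, the following is a rough sketch only, based on the documented defaults: the container request is the executor memory plus an overhead, where the overhead defaults to 10% of the executor memory with a minimum of 384 MiB. The function names are hypothetical, the 1024 MiB rounding increment is an assumption about the YARN scheduler configuration, and the sketch ignores other additions such as off-heap or PySpark memory.

```python
import math
from typing import Optional

MIN_OVERHEAD_MIB = 384   # documented minimum for spark.executor.memoryOverhead
OVERHEAD_FACTOR = 0.10   # documented default: 10% of spark.executor.memory


def executor_container_mib(executor_memory_mib: int,
                           memory_overhead_mib: Optional[int] = None) -> int:
    """Approximate memory Spark requests from YARN for one executor container."""
    if memory_overhead_mib is None:
        # Default overhead when spark.executor.memoryOverhead is not set explicitly.
        memory_overhead_mib = max(MIN_OVERHEAD_MIB,
                                  int(executor_memory_mib * OVERHEAD_FACTOR))
    return executor_memory_mib + memory_overhead_mib


def round_up_to_increment(request_mib: int, increment_mib: int = 1024) -> int:
    """Round a request up to a multiple of the scheduler's allocation increment.

    The 1024 MiB default here is an assumption; check your cluster's YARN settings.
    """
    return math.ceil(request_mib / increment_mib) * increment_mib


if __name__ == "__main__":
    requested = executor_container_mib(4096)      # e.g. --executor-memory 4g
    granted = round_up_to_increment(requested)
    print(f"requested {requested} MiB, container size after rounding {granted} MiB")
```

So with a 4g executor, the request is somewhat larger than 4096 MiB, and the container YARN actually grants may be larger again after rounding. Treat the numbers as an estimate and confirm against the YARN ResourceManager UI for your own cluster.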

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Thu, 13 May 2021 at 15:21, <> wrote:

> Hello,
> What happens when a job is submitted to a cluster? I know the 10,000
> foot overview of the Spark architecture, but I need the minute details as to
> how Spark estimates the resources to ask YARN for, what YARN's response is,
> etc. I need a *step by step* understanding of the complete process. I
> searched through the net but I couldn't find any good material on this. Can
> anyone help me here?
> Thanks,
> Abhilash