spark-user mailing list archives

From Kushagra Deep <>
Subject Re: Spark as computing engine vs spark cluster
Date Mon, 12 Oct 2020 17:57:35 GMT
Hi Santosh,

Spark is a distributed computation engine. Why distributed? Because when work is distributed, memory and cores can be added so that data is processed in parallel at scale. Since it is difficult to scale a single machine vertically, Spark scales horizontally across many machines.
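The partition-and-combine idea behind horizontal scaling can be sketched in plain Python (a toy analogue only; threads stand in for Spark executors here, and `parallel_sum_of_squares` is a hypothetical illustration, not a Spark API):

```python
# Toy analogue of Spark's model: split the data into partitions,
# process each partition concurrently, then combine the partial
# results. Spark does the same, but across executors on many nodes.
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition):
    # Each worker handles only its own slice of the data.
    return sum(x * x for x in partition)

def parallel_sum_of_squares(data, num_partitions=4):
    # Split the dataset into roughly equal partitions.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = pool.map(process_partition, partitions)
    # Reduce the partial results into the final answer.
    return sum(partials)

print(parallel_sum_of_squares(list(range(1, 11))))  # 385
```

In real Spark the partitions live on different machines, so adding nodes adds both memory and cores to the job.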

Thanks And Regards
Kushagra Deep

From: Mich Talebzadeh <>
Date: Monday, 12 October 2020 at 11:23 PM
To: Santosh74 <>
Cc: "user @spark" <>
Subject: Re: Spark as computing engine vs spark cluster

Hi Santosh,

Generally speaking, there are two ways of making a process faster:

1. Do more intelligent work, e.g. by creating indexes, cubes, etc., thus reducing the amount of processing
2. Throw hardware and memory at it, e.g. a multi-node Spark cluster on a fully managed cloud service such as Google Dataproc
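To make option 1 concrete, here is a minimal sketch (names like `build_index` are illustrative, not from any library) of how an index turns a linear scan over all rows into a direct lookup:

```python
# An index trades a one-off build cost for cheap repeated queries:
# instead of scanning every row per query, we look up a dict key.
def build_index(rows, key):
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index

rows = [
    {"id": 1, "city": "London"},
    {"id": 2, "city": "Paris"},
    {"id": 3, "city": "London"},
]

# Linear scan: touches every row on every query.
scan = [r for r in rows if r["city"] == "London"]

# Indexed lookup: a single dict access after the index is built.
index = build_index(rows, "city")
hit = index.get("London", [])

assert scan == hit
```

Option 2 is the complementary approach: keep the work the same but spread it over more hardware, which is where a Spark cluster comes in.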
So the framework is the computational engine (Spark), and its physical realisation is a Spark cluster (multiple nodes/VM hosts) that work in tandem to provide parallel processing. I suggest that you look at the Spark docs <>


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction
of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

On Sat, 10 Oct 2020 at 15:24, Santosh74 <<>> wrote:

Is Spark a compute engine only, or is it also a cluster that comes with a set of hardware/nodes? What exactly is a Spark cluster?

