spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Where is the DAG stored before catalyst gets it?
Date Sat, 06 Oct 2018 17:23:32 GMT
Hi Jean Georges,

> I am assuming it is still in the master and when catalyst is finished it
sends the tasks to the workers.

Sorry to be so direct, but the sentence does not make much sense to me.
Again, very sorry for saying so in the very first sentence. Since I know
Jean Georges, I allowed myself to be more open.

In other words, the "the master" part seems to suggest that you use a Spark
Standalone cluster. Correct? Other cluster managers use different names for
the master/manager node.

"when catalyst is finished": that one is really tough to understand. Do you
mean once all the optimizations are applied and the query is ready for
execution? The final output of the "query execution pipeline" is an RDD
with the right code for execution. At this phase, the query is more an RDD
than a Dataset.

"it sends the tasks to the workers": since we're talking about an RDD, this
abstraction is planned as a set of tasks (one per partition of the RDD).
And yes, the tasks are sent out over the wire to executors. It's been like
this since Spark 1.0 (and even earlier).
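
As a rough illustration of "one task per partition", here is a toy model in
plain Python. It is not Spark's API: Task, split_into_partitions, plan_tasks
and run_on_executor are hypothetical names made up for this sketch, and the
contiguous chunking is an assumption, not how Spark partitions data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    """Toy stand-in for a task: one unit of work over exactly one partition."""
    partition_id: int
    rows: Tuple[int, ...]


def split_into_partitions(rows: Sequence[int], num_partitions: int) -> List[Tuple[int, ...]]:
    """Split rows into contiguous chunks (a made-up partitioner for this sketch)."""
    size, extra = divmod(len(rows), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(tuple(rows[start:end]))
        start = end
    return partitions


def plan_tasks(partitions: List[Tuple[int, ...]]) -> List[Task]:
    """'Driver' side: create exactly one task per partition."""
    return [Task(i, p) for i, p in enumerate(partitions)]


def run_on_executor(task: Task, func: Callable[[int], int]) -> List[int]:
    """'Executor' side: apply func to the rows of this task's partition only."""
    return [func(row) for row in task.rows]


rows = list(range(10))
partitions = split_into_partitions(rows, 4)
tasks = plan_tasks(partitions)
results = [run_on_executor(task, lambda x: x * 2) for task in tasks]
print(len(tasks), results)
```

The only point of the sketch is the shape of the work: the number of tasks
equals the number of partitions, and each task sees only its own slice of
the data.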

Hope I helped a bit.

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


On Fri, Oct 5, 2018 at 12:36 AM Jean Georges Perrin <jgp@jgp.net> wrote:

> Hi,
>
> I am assuming it is still in the master and when catalyst is finished it
> sends the tasks to the workers.
>
> Correct?
>
> tia
>
> jg
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
