spark-user mailing list archives

From Daniel Darabos <daniel.dara...@lynxanalytics.com>
Subject Re: Quick one on evaluation
Date Fri, 04 Aug 2017 15:19:47 GMT
On Fri, Aug 4, 2017 at 4:36 PM, Jean Georges Perrin <jgp@jgp.net> wrote:

> Thanks Daniel,
>
> I like your answer for #1. It makes sense.
>
> However, I don't get why you say that there are always pending
> transformations... After you call an action, you should be "clean" from
> pending transformations, no?
>

Nope. Say you have:

    val df = spark.read.csv("data.csv"); println(df.count + df.count)

The first "df.count" reads in the file and counts the rows. The action was
executed, but "df" still represents the same pending transformations. The
second "df.count" again reads in the file and counts the rows. Actions do
not modify DataFrames/RDDs. (The only exception is "cache()", which is not
an action itself: it marks the DataFrame so that the next action keeps the
computed data around, and later actions reuse it instead of recomputing.)
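
For illustration, here is a minimal sketch of the same point as a standalone
program. It assumes a local Spark 2.x session and a file named "data.csv" in
the working directory; the object name CountTwice and the app name are made
up for the example.

```scala
import org.apache.spark.sql.SparkSession

object CountTwice {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .appName("count-twice")
      .master("local[*]")
      .getOrCreate()

    // Only describes how to load the data; df holds pending transformations.
    val df = spark.read.csv("data.csv")

    // Each count() is an action that runs the whole plan from the start,
    // so the file is scanned once per call.
    val first = df.count()
    val second = df.count()
    println(first + second)

    // cache() is lazy: it only marks df to be kept once it is computed.
    df.cache()
    df.count() // scans the file again and fills the cache
    df.count() // should now be answered from the cached data

    spark.stop()
  }
}
```

If you watch the Spark UI while this runs, the first three counts should
each show a scan of the file, and the last one should read from the cached
data instead.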
