Hey Jörn,

By "pending" I meant something like a flag, e.g. a hypothetical myDf.hasCatalystWorkToDo() or myDf.isPendingActions(). Or maybe some access to the DAG?

I just ran this:
    // assumes: import static org.apache.spark.sql.functions.datediff;
    ordersDf = ordersDf.withColumn(
        "time_to_ship",
        datediff(ordersDf.col("ship_date"), ordersDf.col("order_date")));

    ordersDf.printSchema();
    ordersDf.show();

and the schema and data are correct, so I was wondering what triggered Catalyst...
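
To make the question concrete, here is a toy sketch in plain Java (no Spark involved) of the behavior I have in mind. Everything in it is made up for illustration: LazyFrame, withTransform(), pendingTransformations(), describe() and take() are hypothetical names, not Spark's API. It only assumes that transformations are recorded, that a metadata call touches no rows, and that a show-like action evaluates just the rows it needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy stand-in for a DataFrame. This is NOT Spark's API, only an illustration.
final class LazyFrame {
    private final List<Integer> rows;
    private final List<UnaryOperator<Integer>> pending = new ArrayList<>();
    int evaluations = 0; // how many times a recorded step was actually applied

    LazyFrame(List<Integer> rows) {
        this.rows = rows;
    }

    // Transformation: records the step and returns a new frame; nothing is computed.
    LazyFrame withTransform(UnaryOperator<Integer> step) {
        LazyFrame next = new LazyFrame(rows);
        next.pending.addAll(pending);
        next.pending.add(step);
        return next;
    }

    // Hypothetical helper in the spirit of myDf.isPendingActions().
    int pendingTransformations() {
        return pending.size();
    }

    // Metadata-only call, loosely comparable to printSchema(): no row is touched.
    String describe() {
        return "LazyFrame[" + pending.size() + " pending step(s)]";
    }

    // Action, loosely comparable to show(n): evaluates only the first n rows.
    List<Integer> take(int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, rows.size()); i++) {
            Integer value = rows.get(i);
            for (UnaryOperator<Integer> step : pending) {
                value = step.apply(value);
                evaluations++;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        LazyFrame df = new LazyFrame(List.of(1, 2, 3, 4)).withTransform(x -> x * 10);
        System.out.println(df.describe());   // metadata only
        System.out.println(df.evaluations);  // still 0: nothing ran yet
        System.out.println(df.take(2));      // the action triggers the work
        System.out.println(df.evaluations);  // 2: only the two requested rows ran
    }
}
```

In this toy model the "pending" information is just the size of the recorded step list, which is the kind of thing I was hoping to read from a real DataFrame.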

jg



On Aug 2, 2017, at 8:29 AM, Jörn Franke <jornfranke@gmail.com> wrote:

I assume printSchema() would not trigger an evaluation. show() might trigger a partial evaluation (not all data is shown, only a certain number of rows by default).
Keep in mind that even a count might not trigger evaluation of all rows (especially in the future) due to updates to the optimizer.

What do you mean by "pending"? You can see the status of the job in the UI.

On 2. Aug 2017, at 14:16, Jean Georges Perrin <jgp@jgp.net> wrote:

Hi Sparkians,

I understand the lazy evaluation mechanism with transformations and actions. My question is simpler: 1) are show() and/or printSchema() actions? I would assume so...

and optional question: 2) is there a way to know if there are transformations "pending"?

Thanks!

jg


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

