spark-user mailing list archives

From Jean Georges Perrin <...@jgp.net>
Subject Re: Quick one on evaluation
Date Wed, 02 Aug 2017 13:09:53 GMT
Hey Jörn,

By "pending" I meant something like a flag, e.g. myDf.hasCatalystWorkToDo() or myDf.isPendingActions().
Or maybe access to the DAG?

I just ran this:
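    import static org.apache.spark.sql.functions.datediff;

    // datediff(end, start): number of days from order_date to ship_date
    ordersDf = ordersDf.withColumn(
        "time_to_ship",
        datediff(ordersDf.col("ship_date"), ordersDf.col("order_date")));

    ordersDf.printSchema();
    ordersDf.show();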

and both the schema and the data are correct, so I was wondering what triggered Catalyst...
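For illustration, a toy model in plain Java (no Spark dependency) of the behaviour discussed in this thread: a transformation such as withColumn only records a recipe, printSchema() reads metadata without evaluating any row, and a row-limited show() evaluates only the rows it displays. Everything here is hypothetical: LazyFrame, buildOrders and rowsComputed are not Spark APIs, and the model assumes that the schema is resolved when the transformation is defined while row values are computed only by an action. It is a sketch of lazy evaluation, not of Catalyst.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Toy model of lazy evaluation. NOT Spark's API or implementation:
// LazyFrame, buildOrders and rowsComputed are hypothetical names.
public class LazyFrameDemo {

    static final class LazyFrame {
        final List<String> schema;                  // resolved eagerly, when the frame is defined
        final List<ToLongFunction<long[]>> derived; // recorded recipes for computed columns
        final List<long[]> source;                  // base rows: {order_date, ship_date}
        int rowsComputed = 0;                       // how many rows an action has evaluated

        LazyFrame(List<String> schema, List<ToLongFunction<long[]>> derived, List<long[]> source) {
            this.schema = schema;
            this.derived = derived;
            this.source = source;
        }

        // Transformation: extends the schema and records the recipe; no row is evaluated.
        LazyFrame withColumn(String name, ToLongFunction<long[]> recipe) {
            List<String> s = new ArrayList<>(schema);
            s.add(name);
            List<ToLongFunction<long[]>> d = new ArrayList<>(derived);
            d.add(recipe);
            return new LazyFrame(s, d, source);
        }

        // Metadata only: prints column names without evaluating any row.
        void printSchema() {
            System.out.println("root");
            for (String col : schema) {
                System.out.println(" |-- " + col);
            }
        }

        // Action: evaluates only the first n rows, like a row-limited show().
        List<long[]> show(int n) {
            List<long[]> shown = new ArrayList<>();
            int limit = Math.min(n, source.size());
            for (int i = 0; i < limit; i++) {
                long[] base = source.get(i);
                long[] row = Arrays.copyOf(base, base.length + derived.size());
                for (int j = 0; j < derived.size(); j++) {
                    // each recipe sees the base columns plus earlier derived ones
                    row[base.length + j] = derived.get(j).applyAsLong(row);
                }
                rowsComputed++;
                System.out.println(Arrays.toString(row));
                shown.add(row);
            }
            return shown;
        }
    }

    // Hypothetical data: dates as epoch-day numbers, shipping 1 to 5 days after ordering.
    static LazyFrame buildOrders(int count) {
        List<long[]> rows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long orderDate = 17_000 + i;
            rows.add(new long[] {orderDate, orderDate + (i % 5) + 1});
        }
        return new LazyFrame(List.of("order_date", "ship_date"), List.of(), rows);
    }

    public static void main(String[] args) {
        LazyFrame orders = buildOrders(30)
            .withColumn("time_to_ship", r -> r[1] - r[0]); // ship_date - order_date

        orders.printSchema();
        System.out.println("rows evaluated after printSchema: " + orders.rowsComputed); // 0

        orders.show(20);
        System.out.println("rows evaluated after show(20): " + orders.rowsComputed);    // 20
    }
}
```

Running main prints the three-column schema with 0 rows evaluated, then evaluates exactly 20 of the 30 rows. The limit of 20 mirrors the default number of rows displayed by Spark's show().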

jg



> On Aug 2, 2017, at 8:29 AM, Jörn Franke <jornfranke@gmail.com> wrote:
> 
> I assume printSchema() would not trigger an evaluation. show() might partially trigger an
> evaluation (not all data is shown, only a certain number of rows by default).
> Keep in mind that even a count might not trigger evaluation of all rows (especially in
> the future) due to optimizer improvements.
> 
> What do you mean by "pending"? You can see the status of the job in the Spark UI.
> 
>> On 2. Aug 2017, at 14:16, Jean Georges Perrin <jgp@jgp.net> wrote:
>> 
>> Hi Sparkians,
>> 
>> I understand the lazy evaluation mechanism with transformations and actions. My question
>> is simpler: 1) are show() and/or printSchema() actions? I would assume so...
>> 
>> And an optional question: 2) is there a way to know whether there are transformations "pending"?
>> 
>> Thanks!
>> 
>> jg
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>> 
> 

