spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: [Spark Core] Is it possible to insert a function directly into the Logical Plan?
Date Mon, 14 Aug 2017 19:11:02 GMT
What about accumulators?
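Something along these lines (an untested sketch; the accumulator name and processRecord are placeholders for whatever per-record work you already do) would let the job publish progress without touching the plan:

  // spark is your existing SparkSession
  val processed = spark.sparkContext.longAccumulator("recordsProcessed")

  val result = input.map { row =>
    processed.add(1)   // named accumulators also show up in the Spark UI
    processRecord(row)
  }

  // e.g. from a monitoring thread on the driver while the job runs:
  println(s"processed so far: ${processed.value}")

One caveat: updates made inside transformations can be applied more than once when tasks are retried, so treat the value as an approximate progress signal.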

> On 14. Aug 2017, at 20:15, Lukas Bradley <lukasbradley@gmail.com> wrote:
> 
> We have had issues with gathering status on long-running jobs.  We have attempted to
> draw parallels between the Spark UI/Monitoring API and our code base.  Because of the
> separation between our code and the execution plan, even guessing where we are in the
> process is difficult.  The Job/Stage/Task information is too abstracted from our code
> to be easily digested by non-Spark engineers on our team.
> 
> Is there a "hook" to which I can attach a piece of code that is triggered when a point
> in the plan is reached?  This could be when a SQL command completes, or when a new
> Dataset is created, anything really...
> 
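For the SQL-level events you describe there is an existing hook, QueryExecutionListener, which fires on the driver after each Dataset action rather than at arbitrary points in the plan. A rough sketch (the println output is just an example):

  import org.apache.spark.sql.execution.QueryExecution
  import org.apache.spark.sql.util.QueryExecutionListener

  spark.listenerManager.register(new QueryExecutionListener {
    override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
      println(s"action '$funcName' completed in ${durationNs / 1000000} ms")

    override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
      println(s"action '$funcName' failed: ${exception.getMessage}")
  })

For job/stage/task granularity there is also SparkListener, registered via spark.sparkContext.addSparkListener, though that exposes the same abstractions the Spark UI does.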
> It seems Dataset.checkpoint() offers an excellent snapshot position during execution,
> but I'm concerned that using it short-circuits the optimal execution of the full plan.
> I really want these trigger functions to be completely independent of the actual
> processing itself.  I'm not looking to extract information from a Dataset, RDD, or
> anything else.  I essentially want to write independent output for status.
> 
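For what it's worth, checkpoint() is eager by default: it executes the plan built so far and materializes the result to the checkpoint directory, cutting the lineage, so the concern about splitting the optimized plan is justified. A minimal illustration (the path and column name are made up):

  spark.sparkContext.setCheckpointDir("/tmp/checkpoints")

  val snapshot = ds.checkpoint()                  // runs the plan so far, writes it out
  val counts = snapshot.groupBy("key").count()    // planned against the materialized data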
> If this doesn't exist, would the dev team be interested in me investigating this
> feature?
> 
> Thank you for any and all help.
