spark-user mailing list archives

From Marshall Markham <mmark...@precisionlender.com>
Subject RE: spark-submit exit status on k8s
Date Mon, 06 Apr 2020 14:51:45 GMT
This is a great idea, Masood. We actually manage our Spark jobs with a Kubernetes pod
operator, so we may put something in at that layer to determine success/failure and keep
the check in the same node of the DAG.
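One way to do the success/failure check at the pod layer, offered as a sketch under stated assumptions rather than a tested recipe: instead of trusting spark-submit's exit status, read the final phase of the driver pod. Kubernetes reports a finished pod's phase as `Succeeded` or `Failed`. The sketch assumes the driver pod still exists when the check runs, that its name is known (for example, fixed at submit time with `spark.kubernetes.driver.pod.name`), and that `kubectl` is configured wherever the check runs. The function names are hypothetical.

```python
import subprocess
import sys

# Terminal Kubernetes pod phases mapped to process exit codes.
TERMINAL_PHASES = {"Succeeded": 0, "Failed": 1}


def exit_code_for_phase(phase):
    """Map a driver pod phase to an exit code, or None if the pod has not finished."""
    return TERMINAL_PHASES.get(phase)


def driver_pod_phase(pod_name, namespace="default", run=subprocess.run):
    """Read .status.phase of the driver pod through kubectl (assumed to be configured)."""
    result = run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical usage: python check_driver.py <driver-pod-name> [namespace]
    phase = driver_pod_phase(*sys.argv[1:3])
    code = exit_code_for_phase(phase)
    if code is None:
        print(f"driver pod has not finished (phase {phase!r})", file=sys.stderr)
        sys.exit(2)  # hypothetical convention for "not finished yet"
    sys.exit(code)
```

The `run` parameter only exists so the kubectl call can be swapped out in tests; a caller that gets exit code 2 could simply retry until it sees 0 or 1.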

Thanks again.


Marshall

From: Masood Krohy <masood.krohy@analytical.works>
Sent: Sunday, April 5, 2020 11:25 AM
To: Marshall Markham <mmarkham@precisionlender.com>; user <user@spark.apache.org>
Subject: Re: spark-submit exit status on k8s


Another, simpler solution I just thought of: add an operation at the end of your Spark
program that writes an empty file somewhere, named SUCCESS for example. Then add a stage to
your Airflow graph that checks for this file after spark-submit runs. If the file is absent,
the Spark app must have failed.
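A minimal sketch of the marker-file idea, assuming the Spark driver and the Airflow worker can both see the same filesystem path (for example, a shared mounted volume); with object storage you would write and check the marker through that store's client instead. `write_success_marker` and `spark_run_succeeded` are hypothetical names: the first would be the last call in the driver program, the second would back the Airflow check stage.

```python
import sys
from pathlib import Path

MARKER_NAME = "SUCCESS"


def write_success_marker(output_dir):
    """Last step of the Spark driver program: create an empty SUCCESS file."""
    marker = Path(output_dir) / MARKER_NAME
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker


def spark_run_succeeded(output_dir):
    """Airflow check step: treat the run as successful only if the marker exists."""
    return (Path(output_dir) / MARKER_NAME).is_file()


if __name__ == "__main__":
    # Hypothetical usage in the check stage: python check_marker.py /shared/runs/<run-id>
    sys.exit(0 if spark_run_succeeded(sys.argv[1]) else 1)
```

Writing to a fresh directory per run (for example, one that includes the Airflow run ID) matters here: a marker left behind by an earlier successful run would otherwise mask a later failure.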

The above should work if you want to avoid dealing with the REST API for monitoring.

Masood

__________________



Masood Krohy, Ph.D.

Data Science Advisor|Platform Architect

https://www.analytical.works
On 4/4/20 10:54 AM, Masood Krohy wrote:

I'm not on the Spark dev team, so I cannot tell you why that priority was chosen for the JIRA
issue or whether anyone is about to finish the work on it; I'll let others jump in if they know.

Just wanted to offer a potential solution so that you can move ahead in the meantime.

Masood

On 4/4/20 7:49 AM, Marshall Markham wrote:
Thank you very much for your fast response, Masood. One last question: is the current status
in Jira representative of the ticket's status within the project team? This seems like a
big deal for the K8s implementation, and we were surprised to find it marked as low priority.
Is there any discussion of picking up this work in the near future?

Thanks,
Marshall

From: Masood Krohy <masood.krohy@analytical.works>
Sent: Friday, April 3, 2020 9:34 PM
To: Marshall Markham <mmarkham@precisionlender.com>; user <user@spark.apache.org>
Subject: Re: spark-submit exit status on k8s


While you wait for a fix on that JIRA ticket, you may be able to add an intermediary step to
your Airflow graph that calls Spark's REST API after submitting the job, digs into the actual
status of the application, and makes a success/fail decision accordingly. While the execution
is in progress, you can call the REST API repeatedly in a loop, with a few seconds' delay
between calls, until the application fails or succeeds.

https://spark.apache.org/docs/latest/monitoring.html#rest-api
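The polling loop could look roughly like this sketch, which uses only the Python standard library. It lists applications from `/api/v1/applications` and reads job statuses from `/api/v1/applications/[app-id]/jobs`. Two assumptions to flag: the base URL points at a Spark History Server with event logging enabled (the driver's own UI stops serving once the driver exits), and the application ID is already known. The success/failure rule in `app_outcome` is a heuristic of this sketch, not a status Spark defines; for example, a driver that fails outside of any job would still be classified as succeeded. All function names are hypothetical.

```python
import json
import sys
import time
import urllib.request


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def app_outcome(app_info, jobs):
    """Classify one application as 'failed', 'succeeded', or 'running'.

    Heuristic (an assumption of this sketch): any job with status FAILED means
    failure; otherwise the app counts as succeeded once all of its attempts
    report completed=True.
    """
    if any(job.get("status") == "FAILED" for job in jobs):
        return "failed"
    attempts = app_info.get("attempts") or []
    if attempts and all(a.get("completed") for a in attempts):
        return "succeeded"
    return "running"


def wait_for_app(base_url, app_id, poll_seconds=5.0, timeout_seconds=3600.0,
                 fetch=fetch_json, sleep=time.sleep):
    """Poll the REST API every few seconds until the app finishes; return its outcome."""
    api = f"{base_url.rstrip('/')}/api/v1/applications"
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # The app may not be listed yet if its event log has not been picked up.
        app_info = next((a for a in fetch(api) if a.get("id") == app_id), None)
        if app_info is not None:
            outcome = app_outcome(app_info, fetch(f"{api}/{app_id}/jobs"))
            if outcome != "running":
                return outcome
        sleep(poll_seconds)
    raise TimeoutError(f"{app_id} did not finish within {timeout_seconds} seconds")


if __name__ == "__main__":
    # Hypothetical usage: python wait_spark.py http://<history-server>:18080 <app-id>
    result = wait_for_app(sys.argv[1], sys.argv[2])
    print(result)
    sys.exit(0 if result == "succeeded" else 1)
```

Exiting with a nonzero status on anything other than success lets the Airflow task that runs this script fail in the usual way, which is the signal spark-submit is not providing here.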

Hope this helps.

Masood

On 4/3/20 8:23 AM, Marshall Markham wrote:
Hi Team,

My team recently conducted a POC of Kubernetes/Airflow/Spark with great success. The major
concern we have about this system after completing our POC is a behavior of spark-submit:
when called with a Kubernetes API endpoint as master, spark-submit seems to always return exit
status 0. This is a major issue, because it prevents us from conditioning job graphs on the
success or failure of our Spark jobs. I found Jira ticket SPARK-27697 in the Apache issue
tracker covering this bug. The ticket is listed as minor and does not seem to have had any
recent activity. I would like to upvote it and ask whether there is anything I can do to move
it forward. This could be the one thing standing between my team and our preferred batch
workload implementation. Thank you.

Marshall Markham
Data Engineer
PrecisionLender, a Q2 Company

NOTE: This communication and any attachments are for the sole use of the intended recipient(s)
and may contain confidential and/or privileged information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended recipient, please contact
the sender by replying to this email, and destroy all copies of the original message.
