spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: [Spark Shell on AWS K8s Cluster]: Is there more documentation regarding how to run spark-shell on k8s cluster?
Date Thu, 15 Nov 2018 14:48:44 GMT
If folks are interested, while it's not on Amazon, I've got a live stream
of getting client mode with a Jupyter notebook working on GCP/GKE:
https://www.youtube.com/watch?v=eMj0Pv1-Nfo&index=3&list=PLRLebp9QyZtZflexn4Yf9xsocrR_aSryx

On Wed, Oct 31, 2018 at 5:55 PM Zhang, Yuqi <Yuqi.Zhang@teradata.com> wrote:

> Hi Li,
>
>
>
> Thank you very much for your reply!
>
>
>
> > Did you make the headless service that reflects the driver pod name?
>
> I am not sure, but I used “app” as the selector in the headless service,
> which is the same app name as the StatefulSet that creates the Spark driver
> pod.
>
> For your reference, I have attached the YAML files for the headless service
> and the StatefulSet. Could you please take a look at them when you have
> time?
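Not having seen the attached files, here is a minimal sketch of what such a headless service typically looks like for Spark 2.4 client mode; the names, labels, and ports are illustrative assumptions, not values from the attachment:

```yaml
# Hypothetical headless service fronting the Spark driver pod.
# clusterIP: None makes the service headless, so its DNS name resolves
# directly to the driver pod; executors connect back to the driver via
# this name (the value used for spark.driver.host).
apiVersion: v1
kind: Service
metadata:
  name: spark-driver-svc        # illustrative name
spec:
  clusterIP: None
  selector:
    app: spark-driver           # must match the labels on the driver pod / StatefulSet
  ports:
    - name: driver-rpc
      port: 7077                # must match spark.driver.port
    - name: blockmanager
      port: 7078                # must match spark.driver.blockManager.port
```

The selector has to land on the driver pod itself, and the ports must agree with the driver's Spark conf, otherwise executors cannot connect back.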
>
>
>
> I appreciate your help & have a good day!
>
>
>
> Best Regards,
>
> --
>
> Yuqi Zhang
>
> Software Engineer
>
> m: 090-6725-6573
>
>
> <http://www.teradata.com/>
>
> 2 Chome-2-23-1 Akasaka
>
> Minato, Tokyo 107-0052
> teradata.com <http://www.teradata.com>
>
> This e-mail is from Teradata Corporation and may contain information that
> is confidential or proprietary. If you are not the intended recipient, do
> not read, copy or distribute the e-mail or any attachments. Instead, please
> notify the sender and delete the e-mail and any attachments. Thank you.
>
> Please consider the environment before printing.
>
>
>
>
>
>
>
> *From: *Li Gao <ligao101@gmail.com>
> *Date: *Thursday, November 1, 2018 4:56
> *To: *"Zhang, Yuqi" <Yuqi.Zhang@Teradata.com>
> *Cc: *Gourav Sengupta <gourav.sengupta@gmail.com>, "user@spark.apache.org"
> <user@spark.apache.org>, "Nogami, Masatsugu"
> <Masatsugu.Nogami@Teradata.com>
> *Subject: *Re: [Spark Shell on AWS K8s Cluster]: Is there more
> documentation regarding how to run spark-shell on k8s cluster?
>
>
>
> Hi Yuqi,
>
>
>
> Yes, we are running Jupyter Gateway and kernels on k8s and using Spark
> 2.4's client mode to launch pyspark. In client mode, your driver runs on
> the same pod as your kernel.
>
>
>
> I am planning to write a blog post on this at some future date. Did you
> create the headless service that reflects the driver pod name? That's one
> of the critical pieces we automated in our custom code to make client mode
> work.
>
>
>
> -Li
>
>
>
>
>
> On Wed, Oct 31, 2018 at 8:13 AM Zhang, Yuqi <Yuqi.Zhang@teradata.com>
> wrote:
>
> Hi Li,
>
>
>
> Thank you for your reply.
>
> Do you mean running the Jupyter client on a k8s cluster to use Spark 2.4?
> Actually, I am also trying to set up JupyterHub on k8s to use Spark; that's
> why I would like to know how to run Spark client mode on a k8s cluster. If
> there is any related documentation on how to set up Jupyter on k8s to use
> Spark, could you please share it with me?
>
>
>
> Thank you for your help!
>
>
>
> Best Regards,
>
> --
>
> Yuqi Zhang
>
>
>
>
>
>
>
>
> *From: *Li Gao <ligao101@gmail.com>
> *Date: *Thursday, November 1, 2018 0:07
> *To: *"Zhang, Yuqi" <Yuqi.Zhang@Teradata.com>
> *Cc: *"gourav.sengupta@gmail.com" <gourav.sengupta@gmail.com>, "
> user@spark.apache.org" <user@spark.apache.org>, "Nogami, Masatsugu"
> <Masatsugu.Nogami@Teradata.com>
> *Subject: *Re: [Spark Shell on AWS K8s Cluster]: Is there more
> documentation regarding how to run spark-shell on k8s cluster?
>
>
>
> Yuqi,
>
>
>
> Your error seems unrelated to the headless service config you need to
> enable. You need to create a headless service that matches your driver pod
> name exactly in order for the Spark 2.4 RC to work in client mode. We have
> had this running for a while now, using a Jupyter kernel as the driver
> client.
>
>
>
> -Li
>
>
>
>
>
> On Wed, Oct 31, 2018 at 7:30 AM Zhang, Yuqi <Yuqi.Zhang@teradata.com>
> wrote:
>
> Hi Gourav,
>
>
>
> Thank you for your reply.
>
>
>
> I haven't tried Glue or EMR, but I guess it's integrating Kubernetes on
> AWS instances?
>
> I could set up the k8s cluster on AWS, but my problem is that I don't know
> how to run spark-shell on Kubernetes…
>
> Since Spark only supports client mode on k8s from version 2.4, which is
> not officially released yet, I would like to ask if there is more detailed
> documentation regarding the way to run spark-shell on a k8s cluster?
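For the record, a client-mode spark-shell launch against k8s roughly takes the shape below (client mode is the default for spark-shell, so no --deploy-mode flag is needed). The API server address, image name, and headless-service DNS name are placeholder assumptions, and the command is echoed rather than executed, since running it needs a live cluster:

```shell
#!/bin/sh
# Sketch of a Spark 2.4 client-mode spark-shell invocation on k8s.
# All endpoint/image/service names are hypothetical placeholders.
K8S_MASTER="k8s://https://10.0.0.1:6443"                 # API server (placeholder)
DRIVER_HOST="spark-driver-svc.default.svc.cluster.local" # headless service DNS name
IMAGE="spark:2.4.0"                                      # executor image (placeholder)

CMD="bin/spark-shell \
  --master $K8S_MASTER \
  --conf spark.driver.host=$DRIVER_HOST \
  --conf spark.driver.port=7077 \
  --conf spark.kubernetes.container.image=$IMAGE \
  --conf spark.executor.instances=2"

# Print the command instead of running it; executing requires a live cluster.
echo "$CMD"
```

The key detail, echoing Li's point, is spark.driver.host: it must be the DNS name of a headless service that resolves to the pod where spark-shell runs, or the executors will never reach the driver.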
>
>
>
> Thank you in advance & best regards!
>
>
>
> --
>
> Yuqi Zhang
>
>
>
>
>
>
>
>
> *From: *Gourav Sengupta <gourav.sengupta@gmail.com>
> *Date: *Wednesday, October 31, 2018 18:34
> *To: *"Zhang, Yuqi" <Yuqi.Zhang@Teradata.com>
> *Cc: *user <user@spark.apache.org>, "Nogami, Masatsugu"
> <Masatsugu.Nogami@Teradata.com>
> *Subject: *Re: [Spark Shell on AWS K8s Cluster]: Is there more
> documentation regarding how to run spark-shell on k8s cluster?
>
>
>
> [External Email]
> ------------------------------
>
> Just out of curiosity, why would you not use Glue (which is Spark on
> Kubernetes) or EMR?
>
>
>
> Regards,
>
> Gourav Sengupta
>
>
>
> On Mon, Oct 29, 2018 at 1:29 AM Zhang, Yuqi <Yuqi.Zhang@teradata.com>
> wrote:
>
> Hello guys,
>
>
>
> I am Yuqi from Teradata Tokyo. Sorry to disturb you, but I have a problem
> using the Spark 2.4 client mode feature on a Kubernetes cluster, and I
> would like to ask if there is a solution to my problem.
>
>
>
> The problem is that when I try to run spark-shell on a Kubernetes v1.11.3
> cluster in an AWS environment, I cannot successfully run the StatefulSet
> using the Docker image built from Spark 2.4. The error message is shown
> below. The version I am using is Spark v2.4.0-rc3.
>
>
>
> Also, I wonder if there is more documentation on how to use client mode or
> integrate spark-shell on a Kubernetes cluster. The documentation at
> https://github.com/apache/spark/blob/v2.4.0-rc3/docs/running-on-kubernetes.md
> has only a brief description. I understand it is not an officially released
> version yet, but if there is more documentation, could you please share it
> with me?
>
>
>
> Thank you very much for your help!
>
>
>
>
>
> Error msg:
>
> + env
> + sed 's/[^=]*=\(.*\)/\1/g'
> + sort -t_ -k4 -n
> + grep SPARK_JAVA_OPT_
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -n '' ']'
> + PYSPARK_ARGS=
> + '[' -n '' ']'
> + R_ARGS=
> + '[' -n '' ']'
> + '[' '' == 2 ']'
> + '[' '' == 3 ']'
> + case "$SPARK_K8S_CMD" in
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
> + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress= --deploy-mode client
>
> Error: Missing application resource.
>
> Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
> Usage: spark-submit --kill [submission ID] --master [spark://...]
> Usage: spark-submit --status [submission ID] --master [spark://...]
> Usage: spark-submit run-example [options] example-class [example args]
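For context, the failing line in that trace can be re-created in isolation: with SPARK_DRIVER_BIND_ADDRESS empty and no arguments passed to the container, the entrypoint's expansion yields a spark-submit call with an empty bindAddress and no application resource, which is exactly what the "Missing application resource" error complains about. A minimal sketch mirroring that expansion (hypothetical, not the actual entrypoint script):

```shell
#!/bin/sh
# Re-create the expansion seen in the trace: both the bind address and
# the container arguments ("$@") were empty in the failing pod.
SPARK_HOME=/opt/spark        # as in the trace
SPARK_DRIVER_BIND_ADDRESS="" # not set in the failing pod

set --  # the container received no arguments, so "$@" expands to nothing
CMD="$SPARK_HOME/bin/spark-submit --conf spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS --deploy-mode client $*"

# Matches the exec line in the trace: empty bindAddress, no app resource.
echo "$CMD"
```

In other words, the StatefulSet is starting the container without the environment variables and application arguments the entrypoint expects, which is a pod-spec problem rather than a Spark one.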
>
>
>
>
>
> --
>
> Yuqi Zhang
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org



-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
