spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Edwin Biemond (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-27927) driver pod hangs with pyspark 2.4.3 and master on kubenetes
Date Sun, 14 Jul 2019 10:20:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884631#comment-16884631
] 

Edwin Biemond commented on SPARK-27927:
---------------------------------------

This is what I see now,

driver starts up with an executor. python executes fines and the process is removed.  Normally
the driver detects this and cleans itself and executor. Now driver does not detect this because
sparkContext is still there.

I get the same normal behaviour again ( <= 2.4.0)  when add spark.sparkContext.stop()
to the python script

So this
spark = SparkSession.builder.appName('hello_world').getOrCreate()
Start something, > 2.4.0 and drv still  thinks something is running.

> driver pod hangs with pyspark 2.4.3 and master on kubenetes
> -----------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 3.0.0, 2.4.3
>         Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
>            Reporter: Edwin Biemond
>            Priority: Major
>         Attachments: driver_threads.log, executor_threads.log
>
>
> When we run a simple pyspark on spark 2.4.3 or 3.0.0 the driver pods hangs and never
calls the shutdown hook. 
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
> str(spark.sparkContext),
> spark.sparkContext.defaultParallelism,
> spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on kubernetes the driver and executer are just hanging. We see the output
of this python script. 
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443
appName=hello_world> parallelism=2 python version=3.6{noformat}
> What works
>  * a simple python with a print works fine on 2.4.3 and 3.0.0
>  * same setup on 2.4.0
>  * 2.4.3 spark-submit with the above pyspark
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message