spark-user mailing list archives

From Alexander Czech <alexander.cz...@googlemail.com>
Subject Restarting the SparkContext in pyspark
Date Thu, 24 Aug 2017 15:59:55 GMT
I'm running a Jupyter-Spark setup and I want to benchmark my cluster with
different input parameters. To make sure the environment stays the same, I'm
trying to reset (restart) the SparkContext. Here is some code:

    import os, shutil, time
    import pyspark

    temp_result_parquet = os.path.normpath('/home/spark_tmp_parquet')
    i = 0
    while i < max_i:
        i += 1
        if os.path.exists(temp_result_parquet):
            shutil.rmtree(temp_result_parquet)  # I know I could simply overwrite the parquet
        My_DF = do_something(i)
        My_DF.write.parquet(temp_result_parquet)
        sc.stop()
        time.sleep(10)
        sc = pyspark.SparkContext(master='spark://ip:here', appName='PySparkShell')

When I do this, the first iteration runs fine, but in the second I get the
following error:

    Py4JJavaError: An error occurred while calling o1876.parquet.
    : org.apache.spark.SparkException: Job aborted.
    [...]
    Caused by: java.lang.IllegalStateException: SparkContext has been shutdown
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2014)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
I tried running the code without restarting the SparkContext, but that results
in memory issues, so I'm restarting it to wipe the slate clean before every
iteration. The odd result is that the parquet write "thinks" the SparkContext
is down.
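
For completeness, here is a rough sketch of what I suspect the loop would need
to look like if the SparkSession also has to be rebuilt from the fresh
SparkContext on every iteration. I have not verified this on the cluster; it
assumes do_something() builds its DataFrame from the newly created session,
and max_i and 'spark://ip:here' are the same placeholders as above:

    import os, shutil, time
    import pyspark
    from pyspark.sql import SparkSession

    temp_result_parquet = os.path.normpath('/home/spark_tmp_parquet')

    for i in range(1, max_i + 1):
        if os.path.exists(temp_result_parquet):
            shutil.rmtree(temp_result_parquet)

        # fresh context and session for this iteration, so nothing still
        # points at a stopped SparkContext
        sc = pyspark.SparkContext(master='spark://ip:here', appName='PySparkShell')
        spark = SparkSession(sc)

        My_DF = do_something(i)  # assumed to use this 'spark' session
        My_DF.write.parquet(temp_result_parquet)

        # tear everything down before the next run
        spark.stop()  # also stops the underlying SparkContext
        time.sleep(10)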
