spark-user mailing list archives

From rhettbutler <>
Subject Pyspark not running the sqlContext in Pycharm
Date Fri, 02 Mar 2018 07:19:19 GMT
I hope someone can help with a problem I am having. I previously
set up a VM on Windows running CentOS, with Hadoop and Spark (all
single-node), and it was working perfectly.

I am now running a multi-node setup with another computer, both machines
running CentOS standalone. I have installed Hadoop successfully and it is
running on both machines. I then installed Spark with the following setup:

Version: Spark 2.2.1-bin-hadoop2.7, with the .bashrc file as follows:

export SPARK_HOME=/opt/spark/spark-2.2.1-bin-hadoop2.7


export PATH="/home/hadoop/anaconda2/bin:$PATH"
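For completeness, a common companion step (an assumption here, since the excerpt above shows only SPARK_HOME and the Anaconda PATH) is to also put Spark's bin directory on PATH, so spark-submit and pyspark are callable from the shell; a sketch:

```shell
# ~/.bashrc sketch -- SPARK_HOME and the Anaconda path are copied from
# this post; prepending $SPARK_HOME/bin to PATH is an assumed extra step,
# not something shown in the original .bashrc excerpt.
export SPARK_HOME=/opt/spark/spark-2.2.1-bin-hadoop2.7
export PATH="$SPARK_HOME/bin:/home/hadoop/anaconda2/bin:$PATH"
```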

I am using Anaconda (Python 2.7) to install the PySpark packages. I
then have the $SPARK_HOME/conf files set up as follows:

the slaves file as:


(the hostname of the node on which I conduct the processing)
and the file:

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk

export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.8.3/etc/hadoop


The idea is that I then connect Spark to the PyCharm IDE to do my work in.
In PyCharm I have set up the environment variables (under Run -> Edit
Configurations) as

PYTHONPATH /opt/spark/spark-2.2.1-bin-hadoop2.7/python/lib

SPARK_HOME /opt/spark/spark-2.2.1-bin-hadoop2.7
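One note on that PYTHONPATH value: the stock bin/pyspark script puts both $SPARK_HOME/python and the bundled py4j zip from python/lib on PYTHONPATH, not the lib directory alone. A small sketch of building that value the same way (the helper name is mine; the path is the one from this post):

```python
import glob
import os

def pyspark_python_path(spark_home):
    """Build a PYTHONPATH value the way bin/pyspark does: the python/
    directory itself plus the py4j-*-src.zip bundled under python/lib."""
    entries = [os.path.join(spark_home, "python")]
    entries += sorted(glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*.zip")))
    return os.pathsep.join(entries)

# Path taken from the post; on a real install the glob would also pick
# up the py4j zip shipped inside the Spark 2.2.1 distribution.
print(pyspark_python_path("/opt/spark/spark-2.2.1-bin-hadoop2.7"))
```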

An image of the environment variables: 

I have also set my Python interpreter to point to the Anaconda Python
directory. With all this set up, I get multiple errors as output when I
call either a Spark SQLContext or SparkSession.Builder, for example:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, SparkSession

conf = SparkConf().setMaster("local[*]")

sc = SparkContext(conf=conf)

sql_sc = SQLContext(sc)


spark = SparkSession.builder \
    .config("spark.executor.memory", "2gb") \
    .getOrCreate()

The ERROR given:

File "/home/hadoop/Desktop/PythonPrac/", line 72, in <module>
    .config("spark.executor.memory", "2gb") \
File "/opt/spark/spark-2.2.1-bin-hadoop2.7/python/pyspark/sql/", line 183, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
File line 1160, in __call__
    answer, self.gateway_client, self.target_id,
File "/opt/spark/spark-2.2.1-bin-hadoop2.7/python/pyspark/sql/", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.internal.SessionStateBuilder':"
Unhandled exception in thread started by
Process finished with exit code 1
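Incidentally, the `deco` frame at the bottom of the trace is just PySpark unwrapping the Java exception: it splits the Java-side message on the first ': ' and re-raises the remainder as IllegalArgumentException. A minimal reproduction of that string handling (the full Java message text is an assumption based on the error shown above):

```python
# Simplified sketch of what pyspark/sql/utils.py's deco wrapper does
# (the real code also attaches the Java stack trace).
java_message = ("java.lang.IllegalArgumentException: Error while instantiating "
                "'org.apache.spark.sql.internal.SessionStateBuilder':")

exc_class, detail = java_message.split(': ', 1)
print(detail)  # this is the text that surfaces on the Python side
```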

An image of the error:

I do not know why this error message is showing; when I was running this on
my single-node VM, it was working fine. I then decided, in my multi-node
setup, to remove datanode1 and run it again as a single-node setup with my
main computer (hostname: master), but I am still getting the same errors.

I hope someone can help, as I have followed other guides to set up PyCharm
with PySpark, but could not figure out what is going wrong. Thanks!
