Hi Dharmin
With the first approach, you will have to read the properties from the file shipped via --files, along these lines:
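(The snippet was not included in the original message; below is a minimal sketch of what it could look like. It assumes the file was shipped as --files /home/siddesh/hbase-site.xml, so each YARN container sees it under its bare file name.)

```scala
// Sketch only: a file passed with --files is copied into each YARN
// container's working directory under its original file name, so the
// executor-side code can open it by bare name.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val hbaseConf = new Configuration()
hbaseConf.addResource(new Path("hbase-site.xml")) // bare name = the --files name
val quorum = hbaseConf.get("hbase.zookeeper.quorum")
```

On the driver side (client mode) you can resolve the local copy with org.apache.spark.SparkFiles.get("hbase-site.xml") instead of relying on the working directory.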

Alternatively, you can copy the file to HDFS, read it using sc.textFile, and use the properties from it.
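A minimal sketch of that approach, assuming a simple key=value properties file at a hypothetical HDFS path and an existing Hadoop Configuration named hbaseConf:

```scala
// Hypothetical path; "hbaseConf" is an org.apache.hadoop.conf.Configuration
// created elsewhere. Each non-comment line is assumed to be "key=value".
val props = sc.textFile("hdfs:///user/siddesh/hbase.properties")
  .filter(line => line.contains("=") && !line.trim.startsWith("#"))
  .map { line =>
    val Array(k, v) = line.split("=", 2)
    (k.trim, v.trim)
  }
  .collectAsMap()

// Apply the collected entries to the configuration on the driver.
props.foreach { case (k, v) => hbaseConf.set(k, v) }
```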

If you add files using --files, they get copied to each executor's working directory, but you still have to read the file yourself and set its properties in the configuration.

On Fri, Feb 23, 2018 at 10:25 AM, Dharmin Siddesh J <siddeshjdharmin@gmail.com> wrote:

I am trying to write a Spark program that reads data from HBase and stores it in a DataFrame.

I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf folder, but I am facing a few issues here.

Issue 1

The first issue is passing the hbase-site.xml location with the --files parameter when submitting in client mode (it works in cluster mode).

When I removed hbase-site.xml from $SPARK_HOME/conf and tried to run in client mode on YARN, passing the file with the --files parameter, I kept getting the exception below (which I think means it is not picking up the ZooKeeper configuration from hbase-site.xml):

spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /home/siddesh/hbase-site.xml \
  --class com.orzota.rs.json.HbaseConnector \
  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \



18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

However, it works fine when I run it in cluster mode.

Issue 2

The second issue is passing the HBase configuration details through the Spark session, which I can't get to work in either client or cluster mode.

Instead of passing the entire hbase-site.xml, I am trying to add the configuration directly in the code as configuration parameters on the SparkSession, e.g.:

val spark = SparkSession
  .builder()
  .config("hbase.zookeeper.property.clientPort", "2181")
  .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
  .getOrCreate()

val json_df = ...
This is not working in cluster mode either.
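For completeness, a read through the SHC connector referenced in the --packages option above typically looks like the sketch below; the catalog JSON here is a made-up example (table "json_table" with one column family "cf"), not from the original message.

```scala
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical SHC catalog mapping DataFrame columns to HBase cells.
val catalog = s"""{
  |"table":{"namespace":"default", "name":"json_table"},
  |"rowkey":"key",
  |"columns":{
    |"key":{"cf":"rowkey", "col":"key", "type":"string"},
    |"value":{"cf":"cf", "col":"value", "type":"string"}
  |}
}""".stripMargin

// "spark" is the SparkSession built above.
val json_df = spark.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
```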

Can anyone help me with a solution or an explanation of why this is happening? Are there any workarounds?