Hi Dharmin
With the first approach, you will have to read the properties from the file passed with --files, using:
SparkFiles.get("file.txt")

Alternatively, you can copy the file to HDFS, read it using sc.textFile, and use the properties within it.

If you add files using --files, they get copied to each executor's working directory, but you still have to read the file and set its properties in the conf yourself.
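A minimal sketch of that "read it yourself" step, assuming the file is a standard Hadoop-style hbase-site.xml made of <property><name>/<value> entries. The object name HBaseSiteProps and the load helper are hypothetical names used for illustration, and the sample values in main are made up. It uses only the JDK XML parser, so it runs without Spark:

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper (not part of Spark or HBase): turns a Hadoop-style
// *-site.xml file into a plain name -> value map.
object HBaseSiteProps {
  def load(path: String): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new File(path))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[Element]
      val names = p.getElementsByTagName("name")
      val values = p.getElementsByTagName("value")
      // Skip malformed entries that lack a <name> or a <value>.
      if (names.getLength > 0 && values.getLength > 0)
        Some(names.item(0).getTextContent.trim -> values.item(0).getTextContent.trim)
      else None
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample content, written to a temp file for the demo.
    val xml =
      """<configuration>
        |  <property><name>hbase.zookeeper.quorum</name><value>ip1,ip2,ip3</value></property>
        |  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
        |</configuration>""".stripMargin
    val tmp = Files.createTempFile("hbase-site", ".xml")
    Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8))
    load(tmp.toString).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

In the job itself, the path would come from SparkFiles.get("hbase-site.xml") rather than a temp file, and each pair would then be set on whatever configuration object your HBase connector reads. Whether that path resolves on the driver in client mode is something to verify on your cluster.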
Thanks
Deepak

On Fri, Feb 23, 2018 at 10:25 AM, Dharmin Siddesh J <siddeshjdharmin@gmail.com> wrote:

I am trying to write a Spark program that reads data from HBase and stores it in a DataFrame.

I am able to run it perfectly with hbase-site.xml in the $SPARK_HOME/conf folder, but I am facing a few issues.

Issue 1

The first issue is passing the hbase-site.xml location with the --files parameter when submitting in client mode (it works in cluster mode).

When I removed hbase-site.xml from $SPARK_HOME/conf and tried to run in client mode over YARN, passing the file with the --files parameter, I kept getting the following exception (which I think means it is not picking up the ZooKeeper configuration from hbase-site.xml).

spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /home/siddesh/hbase-site.xml \
  --class com.orzota.rs.json.HbaseConnector \
  --packages com.hortonworks:shc:1.0.0-2.0-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  target/scala-2.11/test-0.1-SNAPSHOT.jar

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
18/02/22 01:43:09 INFO ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
18/02/22 01:43:09 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

However, it works fine when I run it in cluster mode.


Issue 2

The second issue is passing the HBase configuration details through the Spark session, which I can't get to work in either client or cluster mode.

Instead of passing the entire hbase-site.xml, I am trying to add the configuration directly in the code as configuration parameters on the SparkSession, e.g.:


val spark = SparkSession
  .builder()
  .appName(name)
  .config("hbase.zookeeper.property.clientPort", "2181")
  .config("hbase.zookeeper.quorum", "ip1,ip2,ip3")
  .config("spark.hbase.host", "zookeeperquorum")
  .getOrCreate()

val json_df =
  spark.read.option("catalog", catalog_read).
  format("org.apache.spark.sql.execution.datasources.hbase").
  load()

This is not working in cluster mode either.


Can anyone help me with a solution, or explain why this is happening? Are there any workarounds?





--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net