spark-user mailing list archives

From Mich Talebzadeh
Subject converting hBaseRDD to DataFrame
Date Mon, 10 Oct 2016 18:02:43 GMT

I am trying to run some operations on an HBase table that is being populated
by Spark Streaming.

This is just Spark on HBase, as opposed to Spark on Hive -> view on
HBase etc. I also have a Phoenix view on this HBase table.

This is the sample code:

scala>     val tableName = "marketDataHbaseTest"
scala>     val conf = HBaseConfiguration.create()
conf: org.apache.hadoop.conf.Configuration = Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml,
hbase-default.xml, hbase-site.xml
scala>     conf.set(TableInputFormat.INPUT_TABLE, tableName)
scala>         //create rdd
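scala>     val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
     |       classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
     |       classOf[org.apache.hadoop.hbase.client.Result])
hBaseRDD: org.apache.spark.rdd.RDD[(org.apache.hadoop.hbase.io.ImmutableBytesWritable,
org.apache.hadoop.hbase.client.Result)] = NewHadoopRDD[4] at
newAPIHadoopRDD at <console>:64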
scala> hBaseRDD.count
res11: Long = 22272

Now that I have hBaseRDD, is there any way I can create a DataFrame from it? I
understand that it is not as simple as calling toDF on the RDD:

scala>  hBaseRDD.toDF
java.lang.AssertionError: assertion failed: no symbol could be loaded from
interface org.apache.hadoop.hbase.classification.InterfaceAudience$Public
in object InterfaceAudience with name Public and classloader

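One direction that might be worth trying, as an untested sketch: the
assertion appears to be raised while Spark reflects over the HBase Result
class to infer a schema, so mapping each record to plain Scala types before
calling toDF may avoid it. The case class MarketRow, the column family
"price_info" and the qualifiers "ticker" and "price" are hypothetical
placeholders, not the real layout of marketDataHbaseTest. The sketch also
assumes it is run in spark-shell, where the implicits needed for toDF are
already imported.

```scala
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical row type: adjust the fields to the real table layout.
case class MarketRow(rowKey: String, ticker: String, price: String)

// Pull plain Strings out of each HBase Result so that Spark only has to
// infer a schema for a simple case class, not for HBase's own classes.
val rows = hBaseRDD.map { case (_, result) =>
  val cf = Bytes.toBytes("price_info")   // hypothetical column family
  MarketRow(
    Bytes.toString(result.getRow),       // row key
    Bytes.toString(result.getValue(cf, Bytes.toBytes("ticker"))),  // hypothetical qualifier
    Bytes.toString(result.getValue(cf, Bytes.toBytes("price"))))   // hypothetical qualifier
}

// toDF relies on the implicits that spark-shell imports automatically.
val df = rows.toDF()
df.printSchema()
```

Cells that are missing for a given row would come back as null Strings here,
so numeric columns are probably best cast after the DataFrame is built.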

Dr Mich Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
