You can decide which component(s) to use for storing your data.
If you haven't used HBase before, it may be better to store the data on HDFS and query it through Hive or Spark SQL.

Maintaining HBase is not a trivial task, especially when the cluster is large.

How much data do you expect to write on a daily/weekly basis?


On Wed, Jul 20, 2016 at 7:22 AM, Yu Wei <> wrote:

I'm a beginner to big data. I don't have much knowledge about HBase/Hive.

What's the difference between HBase and Hive/HDFS for storing data for analytics?



From: ayan guha <>
Sent: Wednesday, July 20, 2016 9:34:24 PM
To: Rabin Banerjee
Cc: user; Yu Wei; Deepak Sharma

Subject: Re: Is it good choice to use DAO to store results generated by spark application?

Just as a sanity check: saving data to HBase for analytics may not be the best choice. Is there a specific reason for not using HDFS or Hive?

On 20 Jul 2016 20:57, "Rabin Banerjee" <> wrote:
Hi Wei,

You can do something like this:

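import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
// rdd (an RDD[Put]) and tablename (a String) are assumed to be defined elsewhere
rdd.foreachPartition { (part: Iterator[Put]) =>
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create()) // one connection per partition
  val table = conn.getTable(TableName.valueOf(tablename))
  try part.foreach(inp => table.put(inp)) // line-by-line put: one call per record
  finally { table.close(); conn.close() } // release resources even if a put fails
}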

Whether you wrap it inside a DAO is up to you. A DAO will abstract things, but ultimately it will run the same code.
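A minimal sketch of what wrapping the write path in a DAO might look like, assuming a hypothetical Reading record and ReadingDao trait (neither comes from Spark or HBase). The DAO only fixes the interface; the partition-level write code shown above would live inside an HBase-backed saveAll. The in-memory implementation here just keeps the example runnable without a cluster.

```scala
// Hypothetical record type standing in for a parsed MQTT message.
final case class Reading(rowKey: String, value: Double)

// The DAO only defines the interface; storage-specific code lives in implementations.
trait ReadingDao {
  def saveAll(readings: Iterator[Reading]): Unit
}

// In-memory implementation for testing. An HBase-backed version (hypothetical)
// would open a connection via ConnectionFactory and call table.put inside saveAll.
final class InMemoryReadingDao extends ReadingDao {
  val stored = scala.collection.mutable.ArrayBuffer.empty[Reading]
  def saveAll(readings: Iterator[Reading]): Unit = stored ++= readings
}

// Simulated partitions; in Spark each would arrive as the iterator in foreachPartition.
val partitions: Seq[Seq[Reading]] = Seq(
  Seq(Reading("sensor-1#001", 20.5), Reading("sensor-1#002", 21.0)),
  Seq(Reading("sensor-2#001", 18.2))
)
val dao = new InMemoryReadingDao
partitions.foreach(p => dao.saveAll(p.iterator))
println(dao.stored.size) // 3
```

In Spark the DAO would usually be constructed inside foreachPartition, so that no connection has to be serialized to the executors. The indirection itself adds only one method call per partition; the cost is dominated by how saveAll talks to HBase.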

Note: always use the HBase ConnectionFactory to get a connection, and write data on a per-partition basis.
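To make the note concrete, here is a small simulation rather than real HBase code: FakeTable, FakeConnections and writePartition are hypothetical stand-ins. It counts how many connections and write round trips are needed when each partition opens one connection and sends its rows in batches. In a real job FakeTable would be replaced by ConnectionFactory.createConnection(...) and conn.getTable(...), and each batch would be sent with Table.put(java.util.List[Put]); the batch size of 1000 is an arbitrary assumption.

```scala
// Hypothetical counters standing in for connection and RPC costs.
object FakeConnections {
  var opened = 0
  var roundTrips = 0
}

// Hypothetical stand-in for an HBase Table obtained from a fresh connection.
final class FakeTable {
  FakeConnections.opened += 1
  def put(batch: Seq[String]): Unit = FakeConnections.roundTrips += 1 // one round trip per batch
  def close(): Unit = ()
}

// One connection for the whole partition; rows are sent in fixed-size batches.
def writePartition(rows: Iterator[String], batchSize: Int): Unit = {
  val table = new FakeTable
  try rows.grouped(batchSize).foreach(batch => table.put(batch))
  finally table.close()
}

// Two simulated partitions with 2500 and 500 rows.
val partitions = Seq((1 to 2500).map(i => s"row-a$i"), (1 to 500).map(i => s"row-b$i"))
partitions.foreach(p => writePartition(p.iterator, batchSize = 1000))
println(s"connections=${FakeConnections.opened}, roundTrips=${FakeConnections.roundTrips}")
// prints "connections=2, roundTrips=4"
```

With 3000 rows across two partitions this needs 2 connections and 4 round trips; opening a connection per record and putting line by line would cost 3000 of each.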

Rabin Banerjee

On Wed, Jul 20, 2016 at 12:06 PM, Yu Wei <> wrote:

I need to write all data received from MQTT into HBase for further processing.

They're not final results. I also need to read the data from HBase for analysis.

Is it a good choice to use a DAO in such a situation?



From: Deepak Sharma <>
Sent: Wednesday, July 20, 2016 12:34:07 PM
To: Yu Wei
Cc: spark users
Subject: Re: Is it good choice to use DAO to store results generated by spark application?

I am using a DAO in my Spark application to write the final computation results to Cassandra, and it performs well.
What kinds of issues do you foresee with using a DAO for HBase?


On 19 Jul 2016 10:04 pm, "Yu Wei" <> wrote:

Hi guys,

I'm writing a Spark application and want to store the results it generates in HBase.

Do I need to access HBase via the Java API directly?

Or is it a better choice to use a DAO, as with a traditional RDBMS? I suspect that using a DAO causes a major performance degradation and other negative impacts. However, I have little knowledge in this field.

Any advice?