spark-user mailing list archives

From Yu Wei <yu20...@hotmail.com>
Subject Re: Is it good choice to use DAO to store results generated by spark application?
Date Wed, 20 Jul 2016 14:42:48 GMT
This is a startup project, so we don't know yet how much data will be written every day.

There certainly won't be much data at the beginning, but the volume will grow later.

We also want to use Spark Streaming to receive data via MQTTUtils.

We're now evaluating which components could be used for storing the data. Later we will need
to extend the Spark application to query and analyze it.


Thx,

Jared

________________________________
From: Ted Yu <yuzhihong@gmail.com>
Sent: Wednesday, July 20, 2016 10:34:15 PM
To: Yu Wei
Cc: ayan guha; Rabin Banerjee; user; Deepak Sharma
Subject: Re: Is it good choice to use DAO to store results generated by spark application?

You can decide which component(s) to use for storing your data.
If you haven't used HBase before, it may be better to store the data on HDFS and query it
through Hive or Spark SQL.

Maintaining HBase is not a trivial task, especially when the cluster is large.

How much data are you expecting to be written on a daily or weekly basis?

Cheers

On Wed, Jul 20, 2016 at 7:22 AM, Yu Wei <yu2003w@hotmail.com> wrote:

I'm a beginner in big data and don't know much about HBase or Hive.

What's the difference between HBase and Hive/HDFS for storing data for analytics?


Thanks,

Jared

________________________________
From: ayan guha <guha.ayan@gmail.com>
Sent: Wednesday, July 20, 2016 9:34:24 PM
To: Rabin Banerjee
Cc: user; Yu Wei; Deepak Sharma
Subject: Re: Is it good choice to use DAO to store results generated by spark application?


Just as a sanity check: saving data to HBase for analytics may not be the best choice. Is there
any specific reason for not using HDFS or Hive?

On 20 Jul 2016 20:57, "Rabin Banerjee" <dev.rabin.banerjee@gmail.com> wrote:
Hi Wei,

You can do something like this:


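import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import scala.collection.JavaConverters._

// rdd is assumed to be an RDD[Put]; tablename is the target table's name.
rdd.foreachPartition { part =>
    // Open one connection per partition, on the executor.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf(tablename))
    // part.foreach(inp => { println(inp); table.put(inp) }) // This is a line-by-line put
    table.put(part.toList.asJava) // One batched put for the whole partition
    table.close()
    conn.close()
}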

Now, if you want to wrap it inside a DAO, it's up to you. A DAO will abstract things, but
ultimately it is going to use the same code.

Note: Always use the HBase ConnectionFactory to get a connection, and write data on a per-partition basis.
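
As a rough illustration of the DAO wrapping mentioned above, here is a minimal sketch. The names DaoSketch, Reading, ReadingDao, InMemoryReadingDao, and writePartition are hypothetical and do not come from this thread. The in-memory class only stands in for an HBase-backed implementation so that the example runs without a cluster; a real one would presumably open the connection when constructed, turn each record into a Put, and call table.put with the whole list.

```scala
object DaoSketch {
  // Hypothetical record type; the thread does not define one.
  final case class Reading(rowKey: String, payload: String)

  // DAO interface: callers hand over a whole partition's records in one call,
  // so the abstraction does not force row-by-row writes.
  trait ReadingDao extends AutoCloseable {
    def saveAll(batch: Seq[Reading]): Unit
  }

  // In-memory stand-in (assumption: used here only to keep the sketch runnable).
  final class InMemoryReadingDao extends ReadingDao {
    private val rows = scala.collection.mutable.LinkedHashMap.empty[String, String]
    var batchCalls = 0 // counts how many batched writes were issued

    override def saveAll(batch: Seq[Reading]): Unit = {
      batchCalls += 1
      batch.foreach(r => rows.update(r.rowKey, r.payload))
    }

    def get(rowKey: String): Option[String] = rows.get(rowKey)
    def size: Int = rows.size
    override def close(): Unit = ()
  }

  // Same shape as the function passed to foreachPartition:
  // one DAO per partition, one batched write, then release the resources.
  def writePartition(part: Iterator[Reading], dao: ReadingDao): Unit =
    try dao.saveAll(part.toList)
    finally dao.close()
}
```

With this shape, the per-partition code would create the DAO inside foreachPartition and pass the partition iterator to writePartition, so the cost of the abstraction is one extra method call per partition rather than per record.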

Regards,
Rabin Banerjee


On Wed, Jul 20, 2016 at 12:06 PM, Yu Wei <yu2003w@hotmail.com> wrote:

I need to write all the data received from MQTT into HBase for further processing.

It is not the final result. I also need to read the data back from HBase for analysis.


Is it a good choice to use a DAO in such a situation?


Thx,

Jared


________________________________
From: Deepak Sharma <deepakmca05@gmail.com>
Sent: Wednesday, July 20, 2016 12:34:07 PM
To: Yu Wei
Cc: spark users
Subject: Re: Is it good choice to use DAO to store results generated by spark application?


I am using a DAO in a Spark application to write the final computation to Cassandra, and it
performs well.
What kinds of issues do you foresee in using a DAO for HBase?

Thanks
Deepak

On 19 Jul 2016 10:04 pm, "Yu Wei" <yu2003w@hotmail.com> wrote:

Hi guys,


I am writing a Spark application and want to store the results it generates in HBase.

Do I need to access HBase via the Java API directly?

Or is it a better choice to use a DAO, as with a traditional RDBMS? I suspect that using a DAO
would bring a major performance degradation and other negative impacts. However, I have little
knowledge in this field.


Any advice?


Thanks,

Jared




