spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Is it good choice to use DAO to store results generated by spark application?
Date Wed, 20 Jul 2016 14:34:15 GMT
You can decide which component(s) to use for storing your data.
If you haven't used hbase before, it may be better to store data on hdfs
and query through Hive or SparkSQL.
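
As a rough illustration of the HDFS-plus-SparkSQL route, here is a minimal sketch. It assumes Spark 2.x or later (for `SparkSession`) and Parquet as the file format; the object name, the column names and the sample rows are hypothetical, and `path` could be an `hdfs://` URI or a local directory.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object HdfsStoreSketch {
  // Appends a small batch of readings as Parquet under `path`, then
  // returns the average value per device computed with Spark SQL.
  def writeAndQuery(spark: SparkSession, path: String): Map[String, Double] = {
    import spark.implicits._

    // Hypothetical schema: (device id, epoch seconds, measured value).
    val readings = Seq(
      ("sensor-1", 1469025255L, 21.5),
      ("sensor-1", 1469025315L, 22.5),
      ("sensor-2", 1469025255L, 19.0)
    ).toDF("device", "ts", "value")

    // Plain files on HDFS: no extra service to operate besides HDFS itself.
    readings.write.mode(SaveMode.Append).parquet(path)

    // Read the files back and query them through a temporary view.
    spark.read.parquet(path).createOrReplaceTempView("readings")
    spark.sql("SELECT device, avg(value) AS avg_value FROM readings GROUP BY device")
      .as[(String, Double)]
      .collect()
      .toMap
  }
}
```

Because the write mode is append, repeated runs against the same path accumulate rows, which is the usual pattern for periodic batches.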

Maintaining hbase is not a trivial task, especially when the cluster is
large.

How much data do you expect to write on a daily or weekly basis?

Cheers

On Wed, Jul 20, 2016 at 7:22 AM, Yu Wei <yu2003w@hotmail.com> wrote:

> I'm a beginner in big data and don't know much about hbase or hive.
>
> What's the difference between hbase and hive/hdfs for storing data for
> analytics?
>
>
> Thanks,
>
> Jared
> ------------------------------
> *From:* ayan guha <guha.ayan@gmail.com>
> *Sent:* Wednesday, July 20, 2016 9:34:24 PM
> *To:* Rabin Banerjee
> *Cc:* user; Yu Wei; Deepak Sharma
>
> *Subject:* Re: Is it good choice to use DAO to store results generated by
> spark application?
>
>
> Just as a sanity check, saving data to hbase for analytics may not be the
> best choice. Is there a specific reason for not using hdfs or hive?
> On 20 Jul 2016 20:57, "Rabin Banerjee" <dev.rabin.banerjee@gmail.com>
> wrote:
>
>> Hi Wei,
>>
>> You can do something like this:
>>
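>> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
>> import org.apache.hadoop.hbase.client.ConnectionFactory
>> import scala.collection.JavaConverters._
>>
>> foreachPartition( (part) => {
>>   // One connection per partition; elements of part are assumed to be Puts.
>>   val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
>>   val table = conn.getTable(TableName.valueOf(tablename))
>>   // Line-by-line put:
>>   // part.foreach((inp) => { println(inp); table.put(inp) })
>>   // Batched put for the whole partition:
>>   table.put(part.toList.asJava)
>>   table.close()
>>   conn.close()
>> })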
>>
>> Whether you wrap it inside a DAO is up to you. A DAO will abstract
>> things, but it will ultimately run the same code.
>>
>> Note: Always use the HBase ConnectionFactory to get a connection, and
>> write the data on a per-partition basis.
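
The point above, that a DAO only wraps the same per-partition write, can be sketched in plain Scala. Everything named here (`ReadingDao`, `InMemoryDao`, `PartitionWriter`, the batch size) is hypothetical and not from the thread; the in-memory class stands in for an HBase-backed one so the sketch runs without a cluster.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical DAO interface: callers see only saveAll and close.
trait ReadingDao[A] extends AutoCloseable {
  def saveAll(batch: Seq[A]): Unit
}

// In-memory stand-in. An HBase-backed version would hold the Connection and
// Table opened via ConnectionFactory and call table.put(batch.asJava) in saveAll.
final class InMemoryDao[A] extends ReadingDao[A] {
  val batches: ArrayBuffer[Seq[A]] = ArrayBuffer.empty[Seq[A]]
  var closed: Boolean = false

  override def saveAll(batch: Seq[A]): Unit = { batches += batch }
  override def close(): Unit = { closed = true }
}

object PartitionWriter {
  // Opens one DAO per partition, writes fixed-size batches, and always closes
  // the DAO, even if a write throws. Batching avoids holding the whole
  // partition in memory at once.
  def writePartition[A](part: Iterator[A], batchSize: Int)(open: () => ReadingDao[A]): Unit = {
    val dao = open()
    try part.grouped(batchSize).foreach(batch => dao.saveAll(batch))
    finally dao.close()
  }
}
```

In a Spark job this would presumably be called as `rdd.foreachPartition(part => PartitionWriter.writePartition(part, 500)(() => new InMemoryDao[Put]))`, with the in-memory class replaced by a real HBase-backed implementation; the DAO adds one method call per batch, so any overhead comes from the underlying puts rather than from the wrapper.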
>>
>> Regards,
>> Rabin Banerjee
>>
>>
>> On Wed, Jul 20, 2016 at 12:06 PM, Yu Wei <yu2003w@hotmail.com> wrote:
>>
>>> I need to write all the data received from MQTT into hbase for further
>>> processing.
>>>
>>> It is not the final result. I also need to read the data back from hbase
>>> for analysis.
>>>
>>>
>>> Is it a good choice to use a DAO in such a situation?
>>>
>>>
>>> Thx,
>>>
>>> Jared
>>>
>>>
>>> ------------------------------
>>> *From:* Deepak Sharma <deepakmca05@gmail.com>
>>> *Sent:* Wednesday, July 20, 2016 12:34:07 PM
>>> *To:* Yu Wei
>>> *Cc:* spark users
>>> *Subject:* Re: Is it good choice to use DAO to store results generated
>>> by spark application?
>>>
>>>
>>> I am using a DAO in a spark application to write the final computation to
>>> Cassandra, and it performs well.
>>> What kinds of issues do you foresee in using a DAO for hbase?
>>>
>>> Thanks
>>> Deepak
>>>
>>> On 19 Jul 2016 10:04 pm, "Yu Wei" <yu2003w@hotmail.com> wrote:
>>>
>>>> Hi guys,
>>>>
>>>>
>>>> I am writing a spark application and want to store the results it
>>>> generates in hbase.
>>>>
>>>> Do I need to access hbase via the java api directly?
>>>>
>>>> Or is it a better choice to use a DAO, as with a traditional RDBMS? I
>>>> suspect that using a DAO causes a major performance degradation and has
>>>> other negative impacts. However, I have little knowledge in this field.
>>>>
>>>>
>>>> Any advice?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Jared
>>>>
>>>>
>>>>
>>>>
>>
