hbase-user mailing list archives

From Kay Kay <kaykay.uni...@gmail.com>
Subject Re: Hbase as Map/Reduce source
Date Fri, 29 Jan 2010 03:05:50 GMT
HDFS is a double-edged sword. Being a raw file system, it lets you feed 
any file to a MapReduce program, although you may need to define your own 
InputSplits to cut the input into suitably sized chunks.
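
To make "defining InputSplits" a bit more concrete, here is a minimal sketch of cutting one raw file into byte ranges, one per map task. It is a toy model, not Hadoop code: the class FileSplitSketch, the method planSplits and the 200 MB / 64 MB figures are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of cutting one raw file into byte-range splits.
// Not Hadoop code: FileSplitSketch and planSplits are made-up names.
final class FileSplitSketch {

    // Returns {offset, length} pairs that cover [0, fileLength) without gaps.
    static List<long[]> planSplits(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        List<long[]> splits = new ArrayList<>();
        long offset = 0;
        while (offset < fileLength) {
            // The last split may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
            offset += length;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed numbers: a 200 MB file cut into 64 MB pieces.
        for (long[] s : planSplits(200 * mb, 64 * mb)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

A real input format also has to cope with records that straddle a split boundary, which this sketch ignores.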

OTOH, HBase holds structured data (well, sort of!). It uses a file format 
on top of HDFS to store the schema, and hence comes with predefined 
InputSplits that make it easy to get started on a MapReduce program. 
From an API simplicity point of view, HBase can get you started 
relatively faster because of that (assuming your data is already in HBase).
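
As a rough picture of what "predefined InputSplits" means here, the sketch below models a table as a sorted list of region start keys and derives one key-range split per region. It is a toy model, not the HBase API: RegionSplitSketch, splitsFor and the sample keys are made-up names and values, and the one-split-per-region mapping is an assumption (the actual count can depend on the API version and on the number of map tasks requested).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model: one split per region, each split covering [startKey, endKey).
// Not the HBase API: RegionSplitSketch and splitsFor are made-up names.
final class RegionSplitSketch {

    // regionStartKeys must be sorted; the first region starts at "" (table start).
    // An empty endKey in the result means "up to the end of the table".
    static List<String[]> splitsFor(List<String> regionStartKeys) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // A region ends where the next one starts.
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new String[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed table with three regions, split at "row-300" and "row-600".
        List<String> starts = Arrays.asList("", "row-300", "row-600");
        for (String[] s : splitsFor(starts)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

The point is that the split boundaries come from the table layout itself, so you do not have to write any splitting logic.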

Refer to:
http://wiki.apache.org/hadoop/Hbase/MapReduce

Although the wiki says deprecated, in reality it is suggested to 
stick with the *.mapred.* packages for some time, since the 
*.mapreduce.* packages are not mature enough at this point.

The decision comes down entirely to the kind of data you have and 
whether you can identify it by a primary key that suits your application, 
which is all HBase in its rudimentary form needs.

On the other hand, if having a schema and defining a primary key for 
your data does not fit your app naturally, you can stick with HDFS 
and a custom InputSplit suited to your data. HBase provides a lot more 
than HDFS in terms of scanning and row id ordering, and if those 
features are not necessary for what you do, then storing the data 
in HDFS should be just about OK.
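
To judge whether you need the scanning / row id ordering part, it may help to see what it buys you. The sketch below models a table as a map kept sorted by row key and runs a range scan over it (start row inclusive, stop row exclusive). It is a toy model, not the HBase API: RowKeyScanSketch, scan and the sample row keys are made up, and it compares Java strings where HBase compares raw bytes.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a table kept sorted by row key, with a range scan on top.
// Not the HBase API: RowKeyScanSketch and scan are made-up names.
final class RowKeyScanSketch {

    // Returns the rows with startRow <= key < stopRow, in key order.
    static SortedMap<String, String> scan(SortedMap<String, String> table,
                                          String startRow, String stopRow) {
        return table.subMap(startRow, stopRow);
    }

    public static void main(String[] args) {
        // Rows are inserted in arbitrary order; the map keeps them sorted by key.
        SortedMap<String, String> table = new TreeMap<>();
        table.put("video-001", "d");
        table.put("user-010", "c");
        table.put("user-001", "a");
        table.put("user-002", "b");
        for (String key : scan(table, "user-002", "video-000").keySet()) {
            System.out.println(key);
        }
    }
}
```

With plain files in HDFS you would have to read and filter the whole input to get the same slice.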

On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
> I asked a similar question recently:
> http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat
> Otis
> ----- Original Message ----
>> From: "y_823910@tsmc.com"<y_823910@tsmc.com>
>> To: hbase-user@hadoop.apache.org
>> Sent: Thu, January 28, 2010 8:02:39 PM
>> Subject: Hbase as Map/Reduce source
>> Hi,
>> I want to understand clearly how HBase works as a Map/Reduce source.
>> Basically, if a table has 100 regions, it means 100 maps will be started,
>> right?
>> What's the difference between HDFS and HBase as a Map/Reduce source?
>> Thanks
>> Fleming Chiu(邱宏明)
>> 707-6128
>> y_823910@tsmc.com
>> 週一無肉日吃素救地球(Meat Free Monday Taiwan)
