hbase-user mailing list archives

From Slava Gorelik <slava.gore...@gmail.com>
Subject Re: Few questions
Date Thu, 05 Feb 2009 19:33:29 GMT
Thank you for the quick response. So, you wrote:

HBase is efficient at retrieving rows in a range because rows are sorted in
lexicographical order.

My question: is it still efficient when rows are within the range but in
different mapfiles (as in the case of a row update)?
And another question: is each mapfile lexicographically sorted? There is no
sorting of data across mapfiles in the same region, is that correct?


Best Regards.
Slava.


On Thu, Feb 5, 2009 at 8:20 PM, Jonathan Gray <jlist@streamy.com> wrote:

> Answers inline.
>
> > -----Original Message-----
> > From: Slava Gorelik [mailto:slava.gorelik@gmail.com]
> > Sent: Thursday, February 05, 2009 9:21 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Few questions
> >
> > Hi to All.
> >
> > I have a few questions to ask:
> >
> > 1) Is it possible to bring specific columns from the same row within one
> > round trip (some method that takes a list of column names and returns a
> > RowResult)?
>
>
> http://hadoop.apache.org/hbase/docs/r0.19.0/api/org/apache/hadoop/hbase/client/HTable.html#getRow(byte[],%20byte[][])
>
> HTable.getRow(byte [] row, byte [][] columns)
>
> Ex: byte [][] columns = {"family:column1".getBytes(),
> "family:column2".getBytes()};
>
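A minimal self-contained sketch of preparing the `byte[][]` columns argument for that `HTable.getRow(byte[] row, byte[][] columns)` call (the row key and column names here are made-up examples; actually issuing the call would of course need a live HBase 0.19 cluster, so the HTable part is shown only as a comment):

```java
// Sketch: building the byte[][] columns argument for the 0.19 API call
// HTable.getRow(byte[] row, byte[][] columns).
// The row key and "family:columnN" names below are hypothetical.
public class GetRowColumnsExample {
    public static void main(String[] args) {
        byte[] row = "row1".getBytes();
        byte[][] columns = {
            "family:column1".getBytes(),
            "family:column2".getBytes(),
        };
        // Against a running cluster one would then do (not executed here):
        //   HTable table = new HTable(new HBaseConfiguration(), "mytable");
        //   RowResult result = table.getRow(row, columns);
        // and get both cells back in a single round trip.
        for (byte[] c : columns) {
            System.out.println(new String(c));
        }
    }
}
```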
>
> > 2) Does key size have any implications for HBase performance?
>
> There are some implications, but as far as I know nothing that significant.
> Most users have keys on the order of 10s or 100s of bytes, and I've never
> seen a large difference between them.  Of course, the smaller the key, the
> smaller the payload to store and transfer.
>
>
> > 3) Somewhere, I don't remember where, I read that HBase can retrieve
> > rows in the range between two given keys very quickly and efficiently;
> > is that correct?
> >    If yes, how is it implemented? I assume that the data in a mapfile is
> > sorted by key (from when I inserted the rows), but what happens when I
> > update a specific row? I guess that because in HBase everything is an
> > insert, the updated row will (probably) be stored in a different mapfile
> > than the original row; is that correct? If yes, how can efficient and
> > fast retrieval of rows in a range between two keys be guaranteed, when
> > it could mean retrieving rows from different mapfiles?
>
>
> HBase is efficient at retrieving rows in a range because rows are sorted in
> lexicographical order.
>
> Check out the HBase architecture wiki page section on HRegionServer
> (http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion).
>
> Writes in HBase are first stored into an in-memory structure called the
> Memcache.  This is periodically flushed to an HDFS MapFile.  A single
> region in HBase is made up of one Memcache and 0 to N mapfiles.
>
> So a scanner in HBase is really the merge of a number of scanners: one
> open to the Memcache (recent writes), and one open to each flushed-out
> MapFile.
>
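The merge described above can be sketched in plain Java, with sorted maps standing in for the Memcache and two flushed mapfiles. All class and variable names here are made-up illustrations, not HBase classes; the point is only that merging oldest-to-newest lets later writes shadow earlier ones for the same row key, while the result stays sorted for range scans:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: merging sorted stores the way a region scanner does.
// TreeMaps stand in for the Memcache and two flushed mapfiles.
public class MergeScanSketch {
    public static void main(String[] args) {
        TreeMap<String, String> mapfile1 = new TreeMap<>();
        mapfile1.put("row1", "v1-old");
        mapfile1.put("row3", "v3");

        TreeMap<String, String> mapfile2 = new TreeMap<>();
        mapfile2.put("row1", "v1-newer"); // an update of row1, flushed later

        TreeMap<String, String> memcache = new TreeMap<>();
        memcache.put("row2", "v2"); // most recent writes, not yet flushed

        // Merge oldest to newest so later writes overwrite earlier ones.
        TreeMap<String, String> merged = new TreeMap<>();
        merged.putAll(mapfile1);
        merged.putAll(mapfile2);
        merged.putAll(memcache);

        // A range scan [row1, row3) sees each row once, newest value winning,
        // even though row1's versions live in different "mapfiles".
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : merged.subMap("row1", "row3").entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        System.out.println(out); // prints [row1=v1-newer, row2=v2]
    }
}
```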
>
> Hope that helps.
>
> JG
>
>
