hbase-user mailing list archives

From Bing Li <lbl...@gmail.com>
Subject Re: How to Rank in HBase?
Date Mon, 30 Jan 2012 06:52:57 GMT
Dear Ian,

I very much appreciate your detailed reply! I will read the book about

Best regards,

On Mon, Jan 30, 2012 at 2:36 PM, Ian Varley <ivarley@salesforce.com> wrote:

> Bing,
> HBase uses an approach to structuring its storage known as "Log Structured
> Merge Trees", which you can learn more about here:
> http://scholar.google.com/scholar?q=log+structured+merge+tree&hl=en&as_sdt=0&as_vis=1&oi=scholart
> As well as in Lars George's great book, here:
> http://shop.oreilly.com/product/0636920014348.do
> It does all of these "frequent updates" just in memory, which is very
> fast; at the same time, it writes a simple forward-only log of all edits
> (known as the Write Ahead Log, or WAL) to disk in order to provide
> durability in the event of machine failure. It periodically writes the
> in-memory data to disk in big immutable ordered chunks, called "store
> files", which is very efficient. Future reads of the data then "merge" the
> on-disk store file data with the current state in memory, to get the full
> picture of the state of any row. Over time, the many small store files get
> "compacted" into bigger files, so that individual reads don't have too many
> files to read from. Each "get" or "scan" operation can just read small
> blocks of the store files; when you ask for one record, it doesn't have to
> read gigabytes of data from the disk, it can just read a small block. As
> such, random small reads and writes on a very big data set can be done
> efficiently.
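
[Editor's sketch] The write and read path described above can be illustrated with a toy log-structured merge store. This is a simplified model under stated assumptions, not HBase's actual implementation: the class name TinyLSM, its methods, and the flush_threshold parameter are all hypothetical, everything lives in Python lists and dicts rather than on disk, and the "WAL" is simply discarded after a flush.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store (illustrative only, not HBase code)."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of edits, in arrival order
        self.memstore = {}     # latest in-memory value per key
        self.store_files = []  # immutable sorted (key, value) lists, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # sequential append for durability
        self.memstore[key] = value     # fast in-memory update
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted, immutable store file."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal = []  # assumption: flushed edits no longer need the log

    def get(self, key):
        """Merge view: memstore first, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            # Binary search touches only a small part of the sorted file.
            i = bisect.bisect_left(store_file, (key,))
            if i < len(store_file) and store_file[i][0] == key:
                return store_file[i][1]
        return None

    def compact(self):
        """Merge all store files into one; newer values win."""
        merged = {}
        for store_file in self.store_files:  # oldest first
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


db = TinyLSM(flush_threshold=2)
db.put("row1", "a")
db.put("row2", "b")   # second distinct key triggers a flush
db.put("row1", "c")   # newer value for row1 stays in memory for now
print(db.get("row1"), db.get("row2"))
db.put("row3", "d")   # triggers a second flush
db.compact()
print(db.store_files)
```

A read consults the memstore first and then the store files from newest to oldest, which is why compaction matters: fewer files means fewer places to look.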
> Furthermore, it's fine to update the data store frequently. For any given
> record, you can make as many updates as you want to the in-memory
> structures, and these aren't written to the store files until the memory
> store is flushed (they are also appended to the WAL, but that's efficient
> b/c the log is ordered by update time, not record key). It all happens in
> memory, which is very fast (but, again, it's safe b/c of the WAL). There
> are even some recent JIRAs that make that process more efficient, for
> example HBASE-4241 <https://issues.apache.org/jira/browse/HBASE-4241>.
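
[Editor's sketch] To make the point about frequent updates concrete, the following toy example applies many updates to a single key and then counts what each structure holds. The function name apply_updates and the sample key are hypothetical, and the model is deliberately simplified: it keeps only the latest value per key, whereas real HBase stores timestamped versions of each cell.

```python
def apply_updates(updates):
    """Apply (key, value) updates; return the log and the flushed store file.

    Simplifying assumption: only the latest value per key is kept in memory.
    """
    wal = []       # one sequential append per update, ordered by arrival
    memstore = {}  # in-memory state, overwritten in place
    for key, value in updates:
        wal.append((key, value))
        memstore[key] = value
    store_file = sorted(memstore.items())  # what a flush would write out
    return wal, store_file


# 1000 successive updates to the same (hypothetical) row key.
updates = [("page42", rank) for rank in range(1000)]
wal, store_file = apply_updates(updates)
print(len(wal), len(store_file))  # prints "1000 1"
print(store_file)                 # prints "[('page42', 999)]"
```

Each update costs one sequential log append plus an in-memory overwrite; in this simplified model only the final state reaches the sorted store file.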
> One way to think about it is that HBase is *precisely* a layer that adds
> these efficient random read/write capabilities on top of the Hadoop
> distributed file system (HDFS), and takes care of doing that in a way that
> parallelizes nicely across a large cluster of machines, deals with machine
> failures, etc.
> Ian
> On Jan 29, 2012, at 10:16 PM, Bing Li wrote:
> Dear Stack,
> Thanks so much for your reply!
> As I understand it, large-scale distributed systems tend to prefer
> write-once-read-many workloads. Frequent updates must put a heavy load on
> consistency handling and lower performance. So HBase must not be well
> suited to frequent updates, right?
> Best regards,
> Bing
> On Mon, Jan 30, 2012 at 1:51 PM, Stack <stack@duboce.net> wrote:
> On Sun, Jan 29, 2012 at 12:02 PM, Bing Li <lblabs@gmail.com> wrote:
> Another question is whether it is proper to update data in HBase
> frequently?
> This is 'normal', yes.
> St.Ack
