hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: Modeling column families
Date Sat, 24 Apr 2010 21:22:42 GMT
On 24 April 2010 at 23:59, Ryan Rawson <ryanobjc@gmail.com> wrote:

> On Sat, Apr 24, 2010 at 12:22 AM, Andrey Stepachev <octo47@gmail.com>
> wrote:
> > 2010/4/24 Andrew Nguyen <andrew-lists-hbase@ucsfcti.org>
> >
> >> Hello all,
> >>
> >> Each row key is of the form "PatientName-PhysiologicParameter" and each
> >> column name is the timestamp of the reading.
> >>
> >
> > With such a design in HBase (as opposed to Cassandra) you have to use row
> > filters to get only part of the data (for example, the last year), or
> > filter on the client during a row scan.
> > If a data series is big (>100) you will run into the intra-row
> > scanning issue, https://issues.apache.org/jira/browse/HBASE-1537,
> > as I did. Another issue, as mentioned before, is scaling: HBase splits
> > data by rows.
> >
> > You have to figure out how much data will be in a row, and if it runs
> > into the hundreds, use a compound key (patient-code-date).
> > If the rows are small, (patient-code) may be easier to use, because
> > you can use Get operations with locks (if you need them), and with a
> > dated key you can't (because scan doesn't yet honor locks).
> This statement is happily obsolete - 0.20.4 RC has new code that makes
> it so that Gets and Scans never return partially updated rows. I
> dislike the term 'honor locks' because it implies an implementation
> strategy, and in this case Gets (which are now 1 row scans) and Scans
> do not acquire locks to accomplish their tasks.  This is important
> because if you acquired a row lock (which is exclusive) you would only
> be able to have 1 read and write operation at a time, whereas we
> really want 1 write operation and as many read operations as possible.

No. I mean the scenario where I want to lock a row for writing: you can't
lock the rows for all dates at once. With patient-code it is easy. With
patient-code-date you have to use some artificial date or use ZooKeeper
directly.
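
A minimal sketch of the "artificial date" idea, in plain Python rather than the HBase client API. Everything here is hypothetical and for illustration only: the in-memory `store` dict, the `row_lock` helper, and the sentinel value `0000-00-00`. The point is just that with patient-code-date keys, all writers for one series agree to take the lock on a single sentinel row instead of on each dated row.

```python
import threading

_guard = threading.Lock()
_row_locks = {}  # row key -> lock; hypothetical stand-in for per-row locks

SENTINEL_DATE = "0000-00-00"  # the "artificial date"; value chosen for illustration


def row_lock(key):
    """Return the lock object associated with a row key, creating it on first use."""
    with _guard:
        return _row_locks.setdefault(key, threading.Lock())


def sentinel_key(patient, code):
    """Key of the one row that every writer of this series locks."""
    return f"{patient}-{code}-{SENTINEL_DATE}"


def write_series(store, patient, code, readings):
    """Write many dated rows while holding the single sentinel-row lock."""
    with row_lock(sentinel_key(patient, code)):
        for date, value in readings.items():
            store[f"{patient}-{code}-{date}"] = value


store = {}
write_series(store, "smith", "ABP", {"2010-04-23": 91, "2010-04-24": 93})
```

With a plain patient-code key the sentinel is unnecessary, because the series already lives in one row and that row's own lock covers every date.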

Personally, I prefer compound keys, as I mentioned before. In my message I
pointed to a very important thing: intra-row scanning in the case of rows
with a very large number of columns. A scan will fail with OOM if you try
to read such a row.
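
To illustrate why compound keys avoid reading one huge row, here is a small sketch in plain Python (not the HBase API). It assumes ISO dates (YYYY-MM-DD) in the key, so that lexicographic key order matches time order; the sorted list stands in for the sorted row keys of a table, and `row_key` and `range_scan` are hypothetical helper names.

```python
from bisect import bisect_left


def row_key(patient, code, date):
    """Compound key; ISO dates sort lexicographically in time order."""
    return f"{patient}-{code}-{date}"


def range_scan(sorted_keys, start_key, stop_key):
    """Return keys in [start_key, stop_key), like a scan with start and stop rows."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


keys = sorted(
    row_key("smith", "ABP", d)
    for d in ["2008-12-31", "2009-01-01", "2009-06-15", "2010-01-01"]
)

# Only the rows for 2009 are touched; no single row has to hold the whole series.
last_year = range_scan(
    keys,
    row_key("smith", "ABP", "2009-01-01"),
    row_key("smith", "ABP", "2010-01-01"),
)
```

Each reading is its own small row, so a date-range read is bounded by the range requested and not by the total length of the series.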

> For example if you are storing timeseries data for a monitoring
> system, you might want to store it by row, since the number of points
> for a single system might be arbitrarily large (think: 2 years+ of
> data). In this case if the expected data set size per row is larger
> than what a single machine could conceivably store, Cassandra would
> not work for you in this case (since each row must be stored on a
> single (er N) node(s)).

Really, in my reply I said only that Cassandra has an API for scanning
I understand, t

2010/4/25 Andrew Nguyen <andrew-lists-hbase@ucsfcti.org>

> You mention tall tables - this sounds consistent with what Erik and Andrey
> have said.  Given that, just to clarify my understanding, I'm probably
> looking at a single table with only one column (the value, which Andrey
> names as "series"???) and billions of rows, right?

Exactly. Using PhysiologicParameter as the column name is really the
better solution, like series:ABP, series:HP, etc.
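
A rough sketch of that tall layout in plain Python (nested dicts standing in for a table, not the HBase API). It assumes the row key becomes patient-timestamp, which is not spelled out in this thread, and that the original wide key has the form "PatientName-PhysiologicParameter"; `to_tall` is a hypothetical helper name.

```python
def to_tall(wide_rows):
    """Turn wide rows (key: patient-parameter, column: timestamp) into
    tall rows (key: patient-timestamp, column: series:<parameter>)."""
    tall = {}
    for wide_key, readings in wide_rows.items():
        # Split on the last hyphen so patient names containing hyphens survive.
        patient, parameter = wide_key.rsplit("-", 1)
        for timestamp, value in readings.items():
            row = tall.setdefault(f"{patient}-{timestamp}", {})
            row[f"series:{parameter}"] = value
    return tall


wide = {
    "smith-ABP": {"20100424T2100": 93, "20100424T2101": 94},
    "smith-HP": {"20100424T2100": 71},
}
tall = to_tall(wide)
```

Here every reading time becomes one short row, and the parameters recorded at that time sit side by side as series:ABP, series:HP and so on.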
