hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@gmail.com>
Subject Re: Newbie user questions
Date Mon, 03 Mar 2008 15:57:53 GMT
I'm not an HBase dev, but here is my opinion as a user.

Edward, I think the HBase client API should map to the HQL API if and only
if HQL is meant to be used in the same way as the default API. As Bryan
stressed, the shell is 'meant' to be used for admin purposes, but as a user I
don't get that impression from you, which I would guess is because you wrote
it with different purposes in mind. So I think it would be welcome if the
scope of the HBase shell were clearly documented, because on the Wiki we can
read 'HBase Shell <http://wiki.apache.org/hadoop/Hbase/HbaseShell>, a Query
Language Shell for Hadoop + HBase', which means more than just admin to a
typical RDBMS developer.

@alex, I would recommend reading the Bigtable paper
<http://labs.google.com/papers/bigtable.html> to understand how HBase is
meant to be used and how to design your schema. The "infinite number of
columns" concept is quite new and worth a few hours of work to master
conceptually. Also, don't forget that row keys are stored in lexicographic
order.
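
To illustrate the point about lexicographic ordering, here is a minimal
sketch in plain Java. It does not use the HBase API: the class RowKeyOrder
and its helpers compareKeys, sortKeys and paddedKey are hypothetical names,
and the code assumes keys are compared byte by byte as unsigned values. It
shows that numeric suffixes do not sort numerically unless they are padded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of any HBase API.
class RowKeyOrder {

    // Compare two keys byte by byte, treating each byte as unsigned.
    // A shorter key that is a prefix of a longer key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Return a copy of the keys in the order a lexicographic store would keep them.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareKeys(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    // Zero-pad a numeric id so that lexicographic order matches numeric order.
    static String paddedKey(String prefix, long id) {
        return String.format("%s%010d", prefix, id);
    }

    public static void main(String[] args) {
        // Unpadded: "user10" sorts before "user2".
        System.out.println(sortKeys(Arrays.asList("user2", "user10", "user1")));

        // Padded: order now follows the numeric ids 1, 2, 10.
        System.out.println(sortKeys(Arrays.asList(
                paddedKey("user", 2), paddedKey("user", 10), paddedKey("user", 1))));
    }
}
```

The practical consequence for schema design is that a scan starting at a
given row walks keys in this byte order, so it pays to choose a key layout
(padding, prefixes) whose sort order matches the way you want to read the
data.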

J-D

2008/3/3, edward yoon <edward@udanax.org>:
>
> I think a unified API design and easy guidance are needed.
> Therefore, I think the HBase default client APIs should map to the HQL
> client API.
>
> I would like to get an objective opinion.
>
> Thanks,
> Edward.
>
>
> On 3/3/08, Bryan Duxbury <bryan@rapleaf.com> wrote:
> > Alex,
> >
> > The HBase shell is meant only to be used for administrative purposes,
> > like managing tables. You can do limited CRUD operations, but they're
> > mostly there for the benefit of initial testing and tracking down
> > bugs. HQL is also not SQL, so you shouldn't anticipate there being
> > many SQL features.
> >
> > In the Java, REST and Thrift APIs for HBase, there are two types of
> > accesses - single-row gets and multi-row scans. There are a lot of
> > options surrounding gets, so there's probably something that matches
> > your needs, but you have to know the row key to start with. Scans are
> > used whenever you need to operate on a number of rows. The cursor
> > model is indeed the closest analogy for a scanner.
> >
> > If you need to do a join in the traditional sense, then yes, you need
> > to have at least two scanners and do the joining yourself. However,
> > if possible, you might want to consider denormalizing the data from
> > the two tables you'd be joining into a single table. I don't mean one
> > row per <table1,table2> tuple - HBase supports an arbitrary number of
> > columns per row, so if your second table is really a subordinate
> > entity, you might get some benefit from moving all to one table.
> >
> > The return values for scanners are Java Maps containing your data
> > (assuming you're in the Java API). Does that answer your question?
> >
> > -Bryan
> >
> > On Mar 2, 2008, at 7:01 PM, alexthompson@sitelabs.com wrote:
> >
> > >
> > > Newbie user questions. Can you correct me if I am wrong in my
> > > following statements:
> > >
> > > I have looked into querying against HBase and come up with a few
> > > paths to do this: from the HBase shell I can use HQL; from code I
> > > am limited to scanners, which are roughly analogous to cursors. I
> > > 'obtain' a scanner and iterate over a table starting at a row, and
> > > once I have a row I can test values in columns.
> > >
> > > Thus for a 'SQL' type join I can fire up 2+ scanners on different
> > > tables and iterate over both, testing as I go. Are there performance
> > > problems? Is there a more efficient way to do this, or are scanners
> > > innately efficient?
> > >
> > > One other thing I can't see is the return value for a query. Do I
> > > build my own collection and hand it back to my calling methods, or
> > > do we have some helper collection objects (I noticed 'formatter')
> > > to do this?
> > >
> > > Cheers, Alex. Any help much appreciated.
> >
> >
>
>
>
> --
> B. Regards,
> Edward yoon @ NHN, corp.
>
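
The "two scanners and do the joining yourself" approach that Bryan
describes in the quoted message can be sketched as a merge join. This is
plain Java, not the HBase API: the class ScannerMergeJoin is a hypothetical
name, each sorted map stands in for a scanner that returns rows in row-key
order, and the sketch assumes both tables use the same row key as the join
key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; sorted maps stand in for table scanners.
class ScannerMergeJoin {

    // Advance an iterator, returning null once it is exhausted.
    private static <T> T nextOrNull(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Walk both "scanners" in key order and emit one line per key present in both.
    static List<String> join(SortedMap<String, String> left,
                             SortedMap<String, String> right) {
        List<String> joined = new ArrayList<>();
        Iterator<Map.Entry<String, String>> l = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> r = right.entrySet().iterator();
        Map.Entry<String, String> a = nextOrNull(l);
        Map.Entry<String, String> b = nextOrNull(r);

        while (a != null && b != null) {
            int cmp = a.getKey().compareTo(b.getKey());
            if (cmp == 0) {
                // Same row key on both sides: emit the joined row.
                joined.add(a.getKey() + ": " + a.getValue() + " | " + b.getValue());
                a = nextOrNull(l);
                b = nextOrNull(r);
            } else if (cmp < 0) {
                a = nextOrNull(l); // left side is behind, advance it
            } else {
                b = nextOrNull(r); // right side is behind, advance it
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        SortedMap<String, String> users = new TreeMap<>();
        users.put("u1", "alice");
        users.put("u2", "bob");
        users.put("u4", "dave");

        SortedMap<String, String> cities = new TreeMap<>();
        cities.put("u2", "Paris");
        cities.put("u3", "Oslo");
        cities.put("u4", "Lima");

        for (String row : join(users, cities)) {
            System.out.println(row);
        }
    }
}
```

Because both inputs are already sorted, each side is read once. If the
second table is really a subordinate entity, the denormalized layout Bryan
suggests avoids this second pass entirely, since the related values would
live as extra columns on the same row.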
