hbase-user mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: Newbie user questions
Date Mon, 03 Mar 2008 18:47:43 GMT
I did not mean to imply that there should be no scripting language
to interact with HBase. Bigtable has Sawzall, and it should be
possible to connect Pig (which is a lot like Sawzall), or something
like it, to HBase.

My point was that SQL and SQL-like languages are inappropriate
for interacting with HBase.

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Jean-Daniel Cryans [mailto:jdcryans@gmail.com]
> Sent: Monday, March 03, 2008 10:36 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Newbie user questions
>
> Jim, just to be sure I really understand what you mean: your
> point is that because HBase, like Bigtable, is a "sparse,
> distributed, persistent multidimensional sorted map", it should
> be used like any other Map in the Java API, and therefore there
> shouldn't be any scripting language to interact with it. Am I right?
>
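The phrase "sparse, distributed, persistent multidimensional sorted map" can be made concrete with a small in-memory model. The sketch below is a hypothetical illustration built only on java.util (the class SparseSortedTable and the column names are invented for this example); it is not the HBase client API and ignores distribution and persistence. It shows the three dimensions (row key, column, timestamp), that rows are kept sorted by key, and that a row stores only the columns actually written to it.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of a "sparse, multidimensional sorted map":
// row key -> column name -> timestamp -> value.
// Hypothetical illustration only; NOT the HBase client API, and it ignores
// distribution and persistence entirely.
class SparseSortedTable {
    // Rows sorted by key, columns sorted by name, versions sorted newest first.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Newest version of one cell, or null when the row or column is absent.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Number of columns actually stored in a row (absent columns cost nothing).
    int columnCount(String row) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? 0 : columns.size();
    }

    // Row keys stay sorted regardless of insertion order.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        SparseSortedTable t = new SparseSortedTable();
        t.put("row2", "info:name", 1L, "bob");
        t.put("row1", "info:name", 1L, "alice");
        t.put("row1", "info:name", 2L, "alice-v2");
        t.put("row1", "info:email", 1L, "a@example.com"); // row2 has no email: sparse
        System.out.println(t.firstRowKey());                  // row1
        System.out.println(t.getLatest("row1", "info:name")); // alice-v2
        System.out.println(t.columnCount("row2"));            // 1
    }
}
```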
> Thanks,
>
> J-D
>
> 2008/3/3, Jim Kellerman <jim@powerset.com>:
> >
> > -1
> >
> > HBase is a Bigtable clone, not a relational database or a
> > column-oriented database like C-Store.
> >
> > ---
> > Jim Kellerman, Senior Engineer; Powerset
> >
> >
> >
> > > -----Original Message-----
> > > From: edward yoon [mailto:edward@udanax.org]
> > > Sent: Sunday, March 02, 2008 10:22 PM
> > > To: hbase-user@hadoop.apache.org
> > > Subject: Re: Newbie user questions
> > >
> > > I think a unified API design and easy guidance are needed.
> > > Therefore, I think the default HBase client APIs should map to
> > > the HQL client API.
> > >
> > > I would like to get an objective opinion.
> > >
> > > Thanks,
> > > Edward.
> > >
> > > On 3/3/08, Bryan Duxbury <bryan@rapleaf.com> wrote:
> > > > Alex,
> > > >
> > > > The HBase shell is meant only for administrative purposes,
> > > > such as managing tables. You can do limited CRUD operations,
> > > > but they're mostly there for initial testing and tracking down
> > > > bugs. HQL is also not SQL, so you shouldn't expect many SQL
> > > > features.
> > > >
> > > > In the Java, REST and Thrift APIs for HBase, there are two
> > > > types of access: single-row gets and multi-row scans. There are
> > > > many options for gets, so there's probably something that
> > > > matches your needs, but you have to know the row key to start
> > > > with. Scans are used whenever you need to operate on a number
> > > > of rows. The cursor model is indeed the closest analogy for a
> > > > scanner.
> > > >
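A rough sketch of the two access patterns, assuming a table modeled as a java.util.TreeMap from row key to a column map. The class GetVsScan and its methods are hypothetical and are not HBase's Java, REST or Thrift API. A get needs the exact row key; a scan walks a key range in sorted order, which is why the cursor analogy fits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for a table: row key -> (column -> value).
// Illustrates get vs. scan only; it is not the HBase client API.
class GetVsScan {
    private final TreeMap<String, SortedMap<String, String>> table = new TreeMap<>();

    void put(String row, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Single-row get: requires the exact row key; null if the row is absent.
    SortedMap<String, String> get(String rowKey) {
        return table.get(rowKey);
    }

    // Multi-row scan over [startRow, stopRow): visits rows in sorted key
    // order, one at a time, much like a cursor over a result set.
    List<String> scanRowKeys(String startRow, String stopRow) {
        List<String> keys = new ArrayList<>();
        for (String key : table.subMap(startRow, stopRow).keySet()) {
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        GetVsScan t = new GetVsScan();
        t.put("user#001", "info:name", "alice");
        t.put("user#002", "info:name", "bob");
        t.put("user#003", "info:name", "carol");
        System.out.println(t.get("user#002"));                     // {info:name=bob}
        System.out.println(t.scanRowKeys("user#001", "user#003")); // [user#001, user#002]
    }
}
```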
> > > > If you need to do a join in the traditional sense, then yes, you
> > > > need at least two scanners, and you have to do the joining
> > > > yourself. However, if possible, you might want to consider
> > > > denormalizing the data from the two tables you'd be joining into
> > > > a single table. I don't mean one row per <table1,table2> tuple:
> > > > HBase supports an arbitrary number of columns per row, so if your
> > > > second table is really a subordinate entity, you might benefit
> > > > from moving everything into one table.
> > > >
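A minimal sketch of the denormalization idea, under the assumption of an order/line-item schema invented for illustration: each child row becomes a column in its parent's row (named item:<id> here), so a single get on the order returns everything a join would have produced. Denormalize and LineItem are hypothetical names, and the sorted maps stand in for HBase tables.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical example: fold a subordinate "line item" table into its parent
// "order" rows, one column per child, instead of joining at read time.
// Column names such as "info:customer" and "item:<id>" are illustrative only.
class Denormalize {
    static final class LineItem {
        final String orderId, itemId, product;
        LineItem(String orderId, String itemId, String product) {
            this.orderId = orderId;
            this.itemId = itemId;
            this.product = product;
        }
    }

    static SortedMap<String, SortedMap<String, String>> fold(
            SortedMap<String, String> orderCustomers, List<LineItem> items) {
        SortedMap<String, SortedMap<String, String>> table = new TreeMap<>();
        // One row per order, keyed by order id.
        for (Map.Entry<String, String> order : orderCustomers.entrySet()) {
            SortedMap<String, String> row = new TreeMap<>();
            row.put("info:customer", order.getValue());
            table.put(order.getKey(), row);
        }
        // Each child becomes an extra column in its parent's row.
        for (LineItem li : items) {
            SortedMap<String, String> row = table.get(li.orderId);
            if (row != null) { // skip children whose parent does not exist
                row.put("item:" + li.itemId, li.product);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        SortedMap<String, String> orders = new TreeMap<>();
        orders.put("order1", "alice");
        orders.put("order2", "bob");
        List<LineItem> items = Arrays.asList(
                new LineItem("order1", "1", "pen"),
                new LineItem("order1", "2", "ink"),
                new LineItem("order2", "1", "paper"));
        // One lookup on "order1" now returns the order and all its items.
        System.out.println(fold(orders, items).get("order1"));
        // {info:customer=alice, item:1=pen, item:2=ink}
    }
}
```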
> > > > The return values for scanners are Java Maps containing your
> > > > data (assuming you're in the Java API). Does that answer your
> > > > question?
> > > >
> > > > -Bryan
> > > >
> > > > On Mar 2, 2008, at 7:01 PM, alexthompson@sitelabs.com wrote:
> > > >
> > > > >
> > > > > Newbie user questions. Can you correct me if I am wrong in the
> > > > > following statements:
> > > > >
> > > > > I have looked into querying against HBase and come up with a
> > > > > few paths to do this. From the HBase shell I can use HQL; from
> > > > > code I am limited to scanners, which are roughly analogous to
> > > > > cursors. I 'obtain' a scanner and iterate over a table starting
> > > > > at a row, and once I have a row I can test values in columns.
> > > > >
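The cursor pattern described here, together with the question of what a query returns, can be sketched with a TreeMap standing in for a table. This hypothetical helper (CursorFilter.scanWhere is not an HBase method) starts at a given row, walks forward in key order, tests one column on each row, and copies matching rows into a caller-owned map of maps. Returning plain Java collections this way is one reasonable choice, not a helper HBase provides.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the cursor pattern: start at a row key, walk forward
// in sorted order, test a column value on each row, and copy matches into a
// caller-owned collection. The table is a plain TreeMap, not an HBase table.
class CursorFilter {
    static SortedMap<String, SortedMap<String, String>> scanWhere(
            TreeMap<String, SortedMap<String, String>> table,
            String startRow, String column, String expected) {
        SortedMap<String, SortedMap<String, String>> matches = new TreeMap<>();
        // tailMap(startRow) yields rows with key >= startRow, in key order.
        for (Map.Entry<String, SortedMap<String, String>> row
                : table.tailMap(startRow).entrySet()) {
            if (expected.equals(row.getValue().get(column))) {
                // Copy so callers can't mutate the underlying "table".
                matches.put(row.getKey(), new TreeMap<>(row.getValue()));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, SortedMap<String, String>> table = new TreeMap<>();
        String[][] data = {{"a", "active"}, {"b", "inactive"}, {"c", "active"}, {"d", "active"}};
        for (String[] d : data) {
            SortedMap<String, String> cols = new TreeMap<>();
            cols.put("info:status", d[1]);
            table.put(d[0], cols);
        }
        // Start the cursor at row "b": row "a" is never visited.
        System.out.println(scanWhere(table, "b", "info:status", "active").keySet()); // [c, d]
    }
}
```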
> > > > > Thus, for a SQL-style join I can open two or more scanners on
> > > > > different tables and iterate over them, testing as I go. Will
> > > > > that cause performance problems? Is there a more efficient way
> > > > > to do this, or are scanners inherently efficient?
> > > > >
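If both scanners return rows sorted by the same join key (true when joining on the row key, since tables are sorted by row key), the two cursors can be advanced in step as a sort-merge join, so each row is read once instead of rescanning one table for every row of the other. The sketch below assumes a one-to-one join on row key and uses TreeMap iterators in place of real scanners; MergeJoin is a hypothetical name. Joining on a non-key column would instead need client-side sorting or a hash index on one side.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Client-side sort-merge join of two "scanners" that both return rows in
// row-key order. TreeMap iterators stand in for HBase scanners here.
class MergeJoin {
    static List<String> join(SortedMap<String, String> left, SortedMap<String, String> right) {
        List<String> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> l = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> r = right.entrySet().iterator();
        Map.Entry<String, String> a = l.hasNext() ? l.next() : null;
        Map.Entry<String, String> b = r.hasNext() ? r.next() : null;
        while (a != null && b != null) {
            int cmp = a.getKey().compareTo(b.getKey());
            if (cmp == 0) {          // keys match: emit a joined record
                out.add(a.getKey() + "=" + a.getValue() + "/" + b.getValue());
                a = l.hasNext() ? l.next() : null;
                b = r.hasNext() ? r.next() : null;
            } else if (cmp < 0) {    // left is behind: advance left only
                a = l.hasNext() ? l.next() : null;
            } else {                 // right is behind: advance right only
                b = r.hasNext() ? r.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> users = new TreeMap<>();
        users.put("u1", "alice");
        users.put("u2", "bob");
        users.put("u4", "dave");
        SortedMap<String, String> profiles = new TreeMap<>();
        profiles.put("u2", "admin");
        profiles.put("u3", "guest");
        profiles.put("u4", "user");
        System.out.println(join(users, profiles)); // [u2=bob/admin, u4=dave/user]
    }
}
```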
> > > > > One other thing I can't see is the return value for a query: do
> > > > > I build my own collection and hand it back to my calling
> > > > > methods, or are there helper collection objects (I noticed
> > > > > 'formatter') to do this?
> > > > >
> > > > > Cheers, Alex. Any help much appreciated.
> > > >
> > > >
> > >
> > >
> > > --
> > > B. Regards,
> > > Edward yoon @ NHN, corp.
> > >