hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Ad-hoc reports against HBase - any way? any tools?
Date Fri, 25 Feb 2011 21:02:14 GMT
Hello,

I have an HBase cluster chock-full of data and would like to run canned reports
(i.e., reports known ahead of time), but also ad-hoc reports against that data.
Are there any open-source or commercial tools one can use?

Here's what I *think* I know so far, but please correct me wherever I'm wrong, so
I don't spread false info:

* Use HBase-Hive Integration
  Pluses:
    - lots of tools to query Hive are available
  Minuses:
    - data duplication
    - Hive's copy of data is always behind
    - I heard the integration is fairly alpha (e.g. you can't copy deltas to
      Hive, you have to copy all data every time you want to update your Hive
      store)

* Use Pig 
  https://issues.apache.org/jira/browse/PIG-970
  https://issues.apache.org/jira/browse/PIG-1205
  Pluses:
    - runs directly against HBase, no need to copy data
  Minuses:
    - PigLatin learning curve - in my case people wanting ad-hoc reports are not
      techies
    - No pretty front-end with syntax highlighting or visual querying, or one
      that accepts SQL and translates it to PigLatin

* Use PigPen
  Pluses:
    - Visual == easy
  Minuses:
    - Looks abandoned judging by http://search-hadoop.com/m/Noacz1MECC7 and
      https://issues.apache.org/jira/browse/PIG-366

* Use Toad for Cloud
  Pluses:
    - accepts SQL, runs, and returns data
    - runs directly against HBase, no need to copy data
  Minuses:
    - some people reported it crashes
    - it allows the person querying the data to also modify the data, which is
      bad in my environment

* Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to be
  able to get the data out of Hive, but not out of HBase.  More info below:

* Pentaho
    * http://www.pentaho.com/products/hadoop/ - looks like it supports only Hive
    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
    * http://search-hadoop.com/?q=pentaho&src=moz-search

* Datameer
    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks like
      it supports only Hive
    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks like
      one can add support for HBase by writing a plugin?

* Karmasphere Analyst
    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html -
      Hive only


Is any of the above incorrect?
Did I miss a tool, free or non-free, that I could use to run ad-hoc reports 
against data in HBase?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/

