hbase-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: Any successful story of an HBase cell for 'analytics job' plus 'realtime serving'?
Date Sat, 03 Jul 2010 23:53:40 GMT
Hi Sean,

I am writing an interface for Chukwa to inject data directly into
HBase and rely on HBase to index my data by time group/row key.  It
is working fine for me.  I can tap into the real-time data sink table
to monitor data arrival and create simple visualizations.  The only
minor problem is that, by default, a cell returns only the three most
recent revisions instead of the 60 versions that I put into the
system.  I am sure it's something simple that I missed.
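
A likely cause of the three-version behavior, assuming the table was created with default column family settings, is the family's maximum-versions limit (VERSIONS in the HBase shell), which defaulted to 3 in HBase releases of that period. The sketch below is a toy model of that retention rule in plain Java, not the HBase client API; the class name VersionedCellSketch and its methods are hypothetical. It also simplifies timing: it drops old versions at write time, whereas HBase enforces the limit when reading and compacting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy model of a single versioned cell. Hypothetical; NOT the HBase API.
class VersionedCellSketch {
    private final int maxVersions;
    // timestamp -> value, ordered newest first
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.<Long>reverseOrder());

    VersionedCellSketch(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    // Store a new version, then drop the oldest ones beyond the limit.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry = oldest timestamp
        }
    }

    // Return up to `requested` values, newest first.
    List<String> read(int requested) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == requested) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch(3);
        for (long t = 1; t <= 60; t++) {
            cell.put(t, "v" + t);
        }
        // Asking for 60 versions still yields only the 3 that were retained.
        System.out.println(cell.read(60)); // prints [v60, v59, v58]
    }
}
```

In this model, constructing the cell with a limit of 60 makes all 60 versions readable, which mirrors raising the column family's version limit before loading data; whether a read request must also ask for more than one version depends on the client call used.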

The next step is to use TableInputFormat and TableOutputFormat with
MapReduce to run analytic computations over my large time series
trends.  From what I gather from the HBase javadoc, it looks very
promising and simple to implement.  With HBase managing the file
structures, indexing, and roll-up of files, this brings Chukwa one
step closer to becoming a real-time monitoring and reporting
application for Hadoop.  As a silent observer of HBase, I have waited
two years for Bigtable-like storage in the Hadoop ecosystem, and
HBase comes closest to that goal.

Running MapReduce jobs on HBase is unlikely to give you a real-time
system, since a lot of bytes are transferred between MapReduce and
HBase.  However, if you only need a near-real-time experience, such
as running a MapReduce job every 5-30 minutes, then it is certainly
within the realm of possibility.
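
To make "near real time" concrete, here is a rough sketch of the worst-case age of a record by the time a periodic job publishes results that include it. It assumes jobs start at a fixed interval and read their input when they start; the class name FreshnessSketch and the 2-minute job runtime are illustrative assumptions, not measurements from Chukwa or HBase.

```java
import java.time.Duration;

// Hypothetical helper for reasoning about data freshness of a periodic batch job.
class FreshnessSketch {

    // A record that arrives just after one run has read its input waits up to
    // one full interval for the next run to start, plus that run's duration
    // before its results become visible.
    static Duration worstCaseAge(Duration interval, Duration jobRuntime) {
        if (interval.isNegative() || jobRuntime.isNegative()) {
            throw new IllegalArgumentException("durations must be non-negative");
        }
        return interval.plus(jobRuntime);
    }

    public static void main(String[] args) {
        Duration runtime = Duration.ofMinutes(2); // assumed job runtime
        // 5-minute schedule: prints 7
        System.out.println(worstCaseAge(Duration.ofMinutes(5), runtime).toMinutes());
        // 30-minute schedule: prints 32
        System.out.println(worstCaseAge(Duration.ofMinutes(30), runtime).toMinutes());
    }
}
```

Under these assumptions, the 5-30 minute schedules mentioned above would serve results that lag the raw data by somewhat more than the schedule interval itself.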


On Sat, Jul 3, 2010 at 2:42 PM, Sean Bigdatafun
<sean.bigdatafun@gmail.com> wrote:
> I read a thread, "Use cases of Hbase", in the March archive, and several
> people seemed to suggest that an HBase cell can be used as a mixed cell for
> data crunching and online serving (i.e., using the Hive HBase client to do
> the analytics part while serving live queries; see
> http://osdir.com/ml/hbase-user-hadoop-apache/2010-03/msg00299.html). Has
> anyone actually had such a success story? I am a little doubtful about that
> idea.
> Someone else also implied such a use case: "Since 0.20.0, results of analytic
> computations over the data can be materialized and served out in real time
> in response to queries. This is a complete solution."
> Can someone share their experience with such an option?
> Thanks,
> Sean
