hbase-user mailing list archives

From "Nitin Gupta" <nitingupta...@gmail.com>
Subject RE: Help needed - Adding HBase to architecture
Date Mon, 08 Jun 2009 05:43:56 GMT

Thanks for the detailed explanation. Very helpful.

As far as file size is concerned, we may even be required to store videos in
the future, so we will definitely exceed the HBase size limit at some point.
Is there any other solution or key-value database that you can recommend for our

I am not very knowledgeable about HDFS either. I think if we go with
pure HDFS, then all the required DB operations would have to be custom
developed on top of HDFS. For our needs, do you think that HDFS already has
enough support that we will not need any major custom development? We are
just saving the files/attachments and retrieving them with some basic


-----Original Message-----
From: Jonathan Gray [mailto:jlist@streamy.com] 
Sent: Sunday, June 07, 2009 9:30 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Help needed - Adding HBase to architecture


HBase stores arbitrary binary values (row keys, column qualifiers, and
column values), so it is certainly capable of storing and serving files
and images.

My only real question before I would give you a +1 on your idea is what
you expect the range of file sizes to be.  While HBase allows you to store
values up to length Integer.MAX_VALUE, that is not recommended and in past
versions has led to memory issues (OOME and such).

Images, text, word/excel docs, etc... should be no problem.  But I don't
recommend storing things in the upper 10s or 100s of MB, though it's
probably possible with a little work adjusting some configuration
parameters.  In general, if you are approaching HDFS block size, then you
really just want HDFS and not HBase :)
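
The rule of thumb above can be sketched as a small routing function. This is
only an illustration: the 10 MB ceiling and the function name are assumptions
made for the example, not HBase settings. The 64 MB figure is the usual default
HDFS block size in Hadoop releases of this era, so check your own
configuration.

```python
# Hypothetical thresholds for illustration only; tune them for your cluster.
HBASE_MAX_VALUE_BYTES = 10 * 1024 * 1024  # assumed comfortable ceiling for one HBase value
HDFS_BLOCK_BYTES = 64 * 1024 * 1024       # assumed HDFS block size (check dfs.block.size)


def choose_store(size_bytes: int) -> str:
    """Return "hbase" for small files and "hdfs" for large ones."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Small values (images, documents) fit comfortably in an HBase cell.
    if size_bytes <= HBASE_MAX_VALUE_BYTES:
        return "hbase"
    # Anything approaching or exceeding the HDFS block size belongs in HDFS;
    # the range in between is also sent to HDFS here to stay on the safe side.
    return "hdfs"
```

With these assumed numbers, a 200 KB image would go to HBase and a 700 MB
video would go to HDFS.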

We are not currently running this in production, but we have had an
experimental version of our media server that runs on top of HBase rather
than the file system.  It has a series of Python scripts (connected to
HBase through our custom interface, you could use Java directly or
Thrift/REST/etc) that are responsible for generating various thumbnail
sizes.  The originals are stored in HBase, and then a special query is run
to grab the thumbnail of a certain size.  If it exists in HBase already,
it is just fetched and returned.  Otherwise, it is generated (via PIL,
Python Imaging Library, and some other custom tools), stored in HBase, and
then returned to the client.
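
The fetch-or-generate flow described above could look roughly like the sketch
below. It is not the actual media server code: a plain dict stands in for the
HBase table, the resize function is passed in by the caller (in practice it
might wrap PIL), and the names ThumbnailCache, get_thumbnail and the
"thumb:<size>" key layout are all made up for the example.

```python
from typing import Callable, Dict, Tuple

# Key layout (hypothetical): (image_id, column), where column is "original"
# or "thumb:<size>". A real deployment would map this onto an HBase row key
# and column qualifier.
Store = Dict[Tuple[str, str], bytes]


class ThumbnailCache:
    def __init__(self, store: Store, resize: Callable[[bytes, int], bytes]):
        self.store = store    # stand-in for the HBase table
        self.resize = resize  # caller-supplied resizer, e.g. a PIL wrapper

    def get_thumbnail(self, image_id: str, size: int) -> bytes:
        key = (image_id, f"thumb:{size}")
        cached = self.store.get(key)
        if cached is not None:
            # Thumbnail already stored: just fetch and return it.
            return cached
        # Otherwise generate it from the original, store it, then return it.
        original = self.store[(image_id, "original")]  # KeyError if unknown image
        thumb = self.resize(original, size)
        self.store[key] = thumb
        return thumb
```

The first request for a given size pays the cost of generating the thumbnail;
later requests for the same size are plain reads.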

As far as HBase on Windows goes... It's currently not possible but there
has been some effort from Powerset/Microsoft to make it happen.  I will
yield to those more familiar with it.

Personally, I run Windows on my primary work desktop and spend a good
chunk of my time on HBase development.  When I've wanted to spin up
pseudo-distributed local clusters, I usually use a cheap Linux node or
local Virtual Machine.  In both cases, I use a Windows X Server and
redirect output to my local Windows machine so I can run Eclipse and unit
tests from my Windows GUI.  Others have used Cygwin with some success, I

Hope that sheds some light for you.

You are almost certainly right about not wanting to store this in an
RDBMS.  And a hybrid approach seems to make sense, especially as a first

Jonathan Gray

On Sun, June 7, 2009 6:44 am, Nitin Gupta wrote:
> Hi All,
> I am working on an application which is kind of a social network on mobile
>  WAP. Recently, we have incorporated the files or attachments support in
> our application. Right now, since we are not in production yet, we are
> keeping all the files in the RDBMS which our application is using. But I
> am more than convinced that this is not going to work once we are in
> production mode.
> I got to know about HBase and I am trying to convince myself about its usage
> for the file storage, search and retrieval operations. I would like my
> opinion to be endorsed by expert HBase users/developers. Just for the
> clarification, here is what I am planning to do:
> Make use of an RDBMS for relational data in the application.
> All the files/blob data to be saved in the HBase.
> When required, my application can query app data from the RDBMS and the
> files can be retrieved from the HBase data store. I will keep the metadata
> of the files in my RDBMS so that files can be associated with my app's
> entities.
> Please help me decide if this is the right approach. My app is supposed
> to provide support for images as well. So can anyone advise whether HBase is
> the right solution for me, in conjunction with an imaging tool?
> Since my team is predominantly Windows based, I would like to know whether it
> is possible to run HBase on a Windows machine in standalone and in clustered
> mode.
> Thanks for all your help.
> nitin
