hbase-user mailing list archives

From zsongbo <zson...@gmail.com>
Subject Re: Help needed - Adding HBase to architecture
Date Sat, 13 Jun 2009 18:59:12 GMT
Hi Billy,

I very much agree that "Hbase would be better suited to store the meta data in
place of the images," with the files themselves stored in HDFS or another storage
system such as S3. For small files, though, an S3-like object storage system will
be better.

Another issue to discuss with you:
How many tablets/regions does each of your HBase region servers serve in
practice? The Bigtable paper suggests at most hundreds.


On Mon, Jun 8, 2009 at 2:28 PM, Billy Pearson <sales@pearsonwholesale.com> wrote:

> If I were going to use an RDBMS to store the meta data, then I would just use
> Hadoop HDFS to store the images/video.
> I know that Hadoop has a Thrift API now:
> http://wiki.apache.org/hadoop/HDFS-APIs
> Hbase would be better suited to store the meta data in place of the images.
> The biggest benefit of HBase is that you can scale both the reads and the
> writes to the db, not just the reads as in most RDBMSs.
> So you should be able to work with the files in hadoop in any language as
> long as you can get hadoop working correctly on windows.
> The benefit of this is you can scale hadoop as needed to hold more data.
> The downside to this is the memory that will be required for the namenode.
> I think it's something like 3M files per GB of memory.
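
To make the namenode memory estimate above concrete, here is a minimal back-of-the-envelope sketch. It takes the figure of roughly 3 million files per GB from the message as given; that number is a recollection, not a measured value, and the function name is hypothetical.

```python
# Rough figure quoted in the message above; treat it as an assumption,
# not a measured property of the HDFS namenode.
FILES_PER_GB = 3_000_000


def namenode_memory_gb(num_files: int, files_per_gb: int = FILES_PER_GB) -> float:
    """Estimate the namenode memory (in GB) needed to track num_files files."""
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    return num_files / files_per_gb


# Print the estimate for a few file counts.
for n in (3_000_000, 30_000_000, 300_000_000):
    print(f"{n:>12,} files -> ~{namenode_memory_gb(n):.0f} GB of namenode memory")
```

Under this assumption the memory need grows linearly with the file count, which is why many small files are more of a concern for HDFS than a few large ones.
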
> "Nitin Gupta" <nitingupta183@gmail.com> wrote in message
> news:003c01c9e7fc$2087df00$61979d00$@com...
>> Jonathan,
>> Thanks for the detailed explanation. Very helpful.
>> As far as file size is concerned, we may even be required to save videos in
>> the future, so we will definitely go above the HBase size limit at some point.
>> Is there any other solution or key-value database that you can recommend for
>> our case?
>> I am not very knowledgeable about HDFS either. I think if we go with
>> pure HDFS, then all the required DB operations would have to be custom
>> developed on top of HDFS. For our needs, do you think that HDFS already has
>> enough support that we will not need any major custom development? We are
>> just saving the files/attachments and retrieving them with some basic
>> search.
>> Regards,
>> Nitin
>> -----Original Message-----
>> From: Jonathan Gray [mailto:jlist@streamy.com]
>> Sent: Sunday, June 07, 2009 9:30 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Help needed - Adding HBase to architecture
>> Nitin,
>> HBase stores arbitrary binary values (row keys, column qualifiers, and
>> column values), so it is certainly capable of storing and serving files
>> and images.
>> My only real question before I would give you a +1 on your idea is what
>> you expect the range of file sizes to be.  While HBase allows you to store
>> values up to length Integer.MAX_VALUE, that is not recommended and in past
>> versions has led to memory issues (OOME and such).
>> Images, text, word/excel docs, etc... should be no problem.  But I don't
>> recommend storing things in the upper 10s or 100s of MB, though it's
>> probably possible with a little work adjusting some configuration
>> parameters.  In general, if you are approaching HDFS block size, then you
>> really just want HDFS and not HBase :)
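
The size rule of thumb above can be sketched as a small routing function. The 10 MB cutoff and the function name are hypothetical, chosen only to stay well below the "upper 10s of MB" range mentioned in the message; the 64 MB block size is an assumed HDFS setting, not something stated in the thread.

```python
MB = 1024 * 1024

# Assumed HDFS block size; check the cluster's actual configuration.
HDFS_BLOCK_SIZE_BYTES = 64 * MB

# Hypothetical cutoff, kept well below the assumed block size.
HBASE_CUTOFF_BYTES = 10 * MB


def choose_store(size_bytes: int, hbase_cutoff: int = HBASE_CUTOFF_BYTES) -> str:
    """Return "hbase" for small values and "hdfs" for large ones."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "hbase" if size_bytes <= hbase_cutoff else "hdfs"


print(choose_store(200 * 1024))   # a typical image
print(choose_store(700 * MB))     # a video file
```

A cutoff like this only decides where the bytes go; the row that describes the file can still live in HBase either way.
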
>> We are not currently running this in production, but we have had an
>> experimental version of our media server that runs on top of HBase rather
>> than the file system.  It has a series of Python scripts (connected to
>> HBase through our custom interface, you could use Java directly or
>> Thrift/REST/etc) that are responsible for generating various thumbnail
>> sizes.  The originals are stored in HBase, and then a special query is run
>> to grab the thumbnail of a certain size.  If it exists in HBase already,
>> it is just fetched and returned.  Otherwise, it is generated (via PIL,
>> Python Imaging Library, and some other custom tools), stored in HBase, and
>> then returned to the client.
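
The fetch-or-generate flow described above might look roughly like the sketch below. It is not the media server's actual code: a plain dict stands in for the HBase table, the key layout is invented for illustration, and `resize` is a hypothetical callable that would wrap PIL or a similar tool.

```python
from typing import Callable, MutableMapping


def get_thumbnail(
    store: MutableMapping[str, bytes],
    image_id: str,
    size: int,
    resize: Callable[[bytes, int], bytes],
) -> bytes:
    """Return the thumbnail of the given size, generating and storing it on a miss."""
    key = f"{image_id}:thumb:{size}"          # hypothetical key layout
    cached = store.get(key)
    if cached is not None:
        return cached                          # already stored: just return it
    original = store[f"{image_id}:original"]   # raises KeyError if the image is unknown
    thumbnail = resize(original, size)
    store[key] = thumbnail                     # store it so the next request is a hit
    return thumbnail


def fake_resize(data: bytes, size: int) -> bytes:
    """Stand-in for real image scaling: keeps only the first `size` bytes."""
    return data[:size]


store = {"img1:original": b"0123456789"}
print(get_thumbnail(store, "img1", 4, fake_resize))
print(sorted(store))
```

The second request for the same size never calls `resize`, which is the point of storing the generated thumbnail next to the original.
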
>> As far as HBase on Windows goes... It's currently not possible but there
>> has been some effort from Powerset/Microsoft to make it happen.  I will
>> yield to those more familiar with it.
>> Personally, I run Windows on my primary work desktop and spend a good
>> chunk of my time on HBase development.  When I've wanted to spin up
>> pseudo-distributed local clusters, I usually use a cheap Linux node or
>> local Virtual Machine.  In both cases, I use a Windows X Server and
>> redirect output to my local Windows machine so I can run Eclipse and unit
>> tests from my Windows GUI.  Others have used Cygwin with some success, I
>> believe.
>> Hope that sheds some light for you.
>> You are almost certainly right about not wanting to store this in an
>> RDBMS.  And a hybrid approach seems to make sense, especially as a first
>> step.
>> Jonathan Gray
>> On Sun, June 7, 2009 6:44 am, Nitin Gupta wrote:
>>> Hi All,
>>> I am working on an application which is a kind of social network on mobile
>>> WAP. Recently, we have incorporated file and attachment support in
>>> our application. Right now, since we are not in production yet, we are
>>> keeping all the files in the RDBMS which our application is using. But I
>>> am more than convinced that this is not going to work once we are in
>>> production mode.
>>> I got to know about HBase and I am convincing myself of its usefulness
>>> for the file storage, search and retrieval operations. I would like my
>>> opinion to be endorsed by expert HBase users/developers. Just for
>>> clarification, here is what I am planning to do:
>>> - Make use of an RDBMS for relational data in the application.
>>> - Save all the files/blob data in HBase.
>>> - When required, my application can query app data from the RDBMS, and the
>>>   files can be retrieved from the HBase data store.
>>> - I will keep the meta data of the files in my RDBMS so that files can be
>>>   associated with my app's entities.
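
The plan above can be sketched as follows, assuming an in-memory SQLite database as a stand-in for the RDBMS and a plain dict as a stand-in for the HBase blob store. The table layout and all function names are hypothetical.

```python
import sqlite3
import uuid


def open_metadata_db() -> sqlite3.Connection:
    """Create an in-memory database holding only the file meta data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE attachments ("
        " blob_key TEXT PRIMARY KEY,"
        " owner_id INTEGER NOT NULL,"
        " filename TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL)"
    )
    return db


def save_attachment(db, blobs, owner_id, filename, data):
    """Store the bytes in the blob store and the meta data row in the RDBMS."""
    blob_key = uuid.uuid4().hex
    blobs[blob_key] = data                       # blob store write
    db.execute(
        "INSERT INTO attachments (blob_key, owner_id, filename, size_bytes)"
        " VALUES (?, ?, ?, ?)",
        (blob_key, owner_id, filename, len(data)),
    )
    db.commit()
    return blob_key


def load_attachments(db, blobs, owner_id):
    """Look up meta data in the RDBMS, then fetch each blob by its key."""
    rows = db.execute(
        "SELECT filename, blob_key FROM attachments"
        " WHERE owner_id = ? ORDER BY filename",
        (owner_id,),
    ).fetchall()
    return [(filename, blobs[key]) for filename, key in rows]


db, blobs = open_metadata_db(), {}
save_attachment(db, blobs, 42, "photo.jpg", b"\xff\xd8 fake jpeg bytes")
print(load_attachments(db, blobs, 42))
```

Only the opaque `blob_key` links the two systems, so the blob store could later be swapped (HBase, HDFS, or an object store) without touching the relational schema.
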
>>> Please help me decide if this is the right approach. My app is supposed
>>> to provide support for images as well, so I would appreciate it if anyone
>>> can advise whether HBase is the right solution for me, in conjunction with
>>> an imaging tool.
>>> Since my team is predominantly Windows based, I would like to know whether
>>> it is possible to run HBase on a Windows machine in standalone and in
>>> clustered mode.
>>> Thanks for all your help.
>>> nitin
