hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase region size
Date Fri, 01 Jul 2011 06:22:53 GMT
On Wed, Jun 29, 2011 at 10:08 PM, Florin P <florinpico@yahoo.com> wrote:
>  We have almost the same scenario as Aditya, but with some differences.
>  1. Our files are documents in any format (xls, pdf, doc, html, etc.)
>  2. We expect to have more than 5 million of these documents

This is not many docs.  Will your document set be steady-state once it hits 5M?

>  3. Their sizes are distributed as follows:
>            70% of them are smaller than 1MB
>            29% of them are between 1MB and 10MB
>            1% of them are larger than 10MB (some can be as large as 100MB)

Consider what David says above, though Jack, in his yfrog presentation
today, talked about storing all images of up to 5MB in HBase.

Karthick, in his presentation at Hadoop Summit, talked about how once
cells cross a certain size (he didn't say what the threshold was, I
believe), only the metadata is stored in HBase and the content goes
to their "big stuff" system.

Try it, I'd say.  If there are only a few 100MB instances, HBase might be fine.
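A minimal sketch of the size-based split Karthick described, assuming a fixed cutoff: documents at or below the cutoff are stored whole in the HBase row, while larger ones keep only their metadata and a reference in HBase, with the bytes going to a separate blob store. Plain Python dicts stand in for both HBase and the "big stuff" system, so this shows the routing logic only and not any real HBase client API. The 5MB cutoff is borrowed from Jack's yfrog figure purely for illustration; the class name, `blob/` key scheme, and column names are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative cutoff only (Jack's yfrog setup kept images up to 5MB in HBase).
# Karthick did not give a threshold; find the right value by testing.
INLINE_THRESHOLD_BYTES = 5 * 1024 * 1024


@dataclass
class HybridDocStore:
    """Hypothetical in-memory stand-in for the hybrid layout.

    'hbase' maps row key -> row (a dict of columns); 'blob_store' maps a
    blob id -> raw bytes for documents too large to keep inline.
    """
    threshold: int = INLINE_THRESHOLD_BYTES
    hbase: dict = field(default_factory=dict)
    blob_store: dict = field(default_factory=dict)

    def put(self, row_key: str, content: bytes, metadata: dict) -> str:
        # Metadata and size always live in the HBase row.
        row = {"meta": dict(metadata), "size": len(content)}
        if len(content) <= self.threshold:
            # Small document: store the content in the row itself.
            row["content"] = content
            location = "inline"
        else:
            # Large document: content goes to the external store and the
            # row keeps only a reference to it (hypothetical id scheme).
            blob_id = f"blob/{row_key}"
            self.blob_store[blob_id] = content
            row["blob_ref"] = blob_id
            location = "external"
        self.hbase[row_key] = row
        return location

    def get(self, row_key: str) -> bytes:
        # Read inline content if present, otherwise follow the reference.
        row = self.hbase[row_key]
        if "content" in row:
            return row["content"]
        return self.blob_store[row["blob_ref"]]


if __name__ == "__main__":
    store = HybridDocStore(threshold=1024)  # tiny cutoff just for the demo
    print(store.put("doc-1", b"x" * 100, {"type": "pdf"}))   # inline
    print(store.put("doc-2", b"y" * 5000, {"type": "xls"}))  # external
```

With a 5MB cutoff and the distribution Florin gives, the 70% under 1MB (about 3.5 million of 5 million documents) would stay inline, the 1% over 10MB (about 50,000 documents) would go to the external store, and where the 29% in the 1MB to 10MB band lands depends on how their sizes spread within that band.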

>  4. We have to index all these files
>  5. Given a client as input, we have to extract some metadata from just the subset of documents belonging to that client

One time or ongoing?

