hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase as a file repository
Date Thu, 30 Mar 2017 22:03:55 GMT
Have you read:

In particular:

When using MOBs, ideally your objects will be between 100KB and 10MB
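
One way to reconcile that range with files of up to 100MB is to split each file into several cells of at most 10MB and fetch them one at a time, so the first bytes can go out over HTTP before the rest has been read. What follows is only a minimal sketch of that idea in plain Java, not an HBase API and not something prescribed in this thread: the class name, the 10MB constant, and the ChunkFetcher interface (a stand-in for one Get per chunk cell) are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkedFileSketch {
    // Assumption: use the upper end of the MOB range quoted above (10MB) as the chunk size.
    static final int MAX_CHUNK_BYTES = 10 * 1024 * 1024;

    /** Hypothetical stand-in for a single HBase Get that returns one chunk cell. */
    interface ChunkFetcher {
        byte[] fetch(int chunkIndex) throws IOException;
    }

    /** Number of chunks needed for a file of the given size (ceiling division). */
    static int chunkCount(long fileSize, int chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    /** Splits a file into consecutive chunks; each chunk would be stored as its own cell. */
    static List<byte[]> split(byte[] file, int chunkSize) {
        int n = chunkCount(file.length, chunkSize);
        List<byte[]> chunks = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int from = i * chunkSize;
            int to = (int) Math.min((long) from + chunkSize, file.length);
            chunks.add(Arrays.copyOfRange(file, from, to));
        }
        return chunks;
    }

    /** Fetches chunks in order and writes each one out as soon as it arrives. */
    static long stream(ChunkFetcher fetcher, int chunks, OutputStream out) throws IOException {
        long written = 0;
        for (int i = 0; i < chunks; i++) {
            byte[] chunk = fetcher.fetch(i);
            out.write(chunk);
            out.flush(); // push this chunk to the client before fetching the next one
            written += chunk.length;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = new byte[25];
        for (int i = 0; i < file.length; i++) {
            file[i] = (byte) i;
        }
        // A tiny chunk size keeps the demo small; a real table would use MAX_CHUNK_BYTES.
        List<byte[]> chunks = split(file, 10);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = stream(chunks::get, chunks.size(), out);
        System.out.println(chunks.size() + " chunks, " + n + " bytes streamed");
    }
}
```

With 10MB chunks a 100MB file becomes 10 cells. The time to the first byte then depends on the size of one chunk rather than the whole file, so a smaller chunk size (still within the 100KB lower bound) trades more round trips for a faster start.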


On Thu, Mar 30, 2017 at 1:01 PM, Daniel Jeliński <djelinski1@gmail.com> wrote:

> Hello,
> I'm evaluating HBase as a cheaper replacement for NAS as a file storage
> medium. To that end I have a cluster of 5 machines, 36TB HDD each; I'm
> planning to initially store ~240 million files of size 1KB-100MB, total
> size 30TB. Currently I'm storing each file under an individual column, and
> I group related documents in the same row. The files from the same row will
> be served one at a time, but updated/deleted together.
> Loading the data to the cluster went pretty well; I enabled MOB on the
> table and have ~50 regions per machine. Writes to the table are done by an
> automated process, and the cluster's performance in that area is more than
> sufficient. On the other hand, reads are interactive, as the files are
> served to human users over HTTP.
> Now, a Get in the HBase Java API is atomic in the sense that it does
> not complete until all data is retrieved from the server. It takes 100 ms
> to retrieve a 1MB cell (file), and only after retrieving it am I able to start
> serving it to the end user. For larger cells the wait time is even longer,
> and response times longer than 100 ms are bad for user experience. I would
> like to start streaming the file over HTTP as soon as possible.
> What's the recommended approach to avoid or reduce the delay between when
> HBase starts sending the response and when the application can act on it?
> Thanks,
> Daniel
