hbase-user mailing list archives

From Daniel Jeliński <djelins...@gmail.com>
Subject HBase as a file repository
Date Thu, 30 Mar 2017 20:01:06 GMT
I'm evaluating HBase as a cheaper replacement for NAS as a file storage
medium. To that end I have a cluster of 5 machines with 36TB of HDD each.
I'm planning to initially store ~240 million files ranging from 1KB to
100MB, 30TB in total. Currently I store each file in its own column, and
I group related documents in the same row. Files from the same row are
served one at a time, but updated/deleted together.
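A quick back-of-the-envelope check of these figures, as a minimal Java sketch. It assumes decimal units (1 TB = 10^12 bytes) and the HDFS default replication factor of 3 (`dfs.replication`). The replication setting actually used on this cluster is not stated, and the estimate ignores HBase/HDFS overhead and space needed for compactions.

```java
// Back-of-the-envelope storage figures for the cluster described above.
class StorageEstimate {
    // Figures from the post, in decimal units (1 TB = 10^12 bytes).
    static final long FILE_COUNT = 240_000_000L;
    static final double DATA_TB = 30.0;
    static final int MACHINES = 5;
    static final double DISK_TB_PER_MACHINE = 36.0;

    // Assumption: HDFS default replication factor (dfs.replication = 3).
    static final int REPLICATION = 3;

    // Mean file size in KB (10^3 bytes).
    static double averageFileKb() {
        return DATA_TB * 1e12 / FILE_COUNT / 1e3;
    }

    // Total raw disk across all machines, in TB.
    static double rawCapacityTb() {
        return MACHINES * DISK_TB_PER_MACHINE;
    }

    // Disk occupied by the data once every block is replicated, in TB.
    static double replicatedDataTb() {
        return DATA_TB * REPLICATION;
    }

    public static void main(String[] args) {
        System.out.printf("average file size: %.0f KB%n", averageFileKb());
        System.out.printf("raw disk capacity: %.0f TB%n", rawCapacityTb());
        System.out.printf("data incl. replicas: %.0f TB (%.0f%% of raw)%n",
                replicatedDataTb(), 100 * replicatedDataTb() / rawCapacityTb());
    }
}
```

With these inputs the mean file is about 125KB, and three replicas of 30TB take about 90TB, half of the 180TB raw disk. Since file sizes span 1KB to 100MB, the mean says little about how the sizes are distributed.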

Loading the data into the cluster went pretty well; I enabled MOB on the
table and have ~50 regions per machine. Writes to the table are done by an
automated process, and the cluster's write performance is more than
sufficient. Reads, on the other hand, are interactive: the files are
served to human users over HTTP.

Now, an HBase Get in the Java API is atomic in the sense that it does not
complete until all data has been retrieved from the server. It takes 100 ms
to retrieve a 1MB cell (file), and only after it has been retrieved can I
start serving it to the end user. For larger cells the wait is even longer,
and response times above 100 ms hurt the user experience. I would like to
start streaming the file over HTTP as soon as possible.
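To put a rough number on "even longer", here is a minimal Java sketch that extrapolates the observed 100 ms per 1MB cell to larger files. It assumes retrieval time grows linearly with cell size, which is a simplification: fixed per-request overhead dominates for small cells, and disk or network contention can change the slope. It is an estimate, not a measurement.

```java
// Back-of-the-envelope estimate of how long a user waits before the first
// byte can be sent, when the whole cell must arrive before serving starts.
class FirstByteEstimate {
    // Observed in the post: about 100 ms to retrieve a 1 MB cell.
    static final double MS_PER_MB = 100.0;

    // Assumption: retrieval time scales linearly with cell size.
    static double msBeforeFirstByte(double sizeMb) {
        return sizeMb * MS_PER_MB;
    }

    public static void main(String[] args) {
        double[] sizesMb = {1, 10, 100}; // up to the 100 MB maximum file size
        for (double sizeMb : sizesMb) {
            System.out.printf("%6.0f MB -> %8.0f ms%n", sizeMb, msBeforeFirstByte(sizeMb));
        }
    }
}
```

Under that assumption, a 100MB file would keep the user waiting about 10 seconds (10,000 ms) before the first byte could be sent.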

What's the recommended approach to avoid or reduce the delay between when
HBase starts sending the response and when the application can act on it?
