hbase-user mailing list archives

From Wilm Schumacher <wilm.schumac...@cawoom.com>
Subject Re: Newbie Question about 37TB binary storage on HBase
Date Thu, 27 Nov 2014 22:41:47 GMT
Hi Aleks ;),

Am 27.11.2014 um 22:27 schrieb Aleks Laz:
> Our application is a nginx/php-fpm/postgresql Setup.
> The target design is nginx + proxy features / php-fpm / $DB / $Storage.
> .) Can I mix HDFS /HBase for binary data storage and data analyzing?
Yes, HBase is perfect for that. Storage will work (with the
"MOB extension"), and with MapReduce you can run whatever data analysis
you want. I assume you do some image processing with the data?

> .) What is the preferred way to us HBase  with PHP?
The native client library is in Java, and that is the best way to go.
But if you only need basic access from the PHP application, then Thrift
or REST would be a good choice. There are language bindings for both.
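
To give an idea of what "basic access" over REST involves, here is a
minimal sketch that builds (but does not send) a request storing one cell
through the HBase REST gateway. It is in Python only for illustration; the
same HTTP request can be issued from PHP. The host name, the table `cams`,
the column `img:data` and the row key are made-up assumptions, not taken
from your setup. The one detail to remember is that the JSON format wants
row key, column name and value base64-encoded.

```python
import base64
import json
import urllib.parse
import urllib.request


def b64(raw: bytes) -> str:
    """Base64-encode bytes, as the HBase REST JSON format expects."""
    return base64.b64encode(raw).decode("ascii")


def build_put_request(base_url, table, row_key, column, value):
    """Build (but do not send) a PUT request that stores a single cell.

    base_url, table, row_key and column are illustrative placeholders.
    """
    body = {
        "Row": [{
            "key": b64(row_key.encode("utf-8")),
            "Cell": [{
                "column": b64(column.encode("utf-8")),  # "family:qualifier"
                "$": b64(value),                         # the cell value
            }],
        }]
    }
    url = "{}/{}/{}".format(
        base_url, table, urllib.parse.quote(row_key, safe=""))
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical gateway address and names, for illustration only.
req = build_put_request(
    "http://hbase-rest.example:8080", "cams",
    "cam42-20141127", "img:data", b"\xff\xd8")
# urllib.request.urlopen(req) would actually send it to a running gateway.
```

Reading works the same way in reverse: a GET on the row URL with an
`Accept: application/json` header returns the cells base64-encoded.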

> .) How difficult is it to use HBase with PHP?
That depends on what you are trying to do. If you just do a little
fetching, updating, inserting etc., it's pretty easy. More complicated
stuff I would do in Java and expose through a custom API in a Java service.

> .) What's a good solution for the 37 TB or the upcoming ~120 TB to
> distribute?
>    [ ] N Servers with 1 37 TB mountpoints per server?
>    [ ] N Servers with x TB mountpoints pers server?
>    [ ] other:
that's "not your business". HBase/Hadoop does the trick for you. HBase
distributes the data, HDFS replicates it etc. Your application only talks
to the cluster as a whole, never to individual disks or mountpoints.
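
What you do have to plan is the raw disk space, because replication
multiplies it. A rough back-of-the-envelope sketch: the replication factor
of 3 is the HDFS default, while the 25% free-space headroom and the 24 TB
of disk per node are pure assumptions for illustration.

```python
import math


def raw_capacity_tb(logical_tb, replication=3, headroom=0.25):
    """Raw disk needed: logical data times replication, plus free headroom.

    replication=3 is the HDFS default; headroom=0.25 is an assumed reserve.
    """
    return logical_tb * replication / (1.0 - headroom)


def nodes_needed(logical_tb, disk_per_node_tb, replication=3, headroom=0.25):
    """Number of data nodes, rounded up, for a given disk size per node."""
    raw = raw_capacity_tb(logical_tb, replication, headroom)
    return math.ceil(raw / disk_per_node_tb)


# 24 TB per node is a hypothetical figure, not a recommendation.
for logical in (37, 120):
    print(logical, "TB logical ->",
          raw_capacity_tb(logical), "TB raw,",
          nodes_needed(logical, 24), "nodes")
```

Under these assumptions 37 TB of data needs 148 TB of raw disk, and
120 TB needs 480 TB.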

> .) Is HBase a good value for $Storage?
yes ;)

> .) Is HBase a good value for $DB?
>     DB-Size is smaller then 1 GB, I would use HBase just for HA features
>     of Hadoop.
well, the official documentation says:
»First, make sure you have enough data. If you have hundreds of millions
or billions of rows, then HBase is a good candidate. If you only have a
few thousand/million rows, then using a traditional RDBMS might be a
better choice ...«

In my experience, at around 1-10 million rows RDBMS are not really
usable anymore. But I only used small/cheap hardware ... and don't like

Well, you will have at least 40 million rows ... and the platform is
growing. I think SQL isn't an option anymore. And as you have heavy reads
and only a few writes, HBase is a good fit.

> .) Due to the fact that HBase is a file-system I could use
>       /cams , for binary data
>       /DB   , for DB storage
>       /logs , for log storage
>     but is this wise. On the 'disk' they are different RAIDs.
HBase is a data store, not a file system. This question was probably
copy-pasted from the original Hadoop question ;).

> .) Should I plan a dedicated Network+Card for the 'cluster
>    communication' as for the most other cluster software?
>    From what I have read it looks not necessary but from security point
>    of view, yes.

Cloudera employees say that it wouldn't hurt if you have to push a lot
of data to the cluster.

> .) Maybe the communication with the componnents (hadoop, zk, ...) could
>    be setup ed with TLS?
HBase is built on top of Hadoop/HDFS. This is in the "Hadoop domain".
Hadoop can encrypt the transported data with TLS, can encrypt the data on
disk, you can use Kerberos auth (but this stuff I never did) etc.
So the answer is yes.

Last remark: You seem kind of bound to PHP. The Hadoop world is written
in Java. Of course there are a lot of ways to do stuff in other
languages, over interfaces etc. But the Java API is the most powerful,
and sometimes there is no other way than to use it directly.

Best wishes,

