phoenix-user mailing list archives

From anil gupta <>
Subject Re: does phoenix+hbase work for tables larger than a few GB?
Date Wed, 30 Sep 2015 20:18:06 GMT
Hi Konstantinos,
Please find my reply inline.

On Wed, Sep 30, 2015 at 12:10 PM, Konstantinos Kougios <> wrote:

> Hi all,
> I had various issues with big tables while experimenting over the last
> couple of weeks.
> What comes to mind is that hbase (+phoenix) works only when there is a
> fairly powerful cluster and, say, 1/2 the data can fit into the servers'
> combined memory and the disks are fast (SSD?) as well. It doesn't seem
> to be able to work when tables are 2x as large as the memory allocated to
> region servers (frankly I think it is less).
Anil: Phoenix is just a SQL layer over HBase. From the query in your
previous emails, it seems like you are doing full table scans with group by
clauses. IMO, HBase is not a DB to be used for full table scans. If 90% of
your use cases are small range scan or gets then HBase should work nicely
with terabytes of data. I have a 40 TB table in prod on a 60-node cluster
where every RS only has 16 GB of heap. What kind of workload are you trying
to run with HBase?

> Things that constantly fail:
> - non-trivial queries on large tables (with group by, counts, joins) with
> region server out of memory errors or crashes without any reason for Xmx of
> 4G or 8G
Anil: Can you convert these queries into short range-based scans? If you
are always going to do full table scans, then maybe you need to use MR or
Spark for those computations and then tune the cluster for full table scans.
Cluster tuning is different for a full-table-scan workload.

> - index creation on the same big tables. Those always fail, I think around
> the point when hbase has to flush its memory regions to the disk, and I
> couldn't find a solution
> - spark jobs fail unless they are throttled to feed hbase with the data it
> can take. No backpressure?
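
To illustrate the throttling point above, here is a minimal sketch of backpressure using a bounded buffer. It is plain Python and does not use any Spark, HBase, or Phoenix API; `run_pipeline`, `max_in_flight`, and `write_delay` are hypothetical names chosen for the illustration, and the sleep merely stands in for a slow write.

```python
import queue
import threading
import time


def run_pipeline(items, max_in_flight=4, write_delay=0.001):
    """Push items through a bounded buffer to a slow writer.

    Returns (written, peak), where peak is the largest buffer size observed.
    """
    buf = queue.Queue(maxsize=max_in_flight)  # bounded: put() blocks when full
    written = []
    peak = 0
    done = object()  # sentinel telling the writer to stop

    def writer():
        while True:
            item = buf.get()
            if item is done:
                break
            time.sleep(write_delay)  # stands in for a slow write to the store
            written.append(item)

    t = threading.Thread(target=writer)
    t.start()
    for item in items:
        buf.put(item)  # the producer waits here while the buffer is full
        peak = max(peak, buf.qsize())
    buf.put(done)
    t.join()
    return written, peak


written, peak = run_pipeline(range(50))
```

Because `put()` blocks once `max_in_flight` items are waiting, the producer is slowed to the writer's pace instead of piling up unwritten data.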

> There were no replies to my emails regarding the issues, which makes me
> think there aren't solutions (or solutions are pretty hard to find and not
> many ppl know them).
> So after 21 tweaks to the default config, I am still not able to operate
> it as a normal database.
Anil: HBase is actually not a normal RDBMS. It's a **key-value store**.
Phoenix provides a SQL layer using the HBase API. So users will need to
deal with the pros and cons of a key-value store.

> Should I start believing my config is all wrong or that hbase+phoenix is
> only working if there is a sufficiently powerful cluster to handle the data?
Anil: **In my experience**, HBase+Phoenix will work nicely if you are
doing key-value lookups and short range scans.
I would suggest evaluating the data model of your HBase tables and trying
to convert queries to small range scans or lookups.
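
As a rough illustration of why short range scans are cheap compared with full table scans, here is a toy model of a store that keeps rows sorted by row key. It is only a sketch in plain Python, not HBase or Phoenix code; the `SortedKV` class, the `rows_read` counter, and the `<customer_id>|<yyyymmdd>` key layout are all assumptions made up for the example.

```python
import bisect


class SortedKV:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self, rows):
        # rows: iterable of (row_key, value) pairs
        self._rows = sorted(rows)
        self._keys = [k for k, _ in self._rows]
        self.rows_read = 0  # rows touched so far, as a rough cost measure

    def range_scan(self, start_key, stop_key):
        """Return rows with start_key <= key < stop_key, reading only that slice."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        self.rows_read += hi - lo
        return self._rows[lo:hi]

    def full_scan(self, predicate):
        """Return rows matching predicate; every row has to be read."""
        self.rows_read += len(self._rows)
        return [(k, v) for k, v in self._rows if predicate(k, v)]


# Hypothetical composite row key: "<customer_id>|<yyyymmdd>"
table = SortedKV(
    (f"cust{c:04d}|201509{d:02d}", c * d)
    for c in range(1000)
    for d in range(1, 31)
)

# The key prefix selects one customer's rows ("~" sorts after the digits).
by_key = table.range_scan("cust0042|", "cust0042|~")
cost_range = table.rows_read

table.rows_read = 0
by_filter = table.full_scan(lambda k, v: k.startswith("cust0042|"))
cost_full = table.rows_read
```

Both calls return the same rows, but the range scan reads only the rows for one customer while the filter reads the whole table. Putting the attribute you filter on at the front of the row key is what makes the first form possible.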

> I believe it is a great project and the functionality is really useful.
> What's lacking is 3 sample configs for 3 different strength clusters.
Anil: I agree that guidance on configuration of HBase and Phoenix can be
improved so that people can get going quickly.

> Thanks

Thanks & Regards,
Anil Gupta
