phoenix-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: does phoenix+hbase work for tables larger than a few GB?
Date Wed, 30 Sep 2015 22:45:16 GMT
Please find my reply inline.

On Wed, Sep 30, 2015 at 3:29 PM, Konstantinos Kougios <
kostas.kougios@googlemail.com> wrote:

> Thanks for the reply and the useful information Anil.
>
> I am aware of the difficulties of distributed joins and aggregations and
> that phoenix is a layer on top of hbase. It would be great if it could be
> configured to run the queries, even if it takes a lot of time for the
> queries to complete.
>
Anil: I think it is doable, but it might require a bit of trial and error
with the HBase and Phoenix configuration. I would start by increasing the
HBase and Phoenix timeouts.
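For reference, a sketch of the client-side settings involved (property names are from the HBase/Phoenix docs; the 10-minute values are only placeholders to experiment with):

```xml
<!-- hbase-site.xml on the Phoenix client; values are illustrative -->
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value> <!-- overall Phoenix query timeout (ms) -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- per-RPC timeout (ms) -->
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value> <!-- scanner lease timeout (ms) -->
</property>
```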

>
> I got mainly 2 tables of 170GB and 550GB. Aggregation queries on both fail
> and even make region servers crash (there is no info in the logs and still
> don't know why. My server proved to be rock stable so far on other things
> but you never know).
>
Anil: The region servers should not crash. Are you doing heavy writes and
full table scans at the same time? In one of your emails I saw a stack
trace related to region splits and compactions?

>
> I am doing full table scans only because so far I was unable to create the
> indexes. I tried async indexes too with the map reduce job to create them
> but it runs extremely slowly.
>
Anil: That does not sound good. I haven't used async indexes yet, so I
won't be able to help debug the problem. Hopefully someone else will be
able to chime in.
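For anyone following the thread, the async index flow mentioned above is, per the Phoenix docs, a two-step process; the table and index names below are made up for illustration:

```sql
-- Step 1: declare the index ASYNC so CREATE INDEX returns immediately
-- instead of building the index inline.
CREATE INDEX metrics_host_idx ON metrics (host) ASYNC;

-- Step 2: populate the index with Phoenix's bundled MapReduce job, e.g.:
--   hbase org.apache.phoenix.mapreduce.index.IndexTool \
--     --data-table METRICS --index-table METRICS_HOST_IDX \
--     --output-path /tmp/metrics_host_idx
```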

>
> In theory full table scans are possible with hbase, so even if it was slow
> it shouldn't fail.
>
Anil: IMO, if you are doing full table scans, then maybe you should turn
off the block cache for those queries. Full table scans cause a lot of
cache churn, and cache churn leads to JVM GC pauses.
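On the Phoenix side, one way to keep a full scan out of the block cache is the NO_CACHE query hint; the table and columns here are hypothetical:

```sql
-- NO_CACHE asks the underlying HBase scan not to populate the block
-- cache, so a one-off full scan doesn't evict the hot working set.
SELECT /*+ NO_CACHE */ host, COUNT(*)
FROM metrics
GROUP BY host;
```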

>
> My setup is a 64GB AMD opteron server with 16 cores. 3 lxc virtual
> machines as region servers with Xmx8G, each running on a 3TB 7200rpm disk.
> So somehow I simulate 3x low spec servers with enough ram.
>
> Next thing I will try is giving the region servers 16GB of RAM. With 8GB
> they seem to have some memory pressure and I see some slow GCs in the logs.
>
Anil: 16GB of RAM should help in some cases. Also try disabling the block
cache for full table scans.

>
> Cheers
>
>
> On 30/09/15 21:18, anil gupta wrote:
>
> Hi Konstantinos,
> Please find my reply inline.
>
> On Wed, Sep 30, 2015 at 12:10 PM, Konstantinos Kougios <
> kostas.kougios@googlemail.com> wrote:
>
>> Hi all,
>>
>> I had various issues with big tables while experimenting over the last
>> couple of weeks.
>>
>> The thing that comes to my mind is that hbase (+phoenix) works only when
>> there is a fairly powerful cluster, say when 1/2 of the data can fit into
>> the combined servers' memory and the disks are fast (SSD?) as well. It
>> doesn't seem to be able to work when tables are 2x as large as the memory
>> allocated to region servers (frankly, I think the limit is even lower).
>>
> Anil: Phoenix is just a SQL layer over HBase. From the query in your
> previous emails, it seems like you are doing full table scans with group-by
> clauses. IMO, HBase is not a DB to be used for full table scans. If 90% of
> your use cases are small range scans or gets, then HBase should work nicely
> with terabytes of data. I have a 40 TB table in prod on a 60-node cluster
> where every RS has only 16GB of heap. What kind of workload are you trying
> to run with HBase?
>
>
>>
>> Things that constantly fail:
>>
>> - non-trivial queries on large tables (with group by, counts, joins) with
>> region server out of memory errors or crashes without any reason for Xmx of
>> 4G or 8G
>>
> Anil: Can you convert these queries into short range scans? If you are
> always going to do full table scans, then maybe you need to use MapReduce
> or Spark for those computations and tune the cluster for full table scans.
> Cluster tuning varies with a full-table-scan workload.
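As a sketch of what converting to a short range scan can look like (schema invented for illustration): if the filter column is the leading part of the primary key, Phoenix turns the query into a bounded HBase range scan rather than a full scan:

```sql
-- Hypothetical schema with PRIMARY KEY (event_date, host): a bounded
-- predicate on the leading key column scans only that slice of rows.
SELECT host, COUNT(*)
FROM metrics
WHERE event_date BETWEEN DATE '2015-09-01' AND DATE '2015-09-07'
GROUP BY host;
```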
>
>> - index creation on the same big tables. Those always fail, I think
>> around the point when hbase has to flush its memory regions to disk, and
>> I couldn't find a solution
>>
>> - spark jobs fail unless they are throttled to feed hbase only the data
>> it can take. No backpressure?
>>
>
>> There were no replies to my emails regarding the issues, which makes me
>> think there aren't solutions (or the solutions are pretty hard to find
>> and not many people know them).
>>
>> So after 21 tweaks to the default config, I am still not able to operate
>> it as a normal database.
>>
> Anil: HBase is actually not a normal RDBMS. It's a **key/value store**.
> Phoenix provides a SQL layer using the HBase API, so the user will need to
> deal with the pros/cons of a key/value store.
>
>>
>> Should I start believing my config is all wrong or that hbase+phoenix is
>> only working if there is a sufficiently powerful cluster to handle the data?
>>
> Anil: **In my experience**, HBase+Phoenix will work nicely if you are
> doing key/value lookups and short range scans.
> I would suggest you evaluate the data model of your HBase tables and try
> to convert queries to small range scans or lookups.
>
>>
>> I believe it is a great project and the functionality is really useful.
>> What's lacking is three sample configs for three different cluster sizes.
>>
> Anil: I agree that guidance on configuring HBase and Phoenix can be
> improved so that people can get going quickly.
>
>>
>> Thanks
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
>
>


-- 
Thanks & Regards,
Anil Gupta
