hbase-user mailing list archives

From: Michael Segel <michael_se...@hotmail.com>
Subject: Re: Optimizing compactions on super-low-cost HW
Date: Fri, 22 May 2015 10:00:11 GMT
Look, to be blunt, you’re screwed. 

If I read your cluster spec correctly, it sounds like you have a single i7 (quad-core) CPU. That’s
4 cores or 8 threads. 

Mirroring the OS is common practice. 
Using the same drives for Hadoop… not so good, but once the server boots up the OS doesn’t
generate much I/O.
It’s not good, but you could live with it… 

Your best bet is to add a couple more spindles. Ideally you’d want six drives:
the 2 OS drives mirrored and kept separate (use the extra space to stash / write logs), then
4 drives / spindles in JBOD for Hadoop. That brings you to 1:1 on physical cores.  If your
box can handle more spindles, then going to a total of 10 drives would improve performance
further. 
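As a rough sketch of what the JBOD side of that looks like in hdfs-site.xml (the /data/1 … /data/4
mount points are placeholders for however your ops team names the new spindles, and the values are
only illustrative):

  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- One directory per physical disk, each on its own mount; no RAID/LVM across them -->
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  </property>
  <property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <!-- Keep the datanode running if one data disk dies, which is the failure your ops team worries about -->
    <value>1</value>
  </property>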

However, you need to level-set your expectations… you can only go so far. A single 7200rpm drive
can stream on the order of 100-150MB/s, so with 4 drives spinning you can saturate a 1GbE link
(~125MB/s), and that will hurt performance.


That’s pretty much your only option in terms of fixing the hardware; after that you have to
start tuning.

> On May 21, 2015, at 4:04 PM, Stack <stack@duboce.net> wrote:
> 
> On Thu, May 21, 2015 at 1:04 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
> 
>>> Do you have the system sharing
>> There are 2 HDDs (7200rpm, 2TB each). There is a 300GB OS partition on each drive
>> with mirroring enabled. I can't persuade devops that mirroring could cause
>> I/O issues. What arguments can I bring? They use OS partition mirroring so that when a
>> disk fails, we can use the other partition to boot the OS and continue working...
>> 
>> 
> You are already compromised i/o-wise with only two disks. I don't have the
> experience to say for sure, but basic physics would seem to dictate that
> having your two disks (partially) mirrored compromises your i/o even more.
> 
> You are in a bit of a hard place. Your operators want the machine to boot
> even after it loses 50% of its disk.
> 
> 
>>> Do you have to compact? In other words, do you have read SLAs?
>> Unfortunately, I have a mixed workload from web applications. I need to write
>> and read, and the SLA is < 50ms.
>> 
>> 
> Ok. You get the bit that seeks are about 10ms each, so with two disks you
> can do 2x100 seeks a second, presuming no one else is using the disks.
> 
> 
>>> How are your read times currently?
>> Cloudera Manager says it's 4K reads per second and 500 writes per second
>> 
>>> Does your working dataset fit in RAM or do
>> reads have to go to disk?
>> I have several tables of 500GB each and many small tables of 10-20 GB. Small
>> tables are loaded hourly/daily using bulkload (we prepare HFiles with MR and move
>> them into HBase with the bulkload utility). Big tables are used by webapps, which read and
>> write them.
>> 
>> 
> These HFiles are created on the same cluster with MR? (i.e. they are using up
> i/os)
> 
> 
>>> It looks like you are running at about three storefiles per column family
>> Is that because hbase.hstore.compactionThreshold=3?
>> 
> 
> 
>>> What if you upped the threshold at which minors run?
>> You mean bump hbase.hstore.compactionThreshold to 8 or 10?
>> 
>> 
> Yes.
> 
> Downside is that your reads may require more seeks to find a keyvalue.
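If you do bump it, it's a one-line change in hbase-site.xml on the regionservers; a sketch, with 8
being just the illustrative value from the question above (picked up on regionserver restart):

  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <!-- A minor compaction is triggered once a store has this many storefiles; the default is 3 -->
    <value>8</value>
  </property>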
> 
> Can you cache more?
> 
> Can you make it so files are bigger before you flush?
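For the caching and flush-size questions, the knobs would be roughly these, again in hbase-site.xml;
the values here are purely illustrative and have to fit inside your regionserver heap (block cache
fraction plus global memstore fraction shouldn't exceed ~0.8):

  <property>
    <name>hfile.block.cache.size</name>
    <!-- Fraction of the regionserver heap given to the block cache; default is 0.25 or 0.4 depending on version -->
    <value>0.4</value>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <!-- Flush a memstore once it reaches this many bytes; 256MB here vs the 128MB default,
         so flushes produce fewer, bigger hfiles -->
    <value>268435456</value>
  </property>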
> 
> 
> 
>>> Do you have a downtime during which you could schedule compactions?
>> Unfortunately no. It should work 24/7, and sometimes it doesn't.
>> 
>> 
> So, it is running at full bore 24/7?  There is no 'downtime'... a time when
> the traffic is not so heavy?
> 
> 
> 
>>> Are you managing the major compactions yourself or are you having hbase do
>> it for you?
>> HBase does it once a day: hbase.hregion.majorcompaction=1day
>> 
>> 
> Have you studied your compactions?  You realize that a major compaction
> will do a full rewrite of your dataset?  When they run, how many storefiles
> are there?
> 
> Do you have to run once a day?  Can you not run once a week?  Can you
> manage the compactions yourself... and run them a region at a time in a
> rolling manner across the cluster rather than have them just run whenever
> it suits them once a day?
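If you do take over major compactions, the usual pattern is to switch off the automatic ones in
hbase-site.xml and trigger major_compact yourself (per table or per region) from a scheduler during
whatever quieter window you can find. A sketch; this is the same knob you already have set to one day:

  <property>
    <name>hbase.hregion.majorcompaction</name>
    <!-- Interval between automatic major compactions, in milliseconds:
         86400000 = 1 day (your current setting), 604800000 = 1 week, 0 = disabled,
         leaving you to run major_compact on your own schedule -->
    <value>0</value>
  </property>

Just remember that a major compaction is also what purges delete markers and expired versions, so
if you disable the automatic one, make sure your own trigger actually runs.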
> 
> 
> 
>> I can disable WAL. It's OK to lose some data in case of RS failure. I'm
>> not doing banking transactions.
>> If I disable WAL, could it help?
>> 
>> 
> It could, but don't. Enable deferred sync'ing first if you can 'lose' some
> data.
> 
> Work on your flushing and compactions before you mess w/ WAL.
> 
> What version of HBase are you on? You say CDH, but the newer your HBase, the
> better it does generally.
> 
> St.Ack
> 
> 
> 
> 
> 
>> 2015-05-20 18:04 GMT+03:00 Stack <stack@duboce.net>:
>> 
>>> On Mon, May 18, 2015 at 4:26 PM, Serega Sheypak <
>> serega.sheypak@gmail.com>
>>> wrote:
>>> 
>>>> Hi, we are using extremely cheap HW:
>>>> 2 HDD, 7200rpm
>>>> 4*2 core (Hyperthreading)
>>>> 32GB RAM
>>>> 
>>>> We are hitting serious I/O performance issues.
>>>> We have a more or less even distribution of read/write requests, and the same
>>>> for data size.
>>>> 
>>>> ServerName / Requests Per Second / Read Request Count / Write Request Count
>>>> node01.domain.com,60020,1430172017193 195 171871826 16761699
>>>> node02.domain.com,60020,1426925053570 24 34314930 16006603
>>>> node03.domain.com,60020,1430860939797 22 32054801 16913299
>>>> node04.domain.com,60020,1431975656065 33 1765121 253405
>>>> node05.domain.com,60020,1430484646409 27 42248883 16406280
>>>> node07.domain.com,60020,1426776403757 27 36324492 16299432
>>>> node08.domain.com,60020,1426775898757 26 38507165 13582109
>>>> node09.domain.com,60020,1430440612531 27 34360873 15080194
>>>> node11.domain.com,60020,1431989669340 28 44307 13466
>>>> node12.domain.com,60020,1431927604238 30 5318096 2020855
>>>> node13.domain.com,60020,1431372874221 29 31764957 15843688
>>>> node14.domain.com,60020,1429640630771 41 36300097 13049801
>>>> 
>>>> ServerName / Num. Stores / Num. Storefiles / Storefile Size / Uncompressed Storefile Size / Index Size / Bloom Size
>>>> node01.domain.com,60020,1430172017193 82 186 1052080m 76496mb 641849k 310111k
>>>> node02.domain.com,60020,1426925053570 82 179 1062730m 79713mb 649610k 318854k
>>>> node03.domain.com,60020,1430860939797 82 179 1036597m 76199mb 627346k 307136k
>>>> node04.domain.com,60020,1431975656065 82 400 1034624m 76405mb 655954k 289316k
>>>> node05.domain.com,60020,1430484646409 82 185 1111807m 81474mb 688136k 334127k
>>>> node07.domain.com,60020,1426776403757 82 164 1023217m 74830mb 631774k 296169k
>>>> node08.domain.com,60020,1426775898757 81 171 1086446m 79933mb 681486k 312325k
>>>> node09.domain.com,60020,1430440612531 81 160 1073852m 77874mb 658924k 309734k
>>>> node11.domain.com,60020,1431989669340 81 166 1006322m 75652mb 664753k 264081k
>>>> node12.domain.com,60020,1431927604238 82 188 1050229m 75140mb 652970k 304137k
>>>> node13.domain.com,60020,1431372874221 82 178 937557m 70042mb 601684k 257607k
>>>> node14.domain.com,60020,1429640630771 82 145 949090m 69749mb 592812k 266677k
>>>> 
>>>> 
>>>> When compaction starts, a random node hits 100% I/O utilization, with I/O waits of
>>>> seconds, even tens of seconds.
>>>>
>>>> What are the approaches to optimizing minor and major compactions when you
>>>> are I/O bound?
>>>> 
>>> 
>>> Yeah, with two disks, you will be crimped. Do you have the system sharing disks
>>> with hbase/hdfs, or is hdfs running on one disk only?
>>> 
>>> Do you have to compact? In other words, do you have read SLAs?  How are
>>> your read times currently?  Does your working dataset fit in RAM or do
>>> reads have to go to disk?  It looks like you are running at about three
>>> storefiles per column family.  What if you upped the threshold at which
>>> minors run? Do you have a downtime during which you could schedule
>>> compactions? Are you managing the major compactions yourself or are you
>>> having hbase do it for you?
>>> 
>>> St.Ack
>>> 
>> 

