hbase-user mailing list archives

From Bryan Keller <brya...@gmail.com>
Subject Re: Poor HBase map-reduce scan performance
Date Fri, 03 May 2013 07:17:57 GMT
I finally made some progress. I tried a very large HBase block size (16 MB), and it significantly improved scan performance: I went from 45-50 min to 24 min. Not great, but much better. Before, I had it set to 128 KB. Scanning an equivalent sequence file takes 10 min. My random read performance will probably suffer with such a large block size (theoretically), so I probably can't keep it this big; I care about random read performance too. I've read that a block size this big is not recommended. Is that correct?
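
For reference, here is roughly how I changed the block size (a sketch against the 0.94 client API; the table and family names are placeholders for mine):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SetBlockSize {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String table = "mytable";             // placeholder table name
        byte[] family = Bytes.toBytes("cf");  // placeholder family name
        admin.disableTable(table);
        HColumnDescriptor col =
            admin.getTableDescriptor(Bytes.toBytes(table)).getFamily(family);
        col.setBlocksize(16 * 1024 * 1024);   // 16 MB HFile block size
        admin.modifyColumn(table, col);
        admin.enableTable(table);
        admin.close();
      }
    }

Note the new size only applies to HFiles written afterwards, so a major compaction is needed before the change fully takes effect.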

I haven't dug too deeply into the code yet: are the block buffers reused, or does each block read trigger a new allocation? Perhaps a buffer pool could help here if there isn't one already. When doing a scan, HBase could reuse previously allocated block buffers instead of allocating a new one for each block. Then block size shouldn't affect scan performance much.
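
Something like this is what I have in mind (a minimal sketch of the general idea, not HBase code; the class and API are hypothetical):

    import java.nio.ByteBuffer;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical pool: scanners check a buffer out per block read and
    // return it when the block is done, instead of allocating each time.
    public class BlockBufferPool {
      private final ConcurrentLinkedQueue<ByteBuffer> free =
          new ConcurrentLinkedQueue<ByteBuffer>();
      private final int bufferSize;

      public BlockBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
      }

      public ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
      }

      public void release(ByteBuffer buf) {
        buf.clear();      // reset position/limit for reuse
        free.offer(buf);  // make it available to the next reader
      }
    }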

I'm not using a block encoder. Also, I'm still sifting through the profiler results; I'll see if I can make more sense of them and run some more experiments.

On May 2, 2013, at 5:46 PM, lars hofhansl <larsh@apache.org> wrote:

> Interesting. If you can, try 0.94.7 (but it'll probably not have changed that much from 0.94.4).
> 
> 
> Have you enabled one of the block encoders (FAST_DIFF, etc.)? If so, try without. They currently need to reallocate a ByteBuffer for each single KV.
> (Since you see ScannerV2 rather than EncodedScannerV2 you probably have not enabled encoding, just checking.)
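> For what it's worth, toggling an encoder is just a column-family setting, roughly like this (a sketch against the 0.94 API; the family name is made up):
> 
>     import org.apache.hadoop.hbase.HColumnDescriptor;
>     import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
> 
>     HColumnDescriptor col = new HColumnDescriptor("cf");
>     // FAST_DIFF saves space but currently reallocates per KV while scanning;
>     // NONE avoids that churn.
>     col.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);  // or DataBlockEncoding.NONE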
> 
> 
> And do you have a stack trace for the ByteBuffer.allocate()? That is a strange one, since it never came up in my profiling (unless you enabled block encoding).
> (You can get traces from VisualVM by creating a snapshot, but you'd have to drill in to find the allocate().)
> 
> 
> During normal scanning (again, without encoding) there should be no allocation happening except for blocks read from disk (and they should all be the same size, thus allocation should be cheap).
> 
> -- Lars
> 
> 
> 
> ________________________________
> From: Bryan Keller <bryanck@gmail.com>
> To: user@hbase.apache.org 
> Sent: Thursday, May 2, 2013 10:54 AM
> Subject: Re: Poor HBase map-reduce scan performance
> 
> 
> I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at first glance that memory allocations may be an issue. Decompression was next below that but less of an issue it seems.
> 
> Would changing the block size, either HDFS or HBase, help here?
> 
> Also, if anyone has tips on how else to profile, that would be appreciated. VisualVM can produce a lot of noise that is hard to sift through.
> 
> 
> On May 1, 2013, at 9:49 PM, Bryan Keller <bryanck@gmail.com> wrote:
> 
>> I used exactly 0.94.4, pulled from the tag in subversion.
>> 
>> On May 1, 2013, at 9:41 PM, lars hofhansl <larsh@apache.org> wrote:
>> 
>>> Hmm... Did you actually use exactly version 0.94.4, or the latest 0.94.7?
>>> I would be very curious to see profiling data.
>>> 
>>> -- Lars
>>> 
>>> 
>>> 
>>> ----- Original Message -----
>>> From: Bryan Keller <bryanck@gmail.com>
>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>> Cc: 
>>> Sent: Wednesday, May 1, 2013 6:01 PM
>>> Subject: Re: Poor HBase map-reduce scan performance
>>> 
>>> I tried running my test with 0.94.4; unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some other things tonight and tomorrow, and will report back.
>>> 
>>> On May 1, 2013, at 8:00 AM, Bryan Keller <bryanck@gmail.com> wrote:
>>> 
>>>> Yes, I would like to try this; if you can point me to the pom.xml patch, that would save me some time.
>>>> 
>>>> On Tuesday, April 30, 2013, lars hofhansl wrote:
>>>> If you can, try 0.94.4+; it should significantly reduce the amount of data copied around in RAM during scanning, especially if you have wide rows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk).
>>>> 
>>>> 
>>>> It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that.
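>>>> (The patch is basically just pointing the Hadoop dependency at the CDH artifacts, something like the following; the exact property name and CDH version here are assumptions, so check the pom of the HBase version you build:)
>>>> 
>>>>     <!-- pom.xml sketch: override the Hadoop version and add the Cloudera repo -->
>>>>     <properties>
>>>>       <hadoop.version>2.0.0-cdh4.2.1</hadoop.version>
>>>>     </properties>
>>>>     <repositories>
>>>>       <repository>
>>>>         <id>cloudera</id>
>>>>         <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
>>>>       </repository>
>>>>     </repositories>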
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>>   From: Bryan Keller <bryanck@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>> 
>>>> 
>>>> The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data local mappers with only a few rack local (maybe 10 of the 250 mappers).
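>>>> (By "hashed keys" I mean salting the row key with a hash prefix, roughly like this; the helper is a sketch of the idea, not my actual code:)
>>>> 
>>>>     import org.apache.hadoop.hbase.util.Bytes;
>>>>     import org.apache.hadoop.hbase.util.MD5Hash;
>>>> 
>>>>     // Prefix the natural key with a few hex chars of its MD5 so rows
>>>>     // spread evenly across regions instead of hotspotting one server.
>>>>     public static byte[] makeRowKey(String naturalKey) {
>>>>       String salt = MD5Hash.getMD5AsHex(Bytes.toBytes(naturalKey)).substring(0, 4);
>>>>       return Bytes.toBytes(salt + "-" + naturalKey);
>>>>     }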
>>>> 
>>>> Currently the table is a wide table schema, with lists of data structures stored as columns with column prefixes grouping the data structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is I can't filter on one value with that, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.
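>>>> (Concretely, today a row is written with prefixed qualifiers like below; the names and the protobuf message are illustrative only:)
>>>> 
>>>>     import org.apache.hadoop.hbase.client.Put;
>>>>     import org.apache.hadoop.hbase.util.Bytes;
>>>> 
>>>>     byte[] cf = Bytes.toBytes("cf");             // placeholder family
>>>>     Put put = new Put(Bytes.toBytes("row1"));
>>>>     // Wide-row layout: each structure is flattened into prefixed columns,
>>>>     // so every field repeats the rowkey in its KV.
>>>>     put.add(cf, Bytes.toBytes("1_name"), Bytes.toBytes("Alice"));
>>>>     put.add(cf, Bytes.toBytes("1_city"), Bytes.toBytes("Oakland"));
>>>>     put.add(cf, Bytes.toBytes("2_name"), Bytes.toBytes("Bob"));
>>>>     // Protobuf alternative: one blob per structure, far fewer KVs, but
>>>>     // no filtering on single fields (person1 is a hypothetical message).
>>>>     put.add(cf, Bytes.toBytes("1"), person1.toByteArray());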
>>>> 
>>>> Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit; now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. CPU usage is rather high on the regionserver, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may revisit compression on vs. off, since that is adding load to the CPU.
>>>> 
>>>> I'll also come up with a sample program that generates data similar to my table.
>>>> 
>>>> 
>>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <larsh@apache.org> wrote:
>>>> 
>>>>> Your average row is 35k, so scanner caching would not make a huge difference, although I would have expected some improvement from setting it to 10 or 50, since you have a wide 10GbE pipe.
>>>>> 
>>>>> I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
>>>>> 
>>>>> A bunch of scan improvements went into HBase since 0.94.2.
>>>>> I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>> 
>>>>> In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
>>>>> 
>>>>> In the end HBase *is* slower than straight HDFS for full scans. How could it not be?
>>>>> So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
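>>>>> (If I remember right, the relevant knobs are the tcpnodelay flags, along these lines; double-check the exact keys for your versions:)
>>>>> 
>>>>>     <!-- hbase-site.xml -->
>>>>>     <property>
>>>>>       <name>hbase.ipc.client.tcpnodelay</name>
>>>>>       <value>true</value>
>>>>>     </property>
>>>>> 
>>>>>     <!-- hdfs-site.xml / core-site.xml -->
>>>>>     <property>
>>>>>       <name>ipc.server.tcpnodelay</name>
>>>>>       <value>true</value>
>>>>>     </property>
>>>>>     <property>
>>>>>       <name>ipc.client.tcpnodelay</name>
>>>>>       <value>true</value>
>>>>>     </property>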
>>>>> 
>>>>> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs.
>>>>> With rotating media you typically see an improvement with compression. With SSDs the added CPU needed for decompression might outweigh the benefits.
>>>>> 
>>>>> At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
>>>>> 
>>>>> If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked.
>>>>> (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs).
>>>>> 
>>>>> -- Lars
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ________________________________
>>>>> From: Bryan Keller <bryanck@gmail.com>
>>>>> To: user@hbase.apache.org
>>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>> 
>>>>> 
>>>>> Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false)
>>>>> 
>>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> 
>>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>> 
>>>>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>> 
>>>>>> I guess you have used the above setting.
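>>>>>> 
>>>>>> In context, the whole job setup looks roughly like this (a sketch against the 0.94 mapreduce API; the table name and mapper are placeholders):
>>>>>> 
>>>>>>     import org.apache.hadoop.conf.Configuration;
>>>>>>     import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>>>     import org.apache.hadoop.hbase.client.Scan;
>>>>>>     import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>>>>>>     import org.apache.hadoop.io.LongWritable;
>>>>>>     import org.apache.hadoop.io.Text;
>>>>>>     import org.apache.hadoop.mapreduce.Job;
>>>>>> 
>>>>>>     Configuration conf = HBaseConfiguration.create();
>>>>>>     Job job = new Job(conf, "full-table-scan");
>>>>>>     Scan scan = new Scan();
>>>>>>     scan.setCaching(500);        // batch rows per RPC instead of the default 1
>>>>>>     scan.setCacheBlocks(false);  // don't churn the block cache on a one-off scan
>>>>>>     TableMapReduceUtil.initTableMapperJob(
>>>>>>         "mytable",               // placeholder table name
>>>>>>         scan,
>>>>>>         MyMapper.class,          // placeholder TableMapper subclass
>>>>>>         Text.class, LongWritable.class,
>>>>>>         job);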
>>>>>> 
>>>>>> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
>>>>>> 
>>>>>> Cheers
>>>>>> 
>>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm
>>> 

