hbase-user mailing list archives

From Enis Söztutar <e...@hortonworks.com>
Subject Re: Poor HBase map-reduce scan performance
Date Wed, 29 May 2013 20:29:50 GMT
Hi,

Regarding running raw scans on top of HFiles, you can try a version of the
patch attached at https://issues.apache.org/jira/browse/HBASE-8369, which
enables exactly this. However, the patch is for trunk.

In that patch, we open one region from the snapshot files in each record
reader, and run a scan through it using an internal region scanner. Since this
bypasses the client + rpc + server daemon layers, it should be able to give
optimum scan performance.
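The structure Enis describes can be sketched roughly as follows. This is an illustrative model only, with hypothetical names (it is not the actual HBASE-8369 API): each record reader opens a "region" directly from snapshot files and drains it through an internal scanner, with no RPC in the loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch (not the real HBASE-8369 API) of the idea above:
 *  each record reader opens a region directly from snapshot files and scans
 *  it with an internal scanner, skipping the client/RPC/server layers. */
public class SnapshotScanSketch {
    /** Stand-in for an internal region scanner over one region's files. */
    interface RegionScanner extends Iterator<String> {}

    /** Hypothetical record reader body: drain the region scanner directly. */
    static List<String> scanRegion(RegionScanner scanner) {
        List<String> rows = new ArrayList<>();
        while (scanner.hasNext()) rows.add(scanner.next()); // no RPC round-trips
        return rows;
    }

    /** Toy "region" backed by an in-memory list, standing in for HFiles. */
    static RegionScanner openFromSnapshot(List<String> hfileRows) {
        Iterator<String> it = hfileRows.iterator();
        return new RegionScanner() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() { return it.next(); }
        };
    }
}
```

The point of the shape is that the map task reads region files itself, so scan throughput is bounded by HDFS I/O rather than by the region server daemon.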

There is also a tool called HFilePerformanceBenchmark that intends to
measure raw performance for HFiles. I've had to do a lot of changes to make
it workable, but it might be worth taking a look to see whether there is
any perf difference between scanning a sequence file from HDFS vs scanning
an HFile.

Enis


On Fri, May 24, 2013 at 10:50 PM, lars hofhansl <larsh@apache.org> wrote:

> Sorry. Haven't gotten to this, yet.
>
> Scanning in HBase being about 3x slower than straight HDFS is in the right
> ballpark, though. It has to do a bit more work.
>
> Generally, HBase is great at homing in on a subset (some 10-100m rows) of
> the data. Raw scan performance is not (yet) a strength of HBase.
>
> So with HDFS you get to 75% of the theoretical maximum read throughput;
> hence with HBase you get to 25% of the theoretical cluster-wide maximum disk
> throughput?
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Bryan Keller <bryanck@gmail.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Friday, May 10, 2013 8:46 AM
> Subject: Re: Poor HBase map-reduce scan performance
>
> FYI, I ran tests with compression on and off.
>
> With a plain HDFS sequence file and compression off, I am getting very
> good I/O numbers, roughly 75% of theoretical max for reads. With snappy
> compression on with a sequence file, I/O speed is about 3x slower. However
> the file size is 3x smaller so it takes about the same time to scan.
>
> With HBase, the results are equivalent (just much slower than a sequence
> file). Scanning a compressed table is about 3x slower I/O than an
> uncompressed table, but the table is 3x smaller, so the time to scan is
> about the same. Scanning an HBase table takes about 3x as long as scanning
> the sequence file export of the table, either compressed or uncompressed.
> The sequence file export file size ends up being just barely larger than
> the table, either compressed or uncompressed.
>
> So in sum, compression slows down I/O 3x, but the file is 3x smaller so
> the time to scan is about the same. Adding in HBase slows things down
> another 3x. So I'm seeing 9x faster I/O scanning an uncompressed sequence
> file vs scanning a compressed table.
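Bryan's 3x/3x/9x arithmetic can be checked with a quick sketch. The numbers are the normalized figures from the message, nothing more:

```java
/** Back-of-the-envelope check of the compression/HBase numbers above
 *  (illustrative arithmetic only). */
public class ScanMath {
    /** Normalized time to scan: bytes to read divided by I/O rate. */
    static double scanTime(double relativeSize, double relativeIoRate) {
        return relativeSize / relativeIoRate;
    }

    public static void main(String[] args) {
        // Uncompressed sequence file: baseline size and I/O rate.
        System.out.println(scanTime(1.0, 1.0));           // 1.0
        // Snappy: 3x smaller file, but I/O runs 3x slower -> same wall time.
        System.out.println(scanTime(1.0 / 3, 1.0 / 3));   // 1.0
        // HBase adds another 3x I/O slowdown on the compressed data.
        System.out.println(scanTime(1.0 / 3, 1.0 / 9));   // ~3.0 (3x the scan time)
        // I/O-rate gap: 1.0 vs 1/9 -> the 9x figure Bryan quotes.
    }
}
```

So the 9x is a ratio of I/O rates, while the wall-clock gap between the two scans is the 3x Bryan measured.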
>
>
> On May 8, 2013, at 10:15 AM, Bryan Keller <bryanck@gmail.com> wrote:
>
> > Thanks for the offer Lars! I haven't made much progress speeding things
> up.
> >
> > I finally put together a test program that populates a table that is
> similar to my production dataset. I have a readme that should describe
> things, hopefully enough to make it useable. There is code to populate a
> test table, code to scan the table, and code to scan sequence files from an
> export (to compare HBase w/ raw HDFS). I use a gradle build script.
> >
> > You can find the code here:
> >
> > https://dl.dropboxusercontent.com/u/6880177/hbasetest.zip
> >
> >
> > On May 4, 2013, at 6:33 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> >> The blockbuffers are not reused, but that by itself should not be a
> problem as they are all the same size (at least I have never identified
> that as one in my profiling sessions).
> >>
> >> My offer still stands to do some profiling myself if there is an easy
> way to generate data of similar shape.
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >> From: Bryan Keller <bryanck@gmail.com>
> >> To: user@hbase.apache.org
> >> Sent: Friday, May 3, 2013 3:44 AM
> >> Subject: Re: Poor HBase map-reduce scan performance
> >>
> >>
> >> Actually I'm not too confident in my results re block size, they may
> have been related to major compaction. I'm going to rerun before drawing
> any conclusions.
> >>
> >> On May 3, 2013, at 12:17 AM, Bryan Keller <bryanck@gmail.com> wrote:
> >>
> >>> I finally made some progress. I tried a very large HBase block size
> (16mb), and it significantly improved scan performance. I went from 45-50
> min to 24 min. Not great but much better. Before I had it set to 128k.
> Scanning an equivalent sequence file takes 10 min. My random read
> performance will probably suffer with such a large block size
> (theoretically), so I probably can't keep it this big. I care about random
> read performance too. I've read having a block size this big is not
> recommended, is that correct?
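The tradeoff Bryan raises can be framed with rough arithmetic (illustrative only): a point read has to fetch a whole block, so block size scales random-read I/O directly while barely affecting a sequential scan:

```java
/** Rough arithmetic on the block-size tradeoff discussed above
 *  (illustrative, not an HBase API). */
public class BlockSizeMath {
    /** Bytes fetched from disk to serve one point read: roughly one block,
     *  since the whole block must be read (and possibly decompressed). */
    static int bytesPerRandomRead(int blockSizeBytes) {
        return blockSizeBytes;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRandomRead(128 * 1024));       // 131072 (128k blocks)
        System.out.println(bytesPerRandomRead(16 * 1024 * 1024)); // 16777216 (16mb blocks)
        // ~128x more I/O per random read at 16mb - which is why large blocks
        // help sequential scans but hurt random-read workloads.
    }
}
```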
> >>>
> >>> I haven't dug too deeply into the code, are the block buffers reused
> or is each new block read a new allocation? Perhaps a buffer pool could
> help here if there isn't one already. When doing a scan, HBase could reuse
> previously allocated block buffers instead of allocating a new one for each
> block. Then block size shouldn't affect scan performance much.
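The buffer-reuse idea Bryan floats could look something like this minimal sketch; the class and method names are hypothetical, not HBase internals:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical sketch of the buffer-pool idea above: recycle fixed-size
 *  block buffers across reads instead of allocating one per block. */
public class BlockBufferPool {
    private final int blockSize;
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();

    public BlockBufferPool(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Hand out a recycled buffer if one is available, else allocate. */
    public ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) b = ByteBuffer.allocate(blockSize);
        b.clear();
        return b;
    }

    /** Return a buffer to the pool once its block has been consumed. */
    public void release(ByteBuffer b) {
        if (b.capacity() == blockSize) free.push(b);
    }
}
```

Because all blocks in a store file share one size, a pool like this would make the allocation cost independent of how large that size is.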
> >>>
> >>> I'm not using a block encoder. Also, I'm still sifting through the
> profiler results, I'll see if I can make more sense of it and run some more
> experiments.
> >>>
> >>> On May 2, 2013, at 5:46 PM, lars hofhansl <larsh@apache.org> wrote:
> >>>
> >>>> Interesting. If you can try 0.94.7 (but it'll probably not have
> changed that much from 0.94.4)
> >>>>
> >>>>
> >>>> Do you have enabled one of the block encoders (FAST_DIFF, etc)? If
> so, try without. They currently need to reallocate a ByteBuffer for each
> single KV.
> >>>> (Since you see ScannerV2 rather than EncodedScannerV2 you probably
> >>>> have not enabled encoding, just checking).
> >>>>
> >>>>
> >>>> And do you have a stack trace for the ByteBuffer.allocate(). That is
> a strange one since it never came up in my profiling (unless you enabled
> block encoding).
> >>>> (You can get traces from VisualVM by creating a snapshot, but you'd
> have to drill in to find the allocate()).
> >>>>
> >>>>
> >>>> During normal scanning (again, without encoding) there should be no
> allocation happening except for blocks read from disk (and they should all
> be the same size, thus allocation should be cheap).
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>>
> >>>> ________________________________
> >>>> From: Bryan Keller <bryanck@gmail.com>
> >>>> To: user@hbase.apache.org
> >>>> Sent: Thursday, May 2, 2013 10:54 AM
> >>>> Subject: Re: Poor HBase map-reduce scan performance
> >>>>
> >>>>
> >>>> I ran one of my regionservers through VisualVM. It looks like the top
> hot spots are HFileReaderV2$ScannerV2.getKeyValue() and
> ByteBuffer.allocate(). It appears at first glance that memory allocations
> may be an issue. Decompression was next below that but less of an issue it
> seems.
> >>>>
> >>>> Would changing the block size, either HDFS or HBase, help here?
> >>>>
> >>>> Also, if anyone has tips on how else to profile, that would be
> appreciated. VisualVM can produce a lot of noise that is hard to sift
> through.
> >>>>
> >>>>
> >>>> On May 1, 2013, at 9:49 PM, Bryan Keller <bryanck@gmail.com> wrote:
> >>>>
> >>>>> I used exactly 0.94.4, pulled from the tag in subversion.
> >>>>>
> >>>>> On May 1, 2013, at 9:41 PM, lars hofhansl <larsh@apache.org> wrote:
> >>>>>
> >>>>>> Hmm... Did you actually use exactly version 0.94.4, or the latest
> 0.94.7.
> >>>>>> I would be very curious to see profiling data.
> >>>>>>
> >>>>>> -- Lars
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> ----- Original Message -----
> >>>>>> From: Bryan Keller <bryanck@gmail.com>
> >>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
> >>>>>> Cc:
> >>>>>> Sent: Wednesday, May 1, 2013 6:01 PM
> >>>>>> Subject: Re: Poor HBase map-reduce scan performance
> >>>>>>
> >>>>>> I tried running my test with 0.94.4, unfortunately performance was
> >>>>>> about the same. I'm planning on profiling the regionserver and trying
> >>>>>> some other things tonight and tomorrow and will report back.
> >>>>>>
> >>>>>> On May 1, 2013, at 8:00 AM, Bryan Keller <bryanck@gmail.com> wrote:
> >>>>>>
> >>>>>>> Yes I would like to try this, if you can point me to the pom.xml
> >>>>>>> patch that would save me some time.
> >>>>>>>
> >>>>>>> On Tuesday, April 30, 2013, lars hofhansl wrote:
> >>>>>>> If you can, try 0.94.4+; it should significantly reduce the amount
> >>>>>>> of bytes copied around in RAM during scanning, especially if you
> >>>>>>> have wide rows and/or large key portions. That in turn makes scans
> >>>>>>> scale better across cores, since RAM is a shared resource between
> >>>>>>> cores (much like disk).
> >>>>>>>
> >>>>>>>
> >>>>>>> It's not hard to build the latest HBase against Cloudera's version
> >>>>>>> of Hadoop. I can send along a simple patch to pom.xml to do that.
> >>>>>>>
> >>>>>>> -- Lars
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>>  From: Bryan Keller <bryanck@gmail.com>
> >>>>>>> To: user@hbase.apache.org
> >>>>>>> Sent: Tuesday, April 30, 2013 11:02 PM
> >>>>>>> Subject: Re: Poor HBase map-reduce scan performance
> >>>>>>>
> >>>>>>>
> >>>>>>> The table has hashed keys so rows are evenly distributed amongst
> >>>>>>> the regionservers, and load on each regionserver is pretty much the
> >>>>>>> same. I also have per-table balancing turned on. I get mostly data
> >>>>>>> local mappers with only a few rack local (maybe 10 of the 250
> >>>>>>> mappers).
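A hashed-key scheme like the one Bryan mentions is often done by prefixing the natural key with a hash bucket. A minimal sketch, assuming an MD5 prefix (the thread does not say which scheme Bryan actually used):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of a "hashed keys" scheme: prefix each row key with a hash of the
 *  natural key so rows spread evenly across region servers. Illustrative
 *  only; not taken from Bryan's actual table. */
public class SaltedKey {
    public static String rowKey(String naturalKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] h = md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            // First hash byte as a 2-hex-digit prefix: 256 evenly-filled buckets.
            return String.format("%02x-%s", h[0] & 0xff, naturalKey);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }
}
```

The cost of this layout, as the thread implies, is that range scans over the natural key order are no longer possible without scanning every bucket.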
> >>>>>>>
> >>>>>>> Currently the table is a wide table schema, with lists of data
> >>>>>>> structures stored as columns with column prefixes grouping the data
> >>>>>>> structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address,
> >>>>>>> 2_city). I was thinking of moving those data structures to protobuf
> >>>>>>> which would cut down on the number of columns. The downside is I
> >>>>>>> can't filter on one value with that, but it is a tradeoff I would
> >>>>>>> make for performance. I was also considering restructuring the
> >>>>>>> table into a tall table.
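The prefix-grouped wide-row layout above can be illustrated with a small helper that reassembles the per-index structures from column names; the helper itself is hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the wide-row layout described above: columns like 1_name,
 *  1_address, 2_name regroup into per-index structures. Hypothetical
 *  client-side helper, not an HBase API. */
public class PrefixGrouping {
    public static Map<String, Map<String, String>> group(Map<String, String> columns) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            int i = e.getKey().indexOf('_');
            String prefix = e.getKey().substring(0, i);   // "1", "2", ...
            String field = e.getKey().substring(i + 1);   // "name", "city", ...
            out.computeIfAbsent(prefix, k -> new TreeMap<>()).put(field, e.getValue());
        }
        return out;
    }
}
```

Serializing each such group as one protobuf column, as Bryan considers, trades this per-field addressability (and filterability) for fewer KeyValues per row.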
> >>>>>>>
> >>>>>>> Something interesting is that my old regionserver machines had
> >>>>>>> five 15k SCSI drives instead of 2 SSDs, and performance was about
> >>>>>>> the same. Also, my old network was 1gbit, now it is 10gbit. So
> >>>>>>> neither network nor disk I/O appears to be the bottleneck. The CPU
> >>>>>>> is rather high for the regionserver so it seems like the best
> >>>>>>> candidate to investigate. I will try profiling it tomorrow and will
> >>>>>>> report back. I may revisit compression on vs off since that is
> >>>>>>> adding load to the CPU.
> >>>>>>>
> >>>>>>> I'll also come up with a sample program that generates data
> similar to my table.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <larsh@apache.org>
> wrote:
> >>>>>>>
> >>>>>>>> Your average row is 35k so scanner caching would not make a huge
> >>>>>>>> difference, although I would have expected some improvements by
> >>>>>>>> setting it to 10 or 50 since you have a wide 10ge pipe.
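Lars's point about caching and ~35k rows comes down to payload per scanner RPC; a quick illustrative calculation:

```java
/** Rough sizing of scanner caching for the ~35 KB average rows mentioned
 *  above (illustrative arithmetic, not an HBase API). */
public class CachingMath {
    /** Approximate data payload shipped per scanner RPC. */
    static int bytesPerRpc(int caching, int avgRowBytes) {
        return caching * avgRowBytes;
    }

    public static void main(String[] args) {
        int avgRowBytes = 35 * 1024;
        System.out.println(bytesPerRpc(1, avgRowBytes));  // 35840 (default: one row per RPC)
        System.out.println(bytesPerRpc(10, avgRowBytes)); // 358400 (~350 KB per RPC)
        System.out.println(bytesPerRpc(50, avgRowBytes)); // 1792000 (~1.75 MB per RPC)
    }
}
```

With rows this wide, each RPC already moves substantial data even at small caching values, which is why caching helps less here than it would with narrow rows.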
> >>>>>>>>
> >>>>>>>> I assume your table is split sufficiently to touch all
> >>>>>>>> RegionServers... Do you see the same load/IO on all region servers?
> >>>>>>>>
> >>>>>>>> A bunch of scan improvements went into HBase since 0.94.2.
> >>>>>>>> I blogged about some of these changes here:
> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> >>>>>>>>
> >>>>>>>> In your case - since you have many columns, each of which carries
> >>>>>>>> the rowkey - you might benefit a lot from HBASE-7279.
> >>>>>>>>
> >>>>>>>> In the end HBase *is* slower than straight HDFS for full scans.
> >>>>>>>> How could it not be?
> >>>>>>>> So I would start by looking at HDFS first. Make sure Nagle's is
> >>>>>>>> disabled in both HBase and HDFS.
> >>>>>>>>
> >>>>>>>> And lastly SSDs are somewhat new territory for HBase. Maybe Andy
> >>>>>>>> Purtell is listening, I think he did some tests with HBase on SSDs.
> >>>>>>>> With rotating media you typically see an improvement with
> >>>>>>>> compression. With SSDs the added CPU needed for decompression might
> >>>>>>>> outweigh the benefits.
> >>>>>>>>
> >>>>>>>> At the risk of starting a larger discussion here, I would posit
> >>>>>>>> that HBase's LSM-based design, which trades random IO for
> >>>>>>>> sequential IO, might be a bit more questionable on SSDs.
> >>>>>>>>
> >>>>>>>> If you can, it would be nice to run a profiler against one of the
> >>>>>>>> RegionServers (or maybe do it with the single RS setup) and see
> >>>>>>>> where it is bottlenecked.
> >>>>>>>> (And if you send me a sample program to generate some data - not
> >>>>>>>> 700g, though :) - I'll try to do a bit of profiling during the next
> >>>>>>>> days as my day job permits, but I do not have any machines with
> >>>>>>>> SSDs).
> >>>>>>>>
> >>>>>>>> -- Lars
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ________________________________
> >>>>>>>> From: Bryan Keller <bryanck@gmail.com>
> >>>>>>>> To: user@hbase.apache.org
> >>>>>>>> Sent: Tuesday, April 30, 2013 9:31 PM
> >>>>>>>> Subject: Re: Poor HBase map-reduce scan performance
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Yes, I have tried various settings for setCaching() and I have
> >>>>>>>> setCacheBlocks(false).
> >>>>>>>>
> >>>>>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
> >>>>>>>>>
> >>>>>>>>> scan.setCaching(500);        // 1 is the default in Scan, which
> >>>>>>>>> will be bad for MapReduce jobs
> >>>>>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
> >>>>>>>>>
> >>>>>>>>> I guess you have used the above setting.
> >>>>>>>>>
> >>>>>>>>> 0.94.x releases are compatible. Have you considered upgrading to,
> >>>>>>>>> say, 0.94.7 which was recently released?
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm
> >>>>>>
> >
>
>
