hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xavier Stevens <xstev...@mozilla.com>
Subject Re: MR sharded Scans giving poor performance..
Date Mon, 26 Jul 2010 22:55:42 GMT
 It's better performing for us because of the way we structured our
keys.  We use a "salt" character plus a date at the front of the key.

So "f20100701xxxxxxxxx" for instance.  We use [0-f] as a salt so that we
get keys more distributed across the cluster.  This is because when we
had just the date at the front of the key it would all go to one
server.  Then if we ran a MR job the RegionServer would fall over under
the load.

So the first implementation I did just used TableInputFormat with
startRow="020100701" and stopRow="f20100701".  The problem with this was
we would have to scan over things like "a20100630" which we didn't
really want.  That's why we went the multiple scan range route. 

The code isn't any better and we could probably just subclass off of
TableInputFormat at this point.  It was just easier for me to step
through the process having all of the code in one spot.  The MultiScan
just reduces the amount of data we read and works better for us with the
way our keys are.


On 7/26/10 3:37 PM, Ryan Rawson wrote:
> Hey,
> That sounds interesting - maybe you could tell us about why your
> system is better performing? The default TableInputFormat is just
> creating N map tasks, one for each region, which are all roughly the
> same data-size.
> What do you do?
> -ryan
> On Mon, Jul 26, 2010 at 3:29 PM, Xavier Stevens <xstevens@mozilla.com> wrote:
>>  We have something that might interest you.
>> http://socorro.googlecode.com/svn/trunk/analysis/src/java/org/apache/hadoop/hbase/mapreduce/
>> We haven't fully tested everything yet, so don't blame us if something
>> goes wrong.  It's basically the exact same as TableInputFormat except it
>> takes an array of Scans to be used for row-key ranges.  It requires the
>> caller to setup the Scan array since they should have the best knowledge
>> about their row-key structure.
>> Preliminary results for us reduced a 15 minute job to under 2 minutes.
>> Cheers,
>> -Xavier
>> On 7/26/10 3:16 PM, Vidhyashankar Venkataraman wrote:
>>> I did not use a TableInputFormat: I ran my own scans on specific ranges (just
for more control from my side to tune the ranges and the ease with which I can run a hadoop
streaming job)..
>>> 1 MB for Hfile Block size.. Not the HDFS block size..
>>> I increased it since I didn't care too much for random read performance.. HDFS
block size is the default value... (I have a related question then: does the Hfile block size
influence only the size of the index and the efficiency of random reads?  I don't see an effect
on scans though)...
>>>   I had previously run 5 tasks per machine and at 20 rows, but that resulted
in scanner expiries (UnknownScannerexception) and DFS socket timeouts.. So that's why I reduced
the number of tasks.. Let me decrease the number of rows and see..
>>>   Just to make sure: the client uses zookeeper only for obtaining ROOT right
whenever it performs scans, isnt it? So scans shouldn't face any master/zk bottlenecks when
we scale up wrt number of nodes, am I right?
>>> Thank you
>>> Vidhya
>>> On 7/26/10 3:01 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:
>>> Hey,
>>> A few questions:
>>> - sharded scan, are you not using TableInputFormat?
>>> - 1 MB block size - what block size?  You probably shouldnt set the
>>> HDFS block size to 1MB, it just causes more nn traffic.
>>> - Tests a year ago indicated that HFile block size really didnt
>>> improve speed when you went beyond 64k or so.
>>> - Run more maps/machine... one map task per disk probably?
>>> - Try setting the client cache to an in-between level, 2-6 perhaps.
>>> Let us know about those other questions and we can go from there.
>>> -ryan
>>> On Mon, Jul 26, 2010 at 2:43 PM, Vidhyashankar Venkataraman
>>> <vidhyash@yahoo-inc.com> wrote:
>>>> I am trying to assess the performance of Scans on a 100TB db on 180 nodes
running Hbase 0.20.5..
>>>> I run a sharded scan (each Map task runs a scan on a specific range: speculative
execution is turned false so that there is no duplication in tasks) on a fully compacted table...
>>>> 1 MB block size, Block cache enabled.. Max of 2 tasks per node..  Each row
is 30 KB in size: 1 big column family with just one field..
>>>> Region lease timeout is set to an hour.. And I don't get any socket timeout
exceptions so I have not reassigned the write socket timeout...
>>>> I ran experiments on the following cases:
>>>>  1.  The client level cache is set to 1 (default: got he number using getCaching):
The MR tasks take around 13 hours to finish in the average.. Which gives around 13.17 MBps
per node. The worst case is 34 hours (to finish the entire job)...
>>>>  2.  Client cache set to 20 rows: this is much worse than the previous case:
we get around a super low 1MBps per node...
>>>>         Question: Should I set it to a value such that the block size is
a multiple of the above said cache size? Or the cache size to a much lower value?
>>>> I find that these numbers are much less than the ones I get when it's running
with just a few nodes..
>>>> Can you guys help me with this problem?
>>>> Thank you
>>>> Vidhya

View raw message