hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: MR sharded Scans giving poor performance..
Date Mon, 26 Jul 2010 22:00:52 GMT
Oh, and I forgot to add: 4 GB regions and 8 GB heap size.

On 7/26/10 2:43 PM, "Vidhyashankar Venkataraman" <vidhyash@yahoo-inc.com> wrote:

I am trying to assess the performance of Scans on a 100 TB database on 180 nodes running HBase 0.20.5.

I run a sharded scan on a fully compacted table: each Map task scans a specific row range,
and speculative execution is turned off so that no task is duplicated.

1 MB block size, block cache enabled, and a maximum of 2 tasks per node. Each row is 30 KB in size:
1 big column family with just one field.
The region lease timeout is set to an hour. I don't get any socket timeout exceptions, so
I have not changed the write socket timeout.

I ran experiments on the following cases:

 1.  Client-level cache set to 1 row (the default; I got the number using getCaching): the MR
tasks take around 13 hours to finish on average, which gives around 13.17 MBps per node.
The worst case is 34 hours (to finish the entire job).
 2.  Client cache set to 20 rows: this is much worse than the previous case. We get a very
low rate of around 1 MBps per node.

         Question: Should I choose the cache size so that the block size is a multiple of
it? Or should I set the cache size to a much lower value?
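
For reference, the figures above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not a measurement: it assumes binary units (1 TB = 2**40 bytes, 1 MB = 2**20 bytes) and data spread evenly across all 180 nodes, neither of which is stated in the post, and the function names are made up for illustration. Under those assumptions the 13-hour average works out to about 12.4 MB/s per node, in the same range as the quoted figure. It also shows that 20 rows of 30 KB come to about 600 KB per scanner round trip, and that a 1 MB block holds about 34 whole rows.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# Assumptions (not from the post): binary units and an even spread of
# data across nodes. Function names are hypothetical helpers.

TB = 2**40
MB = 2**20
KB = 2**10


def per_node_throughput_mbps(total_bytes, nodes, hours):
    """Average scan throughput per node, in MB/s."""
    bytes_per_node = total_bytes / nodes
    return bytes_per_node / (hours * 3600) / MB


def bytes_per_scanner_call(row_bytes, caching):
    """Approximate payload fetched by one scanner round trip."""
    return row_bytes * caching


avg = per_node_throughput_mbps(100 * TB, 180, 13)   # case 1: 13 hours on average
batch = bytes_per_scanner_call(30 * KB, 20)         # case 2: 20 rows per call
rows_per_block = (1 * MB) // (30 * KB)              # whole 30 KB rows per 1 MB block

print(f"average throughput per node: {avg:.1f} MB/s")
print(f"payload per call at 20 rows: {batch / KB:.0f} KB")
print(f"whole rows per 1 MB block:   {rows_per_block}")
```

The numbers only frame the question; they do not by themselves explain why the 20-row setting ran slower.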

I find that these numbers are much lower than the ones I get when running on just a
few nodes.

Can you guys help me with this problem?

Thank you
