From: Juhani Connolly
To: user@hbase.apache.org
Date: Thu, 09 Dec 2010 16:09:11 +0900
Subject: Re: Slow recovery on lost data node?

Thanks for the information, it should help.

Regarding the large number of families: they are currently partitioned in a
way that reflects the groups of data that are likely to be read together. We
do a lot of big scans over only one of those families, while scans of the
full table are much shorter and rarer. By keeping separate store files I was
hoping this separation would reduce overhead, since we would not be reading
data we simply don't need (the data in the other families). Is the overhead
of splitting the store files up large enough to outweigh any savings on file
access times, or am I missing something else?
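For reference, the family-restricted scans are issued roughly like this (a
simplified sketch only; the table and family names are placeholders rather
than our actual schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FamilyScanSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events");      // placeholder table name

        // Restrict the scan to the one family we read together, so the
        // store files of the other families are not touched.
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("stats"));         // placeholder family name
        scan.setCaching(1000);                          // fetch rows in batches

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            // process the row...
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }

With the scan limited to a single family, only that family's store files
should need to be read, which is the saving I was hoping to get from the
split.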
Thanks,
Juhani

On 12/09/2010 03:04 AM, Jean-Daniel Cryans wrote:
> Hey Juhani,
>
> The current state of client retries/sleeps is something that needs to be
> reviewed/redone. It's currently on the roadmap for 0.92, see
> https://issues.apache.org/jira/browse/HBASE-2445
>
> Regarding what you can do right now, the sleeps use an exponential
> backoff, meaning that each successive retry sleeps longer than the
> previous one. Enabling DEBUG in your client should give you more details,
> but according to what I see in the logs, setting the retries lower should
> definitely make it fail faster.
>
> Now, I see that in your log the split took 55 seconds to complete. I've
> recently been working on a major deficiency around that part of the code.
> I posted a patch that's ready for commit here:
> https://issues.apache.org/jira/browse/HBASE-3308
>
> I believe that in your case the problem is amplified by the fact that you
> are using a ton of families. HBase has some performance issues managing
> them, and in general that many families is probably a bad design. 99.99%
> of the time I see no reason to use more than one family. Try it for
> yourself and you should see a big improvement across the board.
>
> Regarding region size, if you have decent hardware then it's safe to set
> MAX_FILESIZE to 1GB on the tables. You could also just create the table
> pre-split and be done with it; for example, see
> http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
> byte[], byte[], int)
>
> J-D
>
> On Wed, Dec 8, 2010 at 12:56 AM, Juhani Connolly wrote:
>
>> Hi there,
>>
>> We're currently running a cluster under expected load and testing
>> various hardware failure cases. Among them is a lost region
>> server/data node, which results in our writer process (in our case a
>> servlet under Tomcat) waiting indefinitely on put flushes until the
>> region becomes available again (in the process, threads stack up until
>> the server limit is reached). I've included logs of the relevant time
>> period from one of my region servers at http://pastie.org/1358217 .
>>
>> During the 15 minutes from around 16:12 to 16:27, all writes failed.
>> Incidentally, during this time I was still able to read data fine with
>> another process that only reads from HBase.
>>
>> Is this 15-minute period of not being able to write working as
>> intended, or is something wrong with the way I'm trying to access
>> HBase? The main access code I'm using can be seen at
>> http://pastie.org/1358224 . tPool is an initialised HTablePool, and
>> the general idea is to store puts without flushing until they have
>> been held onto for a while (to batch the flushes a little).
>>
>> If it is working as intended, what would be the correct steps to
>> reduce it (perhaps lowering the configured region sizes)?
>>
>> Is there anything I can do to just make the writes fail when the
>> region isn't available for writing? As it is, threads keep getting
>> created until the container maximum is reached, waiting for something
>> (presumably for the region to become available again?). I expected
>> hbase.client.retries.number to be the appropriate setting, but based
>> on the lack of any logs for failed writes, the current writes simply
>> aren't aborting.
>>
>> Everything is running on the latest CDH3 (hbase-0.89.20100924+28,
>> hadoop-0.20.2+737-core) and works well under normal conditions.
>>
>> Any advice/information would be appreciated.
>> Thanks,
>> Juhani
>>
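Following up on the pre-split suggestion above, this is roughly what I
understand the call to look like (a sketch only; the table name, family,
key range, region count and the 1GB file size are illustrative values, not
settings taken from our cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTableSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("events");  // placeholder name
        desc.addFamily(new HColumnDescriptor("stats"));          // a single family
        desc.setMaxFileSize(1024L * 1024L * 1024L);               // MAX_FILESIZE = 1GB

        // Create the table already split into 16 regions spread evenly
        // across the given key range, instead of starting with one region.
        admin.createTable(desc,
            Bytes.toBytes("00000000"),   // first split boundary (example)
            Bytes.toBytes("ffffffff"),   // last split boundary (example)
            16);
      }
    }

If I understand correctly, creating the table pre-split like this should
avoid the early splits (and the pauses they cause) while the table is still
small.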
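In the meantime, to get the writes to fail quickly rather than pile up
threads while a region is offline, I'll experiment with lowering the client
retry settings along these lines (the values are examples only, not
recommendations from this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class LowRetryClientSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Fewer retries and a shorter base pause: the exponential backoff
        // then gives up after seconds rather than many minutes.
        conf.setInt("hbase.client.retries.number", 3);
        conf.setLong("hbase.client.pause", 1000);    // base sleep in ms

        HTable table = new HTable(conf, "events");   // placeholder table name
        // Puts issued through a table created from this configuration
        // should abort sooner when the target region stays unavailable.
        table.close();
      }
    }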