From: Ted Yu
To: user@hbase.apache.org
Date: Thu, 26 May 2016 06:50:22 -0700
Subject: Re: region stuck in failed close state
Heng:
Can you pastebin the complete stack trace for the region server?
A snippet from the region server log may also provide more clues.

Thanks

On Wed, May 25, 2016 at 9:48 PM, Heng Chen wrote:

> On the master web UI, I can see that region (c371fb20c372b8edbf54735409ab5c4a)
> is always in the failed close state, so the balancer cannot run.
>
> I checked the region on the RS and found these log entries for it:
>
> 2016-05-26 12:42:10,490 INFO [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up
> 'too many store files'; waited long enough... proceeding with flush of
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 2016-05-26 12:42:20,043 INFO
> [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 20753
> 2016-05-26 12:42:30,043 INFO
> [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 7057
>
> The related jstack output is below:
>
> Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
>   State: WAITING
>   Blocked count: 1
>   Waited count: 2
>   Waiting on org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:502)
>     org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
>     org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
>     org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
>     org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>     org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     java.lang.Thread.run(Thread.java:745)
>
> Our HBase cluster version is 1.1.1. I tried to compact this region, but the
> compaction is stuck at 89.58% progress:
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 85860221 85860221
> 89.58%
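For context on the jstack quoted here: the close handler is parked in `HRegion.waitForFlushesAndCompactions`, waiting on the region's `HRegion$WriteState` monitor, and on JDK 8 the frame `Object.wait(Object.java:502)` is the no-argument `Object.wait()`, i.e. an untimed wait. Below is a minimal, hypothetical model of that wait pattern, not HBase code: a close loops on a shared monitor until no compaction or flush is in progress, and the finishing compaction or flush calls `notifyAll()`. `RegionCloseModel` and all of its methods are invented for this sketch, and the timeout parameter exists only so the demo terminates. Under this model, a compaction that never completes (such as one stuck at 89.58%) would keep the closing thread WAITING indefinitely.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a region close waiting for in-flight
 * compactions and flushes. Loosely inspired by the waitForFlushesAndCompactions
 * frame in the jstack; this is NOT HBase source code.
 */
public class RegionCloseModel {
    // Monitor object standing in for HRegion$WriteState.
    private final Object writeState = new Object();
    private int compacting = 0;       // number of in-flight compactions
    private boolean flushing = false; // whether a memstore flush is running

    public void startCompaction() {
        synchronized (writeState) { compacting++; }
    }

    public void finishCompaction() {
        synchronized (writeState) {
            compacting--;
            writeState.notifyAll(); // wake any thread blocked in a close
        }
    }

    public void startFlush() {
        synchronized (writeState) { flushing = true; }
    }

    public void finishFlush() {
        synchronized (writeState) {
            flushing = false;
            writeState.notifyAll();
        }
    }

    /**
     * Waits until no compaction or flush is in progress. Unlike the untimed
     * wait seen in the jstack, this sketch takes a timeout so the demo can end;
     * it returns false if the deadline passes while the region is still busy.
     */
    public boolean waitForFlushesAndCompactions(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        boolean interrupted = false;
        synchronized (writeState) {
            try {
                while (compacting > 0 || flushing) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        return false; // still busy: a real close would remain stuck here
                    }
                    try {
                        writeState.wait(remainingMs);
                    } catch (InterruptedException e) {
                        interrupted = true; // remember the interrupt and keep waiting
                    }
                }
                return true;
            } finally {
                if (interrupted) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RegionCloseModel region = new RegionCloseModel();
        region.startCompaction(); // a compaction that does not finish on its own

        // The close waits out the whole timeout because compacting > 0.
        System.out.println("closed while compacting: " + region.waitForFlushesAndCompactions(200));

        // Once another thread finishes the compaction, the close can proceed.
        Thread compactor = new Thread(region::finishCompaction);
        compactor.start();
        compactor.join();
        System.out.println("closed after compaction: " + region.waitForFlushesAndCompactions(200));
    }
}
```

Running `main` prints `closed while compacting: false` and then `closed after compaction: true`. The point of the model is that nothing other than the finishing compaction or flush wakes the waiting close, so the close can only complete once that work finishes.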