hbase-user mailing list archives

From Basil He <basil...@gmail.com>
Subject Re: global memcache limit of 396.9m exceeded cause forcing server shutdown
Date Thu, 05 Mar 2009 09:30:14 GMT
St.Ack,

On Wed, Mar 4, 2009 at 3:36 PM, stack <stack@duboce.net> wrote:

> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#5.


Yes, we checked that item before; however, we are now hitting another exception
while calling MemcacheFlusher.flushRegion:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
        at java.io.DataOutputStream.writeInt(DataOutputStream.java:182)
        at org.apache.hadoop.hbase.io.ImmutableBytesWritable.write(ImmutableBytesWritable.java:115)

We made the changes suggested at
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom,
but the problem is still there.
We have also changed to start 4 regionservers instead of a single one, and
will keep an eye on how it goes.
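
For reference, the change we made in conf/hbase-env.sh was roughly as follows
(a sketch: the heap size is just what we picked for the larger instance, and
-XX:-UseGCOverheadLimit is the switch that tuning guide describes for turning
off the "GC overhead limit exceeded" check):

  # conf/hbase-env.sh (sketch; variable names as in our copy of hbase-env.sh)
  export HBASE_HEAPSIZE=2000
  # disable the GC overhead limit check, per the GC tuning guide above
  export HBASE_OPTS="$HBASE_OPTS -XX:-UseGCOverheadLimit"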


>
> On your hitting the global mem limit, you are uploading at the time, right?
> If so, then it's probably fine.  What does your schema look like, by the way?


one of the schemas is:
{NAME => '1001_profiles', IS_ROOT => 'false', IS_META => 'false', FAMILIES => [
  {NAME => 'inferred', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'edge', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'scored', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'fetl', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'reverse_edge', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'},
  {NAME => 'pre_fetl', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '3', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}
], INDEXES => []}


>
> The DroppedSnapshotException seen in your first message should also be
> addressed by #5 in the troubleshooting page above.
>
> You have upped your ulimit file descriptors?


The ulimit file descriptors are fine now; we
 * added 'root            -       nofile          65536' to /etc/security/limits.conf
 * added 'fs.file-max=200000' to /etc/sysctl.conf, and applied it with sysctl -p
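In case it is useful to others, the new limits can be checked from a fresh
login shell:

  $ ulimit -n           # should now print 65536
  $ sysctl fs.file-max  # should now print fs.file-max = 200000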

Thanks for your help,
Basil.

>
>
> St.Ack
>
>
>
>
> On Tue, Mar 3, 2009 at 11:29 PM, Basil He <basil.he@gmail.com> wrote:
>
> > Stack,
> >
> > After we switched to a larger EC2 instance the problem is still there, and
> > at the same time we found the following message in the datanode's log:
> >
> > java.io.IOException: xceiverCount 1024 exceeds the limit of concurrent xcievers 1023
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:87)
> >         at java.lang.Thread.run(Thread.java:619)
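> >
> > (Presumably the datanodes' xciever limit also needs raising; a sketch of
> > what we plan to add to each datanode's hadoop-site.xml, with the value
> > only a guess:
> >
> >  <property>
> >    <name>dfs.datanode.max.xcievers</name>
> >    <value>2048</value>
> >  </property>
> >
> > followed by a datanode restart.)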
> >
> > Thanks very much for your help.
> > Basil.
> >
> > On Sat, Feb 28, 2009 at 10:41 AM, Xiaogang He <basil.he@gmail.com>
> wrote:
> >
> > > stack,
> > >
> > > Thanks for your reply, I really appreciate that.
> > >
> > > On Fri, Feb 27, 2009 at 11:49 PM, stack <stack@duboce.net> wrote:
> > >
> > >> Tell us more about your hbase install?  Number of servers, number of
> > >> regions, schema, general size of your cells and hbase version.
> > >
> > >
> > > We just have a small Hadoop cluster with 1 master and 3 slaves, plus a
> > > single HMaster and 1 regionserver; the version numbers are both 0.19.
> > >
> > >
> > >> The configuration that most directly affects the amount of heap used is
> > >> the one below:
> > >>
> > >>  <property>
> > >>    <name>hbase.io.index.interval</name>
> > >>    <value>128</value>
> > >>    <description>The interval at which we record offsets in hbase
> > >>    store files/mapfiles.  Default for stock mapfiles is 128.  Index
> > >>    files are read into memory.  If there are many of them, could prove
> > >>    a burden.  If so play with the hadoop io.map.index.skip property and
> > >>    skip every nth index member when reading back the index into memory.
> > >>    Downside to high index interval is lowered access times.
> > >>    </description>
> > >>  </property>
> > >>
> > >> You could try setting io.map.index.skip to 4 or 8 across your cluster
> > >> and restart.
> > >>
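> > >> For example, something like this in hadoop-site.xml on each node (the
> > >> value is for illustration only):
> > >>
> > >>  <property>
> > >>    <name>io.map.index.skip</name>
> > >>    <value>8</value>
> > >>  </property>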
> > >
> > > We have the namenode/secondarynamenode/hmaster/regionserver all running
> > > on a small EC2 instance (1.7 GB of memory).
> > > We think that is likely part of the problem, so we have switched to a
> > > larger instance now.
> > > We will try the above suggestion if we hit the problem again.
> > >
> > >
> > >>
> > >> The flushing of the cache seems to be frustrated by an hdfs error in
> > >> the below.  You have read the 'getting started' section and have upped
> > >> your ulimit file descriptors?
> > >
> > >
> > > Yes, we have raised the ulimit file descriptors according to the FAQ on
> > > the official HBase site.
> > >
> > > Thank you very much.
> > >
> > > Regards,
> > > Basil.
> > >
> > >
> > >>
> > >>
> > >> St.Ack
> > >>
> > >> On Thu, Feb 26, 2009 at 8:16 PM, Xiaogang He <basil.he@gmail.com>
> > wrote:
> > >>
> > >> > hi,
> > >> >
> > >> > I keep hitting the following exception after HBase is restarted and
> > >> > has been running for a while:
> > >> >
> > >> >        2009-02-26 15:14:04,827 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://hmaster:50001/hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679079054, entries=100053. New log writer: /hbase/log_10.249.190.85_1235626687854_60020/hlog.dat.1235679244824
> > >> >        2009-02-26 15:14:16,405 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,155123497688845858,1235539496917 because global memcache limit of 396.9m exceeded; currently 396.9m and flushing till 248.1m
> > >> >        2009-02-26 15:14:18,666 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,145928983691898633,1235539496917 because global memcache limit of 396.9m exceeded; currently 386.3m and flushing till 248.1m
> > >> >        2009-02-26 15:14:19,497 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,,1235562106563 because global memcache limit of 396.9m exceeded; currently 376.2m and flushing till 248.1m
> > >> >        2009-02-26 15:14:21,971 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,1859616112140717,1235538938447 because global memcache limit of 396.9m exceeded; currently 367.1m and flushing till 248.1m
> > >> >        2009-02-26 15:14:23,506 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,256848350134132138,1235539352160 because global memcache limit of 396.9m exceeded; currently 358.2m and flushing till 248.1m
> > >> >        2009-02-26 15:14:26,273 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,38395253911274047,1235562695944 because global memcache limit of 396.9m exceeded; currently 349.4m and flushing till 248.1m
> > >> >        2009-02-26 15:14:27,946 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,18320094988761441,1235659399900 because global memcache limit of 396.9m exceeded; currently 340.8m and flushing till 248.1m
> > >> >        2009-02-26 15:14:28,898 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_profiles,183105869903093166,1235658588032 because global memcache limit of 396.9m exceeded; currently 332.3m and flushing till 248.1m
> > >> >        2009-02-26 15:14:29,857 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1001_relationships,1279872936511407,1235563047231 because global memcache limit of 396.9m exceeded; currently 323.9m and flushing till 248.1m
> > >> >        2009-02-26 15:14:30,338 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of 1002_profiles,9374985809090827,1235658787938 because global memcache limit of 396.9m exceeded; currently 315.5m and flushing till 248.1m
> > >> >        2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
> > >> >        2009-02-26 15:14:31,284 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8226110948737137663_51382
> > >> >        2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
> > >> >        2009-02-26 15:14:39,640 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4802751471280593846_51382
> > >> >        2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.249.187.102:50010
> > >> >        2009-02-26 15:14:45,807 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3919223098697505175_51382
> > >> >        2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
> > >> >        2009-02-26 15:14:51,813 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6922144209752436228_51382
> > >> >        2009-02-26 15:14:57,827 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
> > >> >            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
> > >> >            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
> > >> >            at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
> > >> >
> > >> >        2009-02-26 15:14:57,845 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-6922144209752436228_51382 bad datanode[0] nodes == null
> > >> >        2009-02-26 15:14:57,846 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Aborting...
> > >> >        2009-02-26 15:14:57,924 FATAL org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Replay of hlog required. Forcing server shutdown
> > >> >        org.apache.hadoop.hbase.DroppedSnapshotException: region: 1002_profiles,9374985809090827,1235658787938
> > >> >            at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:896)
> > >> >            at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:789)
> > >> >            at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushRegion(MemcacheFlusher.java:227)
> > >> >            at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.flushSomeRegions(MemcacheFlusher.java:291)
> > >> >            at org.apache.hadoop.hbase.regionserver.MemcacheFlusher.reclaimMemcacheMemory(MemcacheFlusher.java:261)
> > >> >            at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1614)
> > >> >            at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> > >> >            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >> >            at java.lang.reflect.Method.invoke(Method.java:597)
> > >> >            at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632)
> > >> >            at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
> > >> >
> > >> > I noticed there are some parameters regarding this, such as
> > >> > hbase.regionserver.globalMemcache.upperLimit and
> > >> > hbase.regionserver.globalMemcache.lowerLimit.
> > >> >
> > >> > I'm just using the default settings:
> > >> >  <property>
> > >> >    <name>hbase.regionserver.globalMemcache.upperLimit</name>
> > >> >    <value>0.4</value>
> > >> >    <description>Maximum size of all memcaches in a region server before new
> > >> >      updates are blocked and flushes are forced. Defaults to 40% of heap.
> > >> >    </description>
> > >> >  </property>
> > >> >  <property>
> > >> >    <name>hbase.regionserver.globalMemcache.lowerLimit</name>
> > >> >    <value>0.25</value>
> > >> >    <description>When memcaches are being forced to flush to make room in
> > >> >      memory, keep flushing until we hit this mark. Defaults to 30% of heap.
> > >> >      This value equal to hbase.regionserver.globalmemcache.upperLimit causes
> > >> >      the minimum possible flushing to occur when updates are blocked due to
> > >> >      memcache limiting.
> > >> >    </description>
> > >> >  </property>
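> > >> >
> > >> > If lowering these is the way to go, I suppose an override would live
> > >> > in hbase-site.xml, e.g. (values arbitrary, just to show the shape):
> > >> >
> > >> >  <property>
> > >> >    <name>hbase.regionserver.globalMemcache.upperLimit</name>
> > >> >    <value>0.3</value>
> > >> >  </property>
> > >> >  <property>
> > >> >    <name>hbase.regionserver.globalMemcache.lowerLimit</name>
> > >> >    <value>0.2</value>
> > >> >  </property>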
> > >> >
> > >> > Could anyone please give me some guidance to help me out of this issue?
> > >> >
> > >> > Thanks,
> > >> > Basil.
> > >> >
> > >>
> > >
> > >
> >
>
