From user-return-11863-apmail-hbase-user-archive=hbase.apache.org@hbase.apache.org Mon Aug 02 23:22:05 2010
From: Jean-Daniel Cryans
To: user@hbase.apache.org
Date: Mon, 2 Aug 2010 16:21:34 -0700
Subject: Re: Memory Consumption and Processing questions

Something to keep in mind is that the block cache is within the region
server's JVM, whereas it has to go over the network to get data from the
DNs (which should always be slower, even if it's in the OS cache). But, on
a production system, regions don't move that much, so the local DN should
always contain the blocks for its RS's regions.
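For reference, the block cache's share of the RegionServer heap is a single
setting in hbase-site.xml. A minimal sketch, assuming the 0.20-era property
name and its shipped default (check your release before relying on it):

  <property>
    <name>hfile.block.cache.size</name>
    <!-- Fraction of the RegionServer heap handed to the block cache.
         0.2 is the shipped default; read-heavy clusters often raise it,
         at the cost of leaving less room for the OS cache. -->
    <value>0.2</value>
  </property>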
If https://issues.apache.org/jira/browse/HDFS-347 were in place, block
caching could be almost useless if the OS is given a lot of room, and there
would be no need for IB and whatnot.

J-D

On Mon, Aug 2, 2010 at 4:00 PM, Jacques wrote:
> Makes me wonder if high speed interconnects and little to no block cache
> would work better--basically rely on each machine to hold the highly used
> blocks in OS cache and push them around quickly if they are needed
> elsewhere. Of course it's all just a thought experiment at this point.
> The cost of having high speed interconnects would probably be
> substantially more than provisioning extra memory to hold cached blocks
> twice. There is also the thought that if the blocks are cached by HBase,
> they would appear rarely used from the OS standpoint and are, therefore,
> unlikely to be in cache.
>
> On Mon, Aug 2, 2010 at 8:39 AM, Edward Capriolo wrote:
>
>> On Mon, Aug 2, 2010 at 11:33 AM, Jacques wrote:
>> > You're right, of course. I shouldn't generalize too much. I'm more
>> > trying to understand the landscape than pinpoint anything specific.
>> >
>> > Quick question: since the block cache is unaware of the location of
>> > files, wouldn't it overlap the OS cache for hfiles once they are
>> > localized after compaction? Any guidance on how to tune the two?
>> >
>> > thanks,
>> > Jacques
>> >
>> > On Sun, Aug 1, 2010 at 9:08 PM, Jonathan Gray wrote:
>> >
>> >> One reason not to extrapolate that is that leaving lots of memory
>> >> for the linux buffer cache is a good way to improve overall
>> >> performance of typically i/o bound applications like Hadoop and
>> >> HBase.
>> >>
>> >> Also, I'm unsure that "most people use ~8 for hdfs/mr". DataNodes
>> >> generally require almost no significant memory (though they generally
>> >> run with 1GB); their performance will improve with more free memory
>> >> for the OS buffer cache. As for MR, this completely depends on the
>> >> tasks running. The TaskTrackers also don't require significant
>> >> memory, so this completely depends on the number of tasks per node
>> >> and the memory requirements of the tasks.
>> >>
>> >> Unfortunately you can't always generalize the requirements too much,
>> >> especially in MR.
>> >>
>> >> JG
>> >>
>> >> > -----Original Message-----
>> >> > From: Jacques [mailto:whshub@gmail.com]
>> >> > Sent: Sunday, August 01, 2010 5:30 PM
>> >> > To: user@hbase.apache.org
>> >> > Subject: Re: Memory Consumption and Processing questions
>> >> >
>> >> > Thanks, that was very helpful.
>> >> >
>> >> > Regarding 24gb-- I saw people using servers with 32gb of server
>> >> > memory (a recent thread here and hstack.org). I extrapolated the
>> >> > use since it seems most people use ~8 for hdfs/mr.
>> >> >
>> >> > -Jacques
>> >> >
>> >> > On Sun, Aug 1, 2010 at 11:39 AM, Jonathan Gray wrote:
>> >> >
>> >> > > > -----Original Message-----
>> >> > > > From: Jacques [mailto:whshub@gmail.com]
>> >> > > > Sent: Friday, July 30, 2010 1:16 PM
>> >> > > > To: user@hbase.apache.org
>> >> > > > Subject: Memory Consumption and Processing questions
>> >> > > >
>> >> > > > Hello all,
>> >> > > >
>> >> > > > I'm planning an hbase implementation and had some questions I
>> >> > > > was hoping someone could help with.
>> >> > > >
>> >> > > > 1. Can someone give me a basic overview of how memory is used
>> >> > > > in HBase? Various places on the web people state that 16-24gb
>> >> > > > is the minimum for region servers if they also operate as
>> >> > > > hdfs/mr nodes. Assuming that hdfs/mr nodes consume ~8gb, that
>> >> > > > leaves a "minimum" of 8-16gb for hbase. It seems like lots of
>> >> > > > people suggest using even 24gb+ for hbase. Why so much? Is it
>> >> > > > simply to avoid gc problems? Have data in memory for fast
>> >> > > > random reads? Or?
>> >> > >
>> >> > > Where exactly are you reading this from? I'm not actually aware
>> >> > > of people using 24GB+ heaps for HBase.
>> >> > >
>> >> > > I would not recommend using less than 4GB for RegionServers.
>> >> > > Beyond that, it very much depends on your application. 8GB is
>> >> > > often sufficient, but I've seen as much as 16GB used in
>> >> > > production.
>> >> > >
>> >> > > You need at least 4GB because of GC. General experience has been
>> >> > > that below that the CMS GC does not work well.
>> >> > >
>> >> > > Memory is used primarily for the MemStores (write cache) and
>> >> > > Block Cache (read cache). In addition, memory is allocated as
>> >> > > part of normal operations to store in-memory state and in
>> >> > > processing reads.
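To make the heap and GC numbers above concrete: heap size and GC flags live
in conf/hbase-env.sh. A minimal sketch, assuming a 0.20-style setup; the
exact flag values are illustrative starting points, not a tuning
recommendation:

  # conf/hbase-env.sh
  # 4GB heap for the daemon (value is in MB) -- the floor suggested above.
  export HBASE_HEAPSIZE=4000
  # CMS is the collector the "at least 4GB" experience refers to; an
  # initiating occupancy of 70 is a common starting point, not gospel.
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"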
>> >> > > > 2. What types of things put more/less pressure on memory? I
>> >> > > > saw insinuation that insert speed can create substantial memory
>> >> > > > pressure. What type of relative memory pressure do scanners,
>> >> > > > random reads, random writes, region quantity and compactions
>> >> > > > cause?
>> >> > >
>> >> > > Writes are buffered and flushed to disk when the write buffer
>> >> > > gets to a local or global limit. The local limit (per region)
>> >> > > defaults to 64MB. The global limit is based on the total amount
>> >> > > of heap available (the default, I think, is 40%). So there is
>> >> > > interplay between how much heap you have and how many regions are
>> >> > > actively written to. If you have too many regions and not enough
>> >> > > memory to allow them to hit the local/region limit, you end up
>> >> > > flushing undersized files.
>> >> > >
>> >> > > Scanning/random reading will utilize the block cache, if
>> >> > > configured to. The more room for the block cache, the more data
>> >> > > you can keep in-memory. Reads from the block cache are
>> >> > > significantly faster than non-cached reads, obviously.
>> >> > >
>> >> > > Compactions are not generally an issue.
>> >> > >
>> >> > > > 3. How cpu intensive are the region servers? It seems like most
>> >> > > > of their performance is based on i/o. (I've noted the caution
>> >> > > > against starving region servers of cycles--which seems
>> >> > > > primarily focused on avoiding zk timeout -> region reassignment
>> >> > > > problems.) Does anyone suggest or recommend against dedicating
>> >> > > > only one or two cores to a region server? Do individual
>> >> > > > compactions benefit from multiple cores, or are they
>> >> > > > single-threaded?
>> >> > >
>> >> > > I would dedicate at least one core to a region server, but as we
>> >> > > add more and more concurrency, it may become important to have
>> >> > > two cores available. Many things, like compactions, are only
>> >> > > single threaded today, but there's a very good chance you will be
>> >> > > able to configure multiple threads in the next major release.
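The local and global flush limits described above map onto hbase-site.xml
properties. A sketch using the defaults cited in the reply; the property
names are from the 0.20/0.89 line, so verify them against your version:

  <property>
    <!-- Per-region MemStore flush threshold: 64MB, in bytes. -->
    <name>hbase.hregion.memstore.flush.size</name>
    <value>67108864</value>
  </property>
  <property>
    <!-- Global cap: force flushes once all MemStores together reach
         this fraction of the RegionServer heap. -->
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value>
  </property>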
>> >> > > > 4. What are the memory and cpu resource demands of the master
>> >> > > > server? It seems like more and more of that load is moving to
>> >> > > > zk.
>> >> > >
>> >> > > Not too much. I'm putting a change in TRUNK right now that keeps
>> >> > > all region assignments in the master, so there is some memory
>> >> > > usage, but not much. I would think 2GB heap and 1-2 cores is
>> >> > > sufficient.
>> >> > >
>> >> > > > 5. General HDFS question-- when the namenode dies, what happens
>> >> > > > to the datanodes, and how does that relate to HBase? E.g., can
>> >> > > > hbase continue to operate in a read-only mode (assuming no
>> >> > > > datanode/regionserver failures post namenode failure)?
>> >> > >
>> >> > > Today, HBase will probably die ungracefully once it does start to
>> >> > > hit the NN. There are some open JIRAs about HBase behavior under
>> >> > > different HDFS faults and trying to be as graceful as possible
>> >> > > when they happen, including HBASE-2183 about riding over an HDFS
>> >> > > restart.
>> >> > >
>> >> > > > Thanks for your help,
>> >> > > > Jacques
>>
>> Interesting question. The problem is that Java is unaware of what is in
>> the VFS cache, so theoretically you could end up with data in both the
>> BlockCache and the VFS cache. Committing the memory to the JVM will give
>> less to the system, and as a result the system will have less to VFS
>> cache with.
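On the double-caching point: if you would rather lean on the VFS/OS cache
for particular data, the block cache can be switched off per column family.
A sketch in the HBase shell, using 0.20-era syntax; 'mytable' and 'mycf'
are hypothetical names:

  hbase> disable 'mytable'
  hbase> alter 'mytable', {NAME => 'mycf', BLOCKCACHE => 'false'}
  hbase> enable 'mytable'

Reads against that family then bypass the JVM-side cache entirely, so only
the OS holds the blocks, at the cost of a network/DN round trip per read.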