hbase-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: recommended nodes
Date Thu, 20 Dec 2012 21:22:37 GMT
Hi Jean,

Very interesting benchmark - how were these numbers arrived at? Is this on
a real HBase cluster? To me, it felt kind of counterintuitive that RAID0
beats JBOD on random seeks, because with RAID0 all disks need to seek at the
same time and the performance should basically be as bad as the slowest-seeking
disk.

Varun

On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> Yeah,
> I couldn't argue against LVMs when talking with the system admins.
> In terms of speed it's noise, because the CPUs are pretty efficient and,
> unless you have more than 1 drive per physical core, you will end up
> saturating your disk I/O.
>
> In terms of MapR, you want the raw disk. (But we're talking Apache)
>
>
> On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
>
> > Finally, it took me a while to run those tests because they took way
> > longer than expected, but here are the results:
> >
> > http://www.spaggiari.org/bonnie.html
> >
> > LVM is not really slower than JBOD and does not really take more CPU. So
> > I would say, if you have to choose between the two, take the one you
> > prefer. Personally, I prefer LVM because it's easy to configure.
> >
> > The big winner here is RAID0. It's WAY faster than anything else. But
> > it's using twice the space... Your choice.
> >
> > I did not get a chance to test with the Ubuntu tool because it does not
> > work with LVM drives.
> >
> > JM
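
As a rough, self-contained way to sanity-check sequential-write numbers like these without re-running bonnie, the sketch below (Java; the default file path and size are arbitrary assumptions) times a sequential write on a given mount point and reports MB/s. It does not defeat the OS page cache beyond a final sync, so treat it only as a ballpark cross-check, not a replacement for a real disk benchmark.

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Arrays;

    // Crude sequential-write timer: writes `sizeMb` megabytes to `path`
    // and reports MB/s. A ballpark check only; bonnie is far more thorough.
    public class SeqWriteCheck {
        public static void main(String[] args) throws IOException {
            String path = args.length > 0 ? args[0] : "/tmp/seqwrite.dat"; // assumed path
            int sizeMb = args.length > 1 ? Integer.parseInt(args[1]) : 1024;
            byte[] block = new byte[1024 * 1024];
            Arrays.fill(block, (byte) 0x5A);
            long start = System.nanoTime();
            try (FileOutputStream out = new FileOutputStream(path)) {
                for (int i = 0; i < sizeMb; i++) {
                    out.write(block);
                }
                out.getFD().sync(); // flush to disk before stopping the clock
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("Wrote %d MB in %.1f s = %.1f MB/s%n",
                    sizeMb, seconds, sizeMb / seconds);
        }
    }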
> >
> > 2012/11/28, Michael Segel <michael_segel@hotmail.com>:
> >> Ok, just a caveat.
> >>
> >> I am discussing MapR as part of a complete response. As Mohit posted,
> >> MapR takes the raw device for their MapR File System.
> >> They do stripe on their own within what they call a volume.
> >>
> >> But going back to Apache...
> >> You can stripe drives, however I wouldn't recommend it. I don't think
> >> the performance gains would really matter.
> >> You're going to end up getting blocked first by disk I/O, then your
> >> controller card, then your network... assuming 10GbE.
> >>
> >> With only 2 disks on an 8 core system, you will hit disk I/O first and
> >> then you'll watch your CPU I/O wait climb.
> >>
> >> HTH
> >>
> >> -Mike
> >>
> >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> >> wrote:
> >>
> >>> Hi Mike,
> >>>
> >>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
> >>> at the same time, it should be better than RAID0 or a single drive,
> >>> no?
> >>>
> >>> 2012/11/28, Michael Segel <michael_segel@hotmail.com>:
> >>>> Just a couple of things.
> >>>>
> >>>> I'm neutral on the use of LVMs. Some would point out that there's some
> >>>> overhead, but on the flip side, it can make managing the machines
> >>>> easier.
> >>>> If you're using MapR, you don't want to use LVMs but raw devices.
> >>>>
> >>>> In terms of GC, it's going to depend on the heap size and not the total
> >>>> memory. With respect to HBase... MSLAB is the way to go.
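
For readers who haven't turned it on: MSLAB is the MemStore-Local Allocation Buffer, enabled through hbase-site.xml on the region servers. A minimal sketch of the property names follows; the chunk-size and max-allocation values are what I believe the defaults to be (assumptions, check your release), and setting them on a client Configuration object is shown only to name the keys.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Names the MSLAB-related properties; in practice these belong in
    // hbase-site.xml on every region server, not in client code.
    public class MslabSettingsSketch {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
            // Chunk and max-allocation sizes are believed defaults (assumption).
            conf.setLong("hbase.hregion.memstore.mslab.chunksize", 2L * 1024 * 1024);
            conf.setLong("hbase.hregion.memstore.mslab.max.allocation", 256L * 1024);
            System.out.println("MSLAB enabled: "
                    + conf.getBoolean("hbase.hregion.memstore.mslab.enabled", false));
        }
    }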
> >>>>
> >>>>
> >>>> On Nov 28, 2012, at 12:05 PM, Jean-Marc Spaggiari
> >>>> <jean-marc@spaggiari.org>
> >>>> wrote:
> >>>>
> >>>>> Hi Gregory,
> >>>>>
> >>>>> I found this about LVM:
> >>>>> -> http://blog.andrew.net.au/2006/08/09
> >>>>> -> http://www.phoronix.com/scan.php?page=article&item=fedora_15_lvm&num=2
> >>>>>
> >>>>> It seems that performance is still decent with it. I will most
> >>>>> probably give it a try and bench that too... I have one new hard drive
> >>>>> which should arrive tomorrow. Perfect timing ;)
> >>>>>
> >>>>>
> >>>>>
> >>>>> JM
> >>>>>
> >>>>> 2012/11/28, Mohit Anchlia <mohitanchlia@gmail.com>:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Nov 28, 2012, at 9:07 AM, Adrien Mogenet <adrien.mogenet@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Does HBase really benefit from 64 GB of RAM, since allocating too
> >>>>>>> large a heap might increase GC time?
> >>>>>>>
> >>>>>> The benefit you get is from the OS cache.
> >>>>>>> Another question: why not RAID 0, in order to aggregate disk
> >>>>>>> bandwidth? (and thus keep the 3x replication factor)
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Nov 28, 2012 at 5:58 PM, Michael Segel
> >>>>>>> <michael_segel@hotmail.com> wrote:
> >>>>>>>
> >>>>>>>> Sorry,
> >>>>>>>>
> >>>>>>>> I need to clarify.
> >>>>>>>>
> >>>>>>>> 4GB per physical core is a good starting point.
> >>>>>>>> So with 2 quad-core chips, that is going to be 32GB.
> >>>>>>>>
> >>>>>>>> IMHO that's a minimum. If you go with HBase, you will want more.
> >>>>>>>> (Actually you will need more.) The next logical jump would be to
> >>>>>>>> 48 or 64GB.
> >>>>>>>>
> >>>>>>>> If we start to price out memory, depending on the vendor and your
> >>>>>>>> company's procurement, there really isn't much of a price difference
> >>>>>>>> between 32, 48, or 64 GB.
> >>>>>>>> Note that it also depends on the chips themselves. Also, you need
> >>>>>>>> to see how many memory channels exist on the motherboard. You may
> >>>>>>>> need to buy in pairs or triplets. Your hardware vendor can help you.
> >>>>>>>> (Also keep an eye on your hardware vendor. Sometimes they will give
> >>>>>>>> you higher-density chips that are going to be more expensive...) ;-)
> >>>>>>>>
> >>>>>>>> I tend to like having extra memory from the start.
> >>>>>>>> It gives you a bit more freedom and also protects you from 'fat'
> >>>>>>>> code.
> >>>>>>>>
> >>>>>>>> Looking at YARN... you will need more memory too.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> With respect to the hard drives...
> >>>>>>>>
> >>>>>>>> The best recommendation is to keep the drives as JBOD and then use
> >>>>>>>> 3x replication.
> >>>>>>>> In this case, make sure that the disk controller cards can handle
> >>>>>>>> JBOD. (Some don't support JBOD out of the box.)
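
On the multi-drive question that comes up further down the thread: stock Apache Hadoop handles JBOD by taking a comma-separated list of directories, one per disk, so no RAID layer is needed. A minimal sketch with the MRv1-era property names follows; the mount paths are made-up placeholders, and in practice these lists live in hdfs-site.xml and mapred-site.xml rather than in code.

    import org.apache.hadoop.conf.Configuration;

    // JBOD with stock Apache Hadoop (1.x era): list one directory per physical
    // disk and the DataNode/TaskTracker spread their I/O across them.
    public class JbodDirsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // HDFS block storage, one entry per spindle (hdfs-site.xml: dfs.data.dir).
            conf.set("dfs.data.dir",
                    "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn");
            // MapReduce spill space, spread the same way (mapred-site.xml: mapred.local.dir).
            conf.set("mapred.local.dir",
                    "/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local");
            System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
        }
    }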
> >>>>>>>>
> >>>>>>>> With respect to RAID...
> >>>>>>>>
> >>>>>>>> If you are running MapR, no need for RAID.
> >>>>>>>> If you are running an Apache derivative, you could use RAID 1 and
> >>>>>>>> then cut your replication to 2x. This makes it easier to manage
> >>>>>>>> drive failures. (It's not the norm, but it works...) In some
> >>>>>>>> clusters, they are using appliances like NetApp's E-Series, where
> >>>>>>>> the machines see the drives as locally attached storage and I think
> >>>>>>>> the appliances themselves are using RAID. I haven't played with this
> >>>>>>>> configuration, however it could make sense and it's a valid design.
> >>>>>>>>
> >>>>>>>> HTH
> >>>>>>>>
> >>>>>>>> -Mike
> >>>>>>>>
> >>>>>>>> On Nov 28, 2012, at 10:33 AM, Jean-Marc Spaggiari
> >>>>>>>> <jean-marc@spaggiari.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Mike,
> >>>>>>>>>
> >>>>>>>>> Thanks for all those details!
> >>>>>>>>>
> >>>>>>>>> So to simplify the equation, for 16 virtual cores we need 48 to
> >>>>>>>>> 64GB, which means 3 to 4GB per core. So with quad cores, are 12GB
> >>>>>>>>> to 16GB a good start? Or did I simplify it too much?
> >>>>>>>>>
> >>>>>>>>> Regarding the hard drives: if you add more than one drive, do you
> >>>>>>>>> need to build them into RAID or similar systems? Or can
> >>>>>>>>> Hadoop/HBase be configured to use more than one drive?
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> JM
> >>>>>>>>>
> >>>>>>>>> 2012/11/27, Michael Segel <michael_segel@hotmail.com>:
> >>>>>>>>>>
> >>>>>>>>>> OK... I don't know why Cloudera is so hung up on 32GB. ;-) [It's
> >>>>>>>>>> an inside joke...]
> >>>>>>>>>>
> >>>>>>>>>> So here's the problem...
> >>>>>>>>>>
> >>>>>>>>>> By default, your child processes in a map/reduce job get 512MB.
> >>>>>>>>>> The majority of the time, this gets raised to 1GB.
> >>>>>>>>>>
> >>>>>>>>>> 8 cores (dual quad-core) show up as 16 virtual processors in
> >>>>>>>>>> Linux. (Note: this is why, when people talk about the number of
> >>>>>>>>>> cores, you have to specify physical cores or logical cores....)
> >>>>>>>>>>
> >>>>>>>>>> So if you were to oversubscribe and have, let's say, 12 mappers
> >>>>>>>>>> and 12 reducers, that's 24 slots, which means you would need 24GB
> >>>>>>>>>> of memory reserved just for the child processes. This would leave
> >>>>>>>>>> 8GB for the DN, the TT and the rest of the Linux OS processes.
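
To make that arithmetic easy to replay with other numbers, here is a small back-of-the-envelope sketch; the child heap, the slot counts and the daemon allowance for the DN and TT are adjustable assumptions, not figures from the thread.

    // Back-of-the-envelope memory budget for an MRv1 worker node.
    public class SlotMemoryBudget {
        public static void main(String[] args) {
            int totalGb = 32;          // installed RAM
            int mapSlots = 12;         // oversubscribed example from the thread
            int reduceSlots = 12;
            double childHeapGb = 1.0;  // per-child heap (mapred.child.java.opts -Xmx)
            double daemonsGb = 2.0;    // rough allowance for DN + TT heaps (assumption)

            double children = (mapSlots + reduceSlots) * childHeapGb;
            double leftOver = totalGb - children - daemonsGb;
            System.out.printf("Child JVMs: %.1f GB, daemons: %.1f GB, "
                    + "left for the OS and everything else: %.1f GB%n",
                    children, daemonsGb, leftOver);
            // Re-run with childHeapGb = 2.0 to see why the same 24 slots need 48 GB.
        }
    }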
> >>>>>>>>>>
> >>>>>>>>>> Can you live with that? Sure.
> >>>>>>>>>> Now add in R, HBase, Impala, or some other set of tools on top of
> >>>>>>>>>> the cluster.
> >>>>>>>>>>
> >>>>>>>>>> Oops! Now you are in trouble because you will swap.
> >>>>>>>>>> Also, adding in R, you may want to bump up those child procs from
> >>>>>>>>>> 1GB to 2GB. That means the 24 slots would now require 48GB. Now
> >>>>>>>>>> you have swap, and if that happens you will see HBase in a
> >>>>>>>>>> cascading failure.
> >>>>>>>>>>
> >>>>>>>>>> So while you can do a rolling restart with the changed
> >>>>>>>>>> configuration (reducing the number of mappers and reducers), you
> >>>>>>>>>> end up with fewer slots, which will mean longer run times for your
> >>>>>>>>>> jobs. (Fewer slots == less parallelism.)
> >>>>>>>>>>
> >>>>>>>>>> Looking at the price of memory... you can get 48GB or even 64GB
> >>>>>>>>>> for around the same price point. (8GB chips)
> >>>>>>>>>>
> >>>>>>>>>> And I didn't even talk about adding Solr either, again a memory
> >>>>>>>>>> hog... ;-)
> >>>>>>>>>>
> >>>>>>>>>> Note that I matched the number of mappers with reducers. You
> >>>>>>>>>> could go with fewer reducers if you want. I tend to recommend a
> >>>>>>>>>> ratio of 2:1 mappers to reducers, depending on the workflow....
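
In case it helps to see where those knobs live: in MRv1 the slot counts and the child heap are plain properties, normally set in mapred-site.xml on each TaskTracker. Below is a minimal sketch using the 2:1 mapper-to-reducer ratio mentioned above; the exact values are illustrative assumptions.

    import org.apache.hadoop.conf.Configuration;

    // MRv1 (Hadoop 1.x) slot and child-heap settings; shown in code only to
    // name the properties -- they normally live in mapred-site.xml.
    public class SlotConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 8);  // 2:1 ratio example
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 4);
            conf.set("mapred.child.java.opts", "-Xmx1024m");         // per-child heap
            System.out.println("map slots = "
                    + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
        }
    }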
> >>>>>>>>>>
> >>>>>>>>>> As to the disks... no, 7200 RPM SATA III drives are fine. The
> >>>>>>>>>> SATA III interface is pretty much available in the new kit being
> >>>>>>>>>> shipped.
> >>>>>>>>>> It's just that you don't have enough drives: 8 cores should mean
> >>>>>>>>>> 8 spindles if available.
> >>>>>>>>>> Otherwise you end up seeing your CPU load climb on wait states as
> >>>>>>>>>> the processes wait for the disk I/O to catch up.
> >>>>>>>>>>
> >>>>>>>>>> I mean, you could build out a cluster with 4 x 3.5" 2TB drives in
> >>>>>>>>>> a 1U chassis based on price. You're making a trade-off and you
> >>>>>>>>>> should be aware of the performance hit you will take.
> >>>>>>>>>>
> >>>>>>>>>> HTH
> >>>>>>>>>>
> >>>>>>>>>> -Mike
> >>>>>>>>>>
> >>>>>>>>>> On Nov 27, 2012, at 1:52 PM, Jean-Marc Spaggiari
> >>>>>>>>>> <jean-marc@spaggiari.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Michael,
> >>>>>>>>>>>
> >>>>>>>>>>> So, are you recommending 32GB per node?
> >>>>>>>>>>>
> >>>>>>>>>>> What about the disks? Are SATA drives too slow?
> >>>>>>>>>>>
> >>>>>>>>>>> JM
> >>>>>>>>>>>
> >>>>>>>>>>> 2012/11/26, Michael Segel <michael_segel@hotmail.com>:
> >>>>>>>>>>>> Uhm, those specs are actually now out of date.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you're running HBase, or also want to run R on top of
> >>>>>>>>>>>> Hadoop, you will need to add more memory.
> >>>>>>>>>>>> Also, forget 1GbE, go 10GbE, and with 2 SATA drives you will be
> >>>>>>>>>>>> disk I/O bound way too quickly.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Nov 26, 2012, at 8:05 AM, Marcos Ortiz <mlortiz@uci.cu>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Are you asking about hardware recommendations?
> >>>>>>>>>>>>> Eric Sammer, in his "Hadoop Operations" book, did a great job
> >>>>>>>>>>>>> on this.
> >>>>>>>>>>>>> For mid-size clusters (up to 300 nodes):
> >>>>>>>>>>>>> Processor: dual quad-core 2.6 GHz
> >>>>>>>>>>>>> RAM: 24 GB DDR3
> >>>>>>>>>>>>> Dual 1 Gb Ethernet NICs
> >>>>>>>>>>>>> A SAS drive controller
> >>>>>>>>>>>>> At least two SATA II drives in a JBOD configuration
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The replication factor depends heavily on the primary use of
> >>>>>>>>>>>>> your cluster.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 11/26/2012 08:53 AM, David Charle wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What are the recommended nodes for NN, HMaster and ZK for a
> >>>>>>>>>>>>>> larger cluster, let's say 50-100+ nodes?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Also, what would be the ideal replication factor for larger
> >>>>>>>>>>>>>> clusters when you have 3-4 racks?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> David
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Marcos Luis Ortíz Valmaseda
> >>>>>>>>>>>>> about.me/marcosortiz <http://about.me/marcosortiz>
> >>>>>>>>>>>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Adrien Mogenet
> >>>>>>> 06.59.16.64.22
> >>>>>>> http://www.mogenet.me
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
