mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
Date Sat, 25 Jun 2011 06:26:08 GMT
There's that. There's also the fact that a 32-way machine almost certainly
doesn't have 32 times the I/O bandwidth, let alone 32 times faster seek
latency. (That is, it doesn't have 32 disks.) For a lot of these kinds of jobs
you could end up with an I/O bottleneck.
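
A back-of-envelope sketch of the point above, in Python. Every number here is
a made-up assumption for illustration (100 MB/s per disk, 4 disks in the big
server, small nodes with 4 cores and 1 disk each), not a measurement:

```python
def per_core_io(total_mb_per_s: float, cores: int) -> float:
    """Disk bandwidth each core gets when all cores read at the same time."""
    return total_mb_per_s / cores


# Hypothetical figure: sequential throughput of one disk.
DISK_MB_PER_S = 100.0

# One 32-core server, assumed to have 4 disks.
big_box = per_core_io(DISK_MB_PER_S * 4, cores=32)

# One small cluster node, assumed to have 4 cores and 1 disk.
# Eight such nodes give the same 32 cores but 8 disks in total.
small_node = per_core_io(DISK_MB_PER_S * 1, cores=4)

print(f"big server: {big_box} MB/s per core")
print(f"small node: {small_node} MB/s per core")
```

Under these assumptions each core in the cluster sees twice the disk bandwidth
of a core in the big server, and seeks scale the same way with disk count.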

Speaking of AWS and EMR, I find that the I/O bottleneck is by far the biggest
issue there. I spread my jobs across as many instances and racks as possible
just to try to steal more of the little machines' I/O seeks!

On Sat, Jun 25, 2011 at 3:17 AM, edwin <edwintchiu@gmail.com> wrote:

> Hi Ted,
> By "isn't going to work well", do you mean the inevitable, unnecessary
> Hadoop overhead of running on a single machine, or are there other
> implications of running big jobs on a single machine?
>
> - edwin
>
> On Jun 24, 2011, at 7:11 PM, Ted Dunning wrote:
>
> > I have done this with VM's but I would not generally recommend it.  Without
> > VM's you will have a pretty ugly configuration issue because Hadoop usually
> > assumes it owns the machine.
> >
> > Besides, this is a seriously square peg into a round hole kind of problem
> > here.  Hadoop (map-reduce) was designed so that you could use several little
> > machines instead of one big one.  It just isn't going to work well on a
> > single computer.
> >
> > On Fri, Jun 24, 2011 at 6:49 PM, XiaoboGu <guxiaobo1982@gmail.com> wrote:
> >
> >> Do you have any experience in running multiple data nodes and task
> >> trackers on a single SMP server?
> >>
> >>> -----Original Message-----
> >>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> >>> Sent: Saturday, June 25, 2011 9:26 AM
> >>> To: user@mahout.apache.org
> >>> Cc: dev@mahout.apache.org
> >>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
> >>>
> >>> Pretty big.  Should scream for local classifier learning.
> >>>
> >>> Local Hadoop should run pretty fast as well.
> >>>
> >>> On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu <guxiaobo1982@gmail.com> wrote:
> >>>
> >>>> 32Core, 256G RAM
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> >>>>> Sent: Saturday, June 25, 2011 1:37 AM
> >>>>> To: user@mahout.apache.org
> >>>>> Cc: dev@mahout.apache.org
> >>>>> Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.
> >>>>>
> >>>>> Big iron is fine for some of the classifier stuff, but throughput per $
> >>>>> can be higher for other algorithms with a cluster of smaller machines.
> >>>>>
> >>>>> How big a machine are you talking about?  Even relatively small machines
> >>>>> are pretty massive any more.  8 core = 16 hyper-thread machines with 48GB
> >>>>> seem to be not even very impressive any more.
> >>>>>
> >>>>> On Fri, Jun 24, 2011 at 1:47 AM, XiaoboGu <guxiaobo1982@gmail.com> wrote:
> >>>>>
> >>>>>> We will use a big SMP server to deploy Mahout.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Xiaobo Gu
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
