mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Interesting MapReduce variant: MapFreeduce
Date Mon, 16 May 2011 00:09:09 GMT
Most of our jobs are I/O bound anyway and it is common for the switch fabric
connecting desktops to be pretty limited.  My guess is that you would get
only a very limited increase in total computing progress this way.  There are
a few notable examples like protein folding where the problems require small
input and output and massive compute time, but very few distributed machine
learning algorithms are like that.

On Sun, May 15, 2011 at 10:30 AM, Jeremy Lewi <> wrote:

> If you're running in an applet without hdfs, doesn't that mean "you're
> moving both data and computation to the machine" as opposed to moving
> "computation to the data"? Would this be a big issue for mahout? For
> example, if you're running kmeans and 90% of your machines are
> workstations that would otherwise be idle, then wouldn't you need to
> transfer roughly 90% of your dataset to the various clients (e.g. each
> client might only receive a small fraction, but 90% needs to be shipped
> out of your central storage)? It seems like network bottlenecks could
> easily swamp the benefits of using workstation cycles.
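
The concern in the quoted message can be put into rough numbers. The sketch
below is only a back-of-envelope estimate, not a measurement: the function
names, the 1 TB dataset size, the 1 Gbit/s central uplink and the compute
times are all hypothetical, and it assumes the data is shipped once through
a single shared uplink with no compression or caching.

```python
def shipping_time_hours(dataset_gb, remote_fraction, uplink_gbps):
    """Hours needed to push the remote share of the dataset through the
    central uplink (assumed to be the only bottleneck)."""
    gigabits = dataset_gb * remote_fraction * 8  # GB -> Gbit
    return gigabits / uplink_gbps / 3600         # seconds -> hours


def io_bound(dataset_gb, remote_fraction, uplink_gbps, compute_hours):
    """True when shipping the data takes longer than the computation
    it is supposed to feed."""
    shipping = shipping_time_hours(dataset_gb, remote_fraction, uplink_gbps)
    return shipping > compute_hours


# Hypothetical numbers: 1 TB dataset, 90% of workers are remote
# workstations, 1 Gbit/s uplink out of central storage.
hours = shipping_time_hours(1000, 0.9, 1.0)
print(f"{hours:.1f} h to ship the data")  # 2.0 h to ship the data
```

Under these assumed numbers, any job whose total compute time is well under
two hours would spend most of its wall-clock time waiting on the network.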
