I would like to endorse this point.
If your sparse data fits in memory on a single machine, it is very unlikely
that you will be able to improve on the cost of doing a stochastic
projection on that one machine using any Hadoop-based solution.
Even with MPI and crazy RDMA networking, I doubt that you would beat it by
much, if at all.
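
To make the single-machine case concrete, here is a minimal sketch of the projection step only (Y = A * Omega, with Omega a random Gaussian matrix). It is not Mahout's SSVD code. The function name, the dict-per-row sparse layout, and the parameters k and seed are my own assumptions for illustration.

```python
import random


def stochastic_projection(rows, n_cols, k, seed=0):
    """Return Y = A * Omega for a sparse A and a Gaussian n_cols x k Omega.

    rows: list of dicts mapping column index -> nonzero value (hypothetical
    layout chosen for this sketch, one dict per row of A).
    """
    rng = random.Random(seed)
    # Dense random test matrix Omega; k is small, so it fits in memory.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_cols)]

    projected = []
    for row in rows:
        y = [0.0] * k
        # Only the nonzeros of the sparse row contribute to the product.
        for j, value in row.items():
            omega_j = omega[j]
            for c in range(k):
                y[c] += value * omega_j[c]
        projected.append(y)
    return projected


# Toy 3 x 4 sparse matrix projected down to k = 2 columns.
A = [{0: 1.0, 3: 2.0}, {1: -1.0}, {}]
Y = stochastic_projection(A, n_cols=4, k=2)
```

The work is proportional to (number of nonzeros) x k and needs a single pass over the data, which is why it is hard to beat once the data fits in memory.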
On Wed, Aug 1, 2012 at 12:36 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> Also, as Lance mentioned, the "coefficient of performance" per core
> for distributed methods is usually lower than that of an iterative
> method. It is hard (if even possible) to achieve 100% scalability here.
> Simply put, if you have 5 computers to solve the same problem, it will
> not be solved 5 times faster than by a comparable method on a single
> computer.
>
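
To put rough numbers on the "5 computers" remark above, here is a small sketch. It uses Amdahl's law as a simple model, which is my assumption and not something stated in the thread. The 10% serial fraction is made up for illustration.

```python
def amdahl_speedup(workers, serial_fraction):
    """Speedup predicted by Amdahl's law for a given non-parallel fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def parallel_efficiency(workers, serial_fraction):
    """Speedup per worker; 1.0 would mean perfect (100%) scalability."""
    return amdahl_speedup(workers, serial_fraction) / workers


# Hypothetical job: 10% of the work cannot be parallelized, 5 machines.
speedup = amdahl_speedup(5, 0.1)          # about 3.57, not 5
efficiency = parallel_efficiency(5, 0.1)  # about 0.71 per machine
```

Real Hadoop jobs also pay for job startup, shuffles and I/O, none of which this model captures, so measured efficiency is usually lower still.
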
> On Wed, Aug 1, 2012 at 11:29 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > I only know of comparisons between parallel algorithms. There's a
> > performance and accuracy comparison between Mahout's SSVD and Lanczos
> > in the dissertation of N. Halko (see the link on the SSVD page of the
> > Mahout wiki). There's also a "Heigen" SVD paper that discusses a
> > distributed modified Lanczos method in a proprietary Hadoop-based
> > implementation at Yahoo. Even though it doesn't draw side-by-side
> > comparisons, it does present benchmark figures for the Heigen
> > implementation, so one can draw approximate comparisons between the
> > Heigen and Mahout methods.
> >
> > W.r.t. parallel vs. non-parallel, IMO the bottom line is
> > practicality, not necessarily speed. There are some SVD problems for
> > which one might argue that a single-computer solution is not
> > practical, and which a distributed algorithm may actually shift into
> > the realm of practical solutions (in the sense that you don't need
> > days to solve them). But IMO a direct comparison still doesn't make a
> > lot of sense.
> >
> > On Sat, Jul 28, 2012 at 9:27 AM, mohsen jadidi <mohsen.jadidi@gmail.com> wrote:
> >> Thank you for your replies. What I am interested to know is this: if
> >> I want to compute the SVD of a huge matrix, how much faster does my
> >> computation get by using Mahout?
> >>
> >> On Fri, Jul 27, 2012 at 8:12 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>
> >>> IMO it doesn't make much sense to compare a non-parallel and a
> >>> parallel algorithm (assuming they are running approximately the same
> >>> flops-sized computation). Which is probably why there are not many
> >>> such comparisons (I don't know of any).
> >>>
> >>> However, there are studies comparing parallel approaches (e.g. certain
> >>> Mahout vs. Giraph methods) given the same amount of flops capacity in
> >>> a cluster, but I think you need to be more specific, because there
> >>> are too many areas of interest you are talking about.
> >>>
> >>> On Fri, Jul 27, 2012 at 8:57 AM, mohsen jadidi <mohsen.jadidi@gmail.com> wrote:
> >>> > Hey all,
> >>> >
> >>> > I am looking for some case studies that have evaluated some of
> >>> > Mahout's algorithm implementations, such as the different
> >>> > decompositions or classifiers. I just want to know how much faster
> >>> > Mahout is compared with regular non-parallel algorithms. I
> >>> > couldn't find anything useful.
> >>> >
> >>> > Thanks in advance,
> >>> >
> >>> > 
> >>> > Mohsen Jadidi
> >>>
> >>
> >>
> >>
> >> 
> >> Mohsen Jadidi
>
