mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Removing MAHOUT_LOCAL option
Date Mon, 21 Mar 2016 20:47:17 GMT
The stochastic SVD in DSSVD.scala is identical to the MR version, with the
exception that MR uses a more numerically stable reordered Givens QR, while
DSSVD.scala uses a less numerically stable Cholesky QR.
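
For readers unfamiliar with the distinction, here is a minimal sketch of what
a Cholesky QR does, in plain Scala with no Mahout dependency. It is not the
DSSVD.scala code; the object and method names (CholeskyQrSketch, gram,
cholesky, choleskyQr) are made up for illustration. The idea is to form the
Gram matrix G = A'A, take its Cholesky factor L (so R = L'), and recover
Q = A R^-1. Forming A'A squares the condition number of A, which is the usual
reason this route is considered less stable than a Givens-based QR.

```scala
// Illustrative only: a dense, in-core Cholesky QR. All names are hypothetical.
object CholeskyQrSketch {
  type Matrix = Array[Array[Double]]

  // Gram matrix G = A'A (n x n) for an m x n matrix A.
  def gram(a: Matrix): Matrix = {
    val n = a(0).length
    Array.tabulate(n, n)((i, j) => a.map(row => row(i) * row(j)).sum)
  }

  // Lower-triangular Cholesky factor L with G = L L'.
  def cholesky(g: Matrix): Matrix = {
    val n = g.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      if (i == j) l(i)(i) = math.sqrt(g(i)(i) - s)
      else l(i)(j) = (g(i)(j) - s) / l(j)(j)
    }
    l
  }

  // Returns (Q, R) with A = Q R, where R = L' is upper triangular.
  def choleskyQr(a: Matrix): (Matrix, Matrix) = {
    val l = cholesky(gram(a))
    val n = l.length
    val r = Array.tabulate(n, n)((i, j) => l(j)(i))
    // Each row q of Q solves q R = (row of A) by substitution, left to right.
    val q = a.map { row =>
      val qRow = new Array[Double](n)
      for (j <- 0 until n) {
        val s = (0 until j).map(k => qRow(k) * r(k)(j)).sum
        qRow(j) = (row(j) - s) / r(j)(j)
      }
      qRow
    }
    (q, r)
  }
}
```

Calling CholeskyQrSketch.choleskyQr on a tall matrix with linearly
independent columns returns (Q, R) with A = QR. On a nearly rank-deficient
input the square root can see a negative pivot and produce NaN, which is the
kind of instability referred to above.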

Aside from that, the DrmLike input parameter is fully compatible with hdfs
sequence file input for the MR version.

In Samsara the code would be (I am writing from memory and hopefully spelling
everything right):

<imports, implicits omitted...>

val drmX = drmDfsRead(path=<hdfs-path>)
// whatever parameters you normally use here
val (drmU, drmV, s) = dssvd(drmX, k=..., q=..., ...)

This should do it.
Of course, you'd run into a significant infrastructure migration if you do
not currently have H2O or Spark available and running somewhere already.

-d

On Mon, Mar 21, 2016 at 12:57 PM, Mihai Dascalu <mihai.dascalu@cs.pub.ro>
wrote:

> We still have legacy code that uses the local Hadoop instance directly for
> a Stochastic SVD in a Java desktop application. But if the desire is to
> eliminate it, we’ve been inclined for a while to migrate everything to
> Spark.
>
> Sorry, I’m old school and use MR, plus I’m new to Spark :) Is there an
> easy way to migrate your Spark example into the Java source code so that we
> do not disrupt the overall flow?
>
>
> Have a great evening!
> Mihai
>
> > On 21 Mar 2016, at 19:31, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >
> > my 1 cent (since it is less than 2) is that MAHOUT_LOCAL is part of MR
> > legacy packaging. As long as MR is still here (and I would say it needs to
> > be still here, unless it falls into complete disrepair and totally out of
> > sync with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon
> > as MR goes, it goes too.
> >
> > maybe we just simply need a separate mahout script for non-legacy things,
> > or factor out legacy related shell things into another script (something
> > like mahout-mr.sh instead of mahout.sh)
> >
> > On Mon, Mar 21, 2016 at 8:45 AM, Suneel Marthi <smarthi@apache.org> wrote:
> >
> >> Some background on this issue:
> >>
> >> 1.  Now that we support Spark and H2O as back ends since 0.10.0, with Flink
> >> coming soon in 0.12.0, it's been bloating the size of our release artifacts
> >> when pushing releases to Apache mirrors. Hence we were looking at pruning
> >> some of the components that have not been used or have long been marked
> >> deprecated and are not being worked on.
> >>
> >> 2.  Since Mahout 0.7 release in June 2012, the project has diverged from
> >> the MiA book even for legacy MapReduce.  Not sure if that's indeed helping
> >> onboard new users.
> >>
> >> 3.  Seems like the consensus so far based on the user responses is to
> >> retain the MAHOUT_LOCAL option; thanks, all, for your responses.
> >>
> >>
> >> On Mon, Mar 21, 2016 at 11:38 AM, scott cote <scottccote@gmail.com> wrote:
> >>
> >>> one more comment - I understand that it only works for the legacy code.
> >>> Kill it when the legacy code is no longer deprecated, but gone ….
> >>>
> >>> Otherwise - you will shut out people who buy the older mahout books (such
> >>> as MIA) which are still good reads, even though the tech is dated.
> >>>
> >>> SCott
> >>>
> >>>> On Mar 21, 2016, at 2:24 AM, David Starina <david.starina@gmail.com> wrote:
> >>>>
> >>>> Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
> >>>> MapReduce-based code still makes sense if it is running well on Ignite.
> >>>>
> >>>> On Mon, Mar 21, 2016 at 8:20 AM, David Starina <david.starina@gmail.com> wrote:
> >>>>
> >>>>> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
> >>>>> performance improvement good enough to reconsider leaving those
> >>>>> algorithms in Mahout?
> >>>>>
> >>>>> On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <
> >>>>> andrew.musselman@gmail.com> wrote:
> >>>>>
> >>>>>> Yes I agree; will leave the question open a couple days.
> >>>>>>
> >>>>>> On Sunday, March 20, 2016, Pat Ferrel <pat@occamsmachete.com> wrote:
> >>>>>>
> >>>>>>> Maybe a better user question is: How many people are still using the
> >>>>>>> deprecated Hadoop code?
> >>>>>>>
> >>>>>>> If the number is small +1 for removal.
> >>>>>>>
> >>>>>>> On Mar 20, 2016, at 11:04 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> >>>>>>>
> >>>>>>> To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
> >>>>>>> MapReduce-based jobs which officially became deprecated in 0.10.0.
> >>>>>>>
> >>>>>>> On Sun, Mar 20, 2016 at 10:25 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Yes as I understand it.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sunday, March 20, 2016, Pat Ferrel <pat@occamsmachete.com> wrote:
> >>>>>>>>
> >>>>>>>>> Are we just talking about Hadoop MapReduce? I thought it was ignored
> >>>>>>>>> when using Spark.
> >>>>>>>>>
> >>>>>>>>> On Mar 20, 2016, at 8:20 AM, alok tanna <tannaalok@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> -1. MAHOUT_LOCAL is very useful for a quick POC.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Alok Tanna
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>>
> >>>>>>>>>> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu <mihai.dascalu@cs.pub.ro> wrote:
> >>>>>>>>>>
> >>>>>>>>>> -1 I still use it for fast deployment and it’s really helpful for
> >>>>>>>>>> small local processing
> >>>>>>>>>>
> >>>>>>>>>> Have a great weekend!
> >>>>>>>>>> Mihai
> >>>>>>>>>>
> >>>>>>>>>>> On 20 Mar 2016, at 06:13, Suneel Marthi <suneel.marthi@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> +1 to remove this
> >>>>>>>>>>>
> >>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>
> >>>>>>>>>>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> We're discussing removing the MAHOUT_LOCAL option in order to trim
> >>>>>>>>>>>> artifact sizes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you think keeping the option to use MAHOUT_LOCAL for testing
> >>>>>>>>>>>> with the single-node mode of Hadoop is important please let us
> >>>>>>>>>>>> know. It can be handy for trying things out but it would be nice
> >>>>>>>>>>>> to ditch the effort required to maintain it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more
> >>>>>>>>>>>> context.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
