mahout-user mailing list archives

From Ankit Goel <ankitgoel2...@gmail.com>
Subject Re: Mahout on the cloud
Date Sat, 25 Jul 2015 02:28:30 GMT
Very true. Java has a longer history, plenty of resources, and mature IDEs.
Thanks for this bit of information, and of course, as Pat mentions, even if
Mahout is a Scala project, we will still find Java APIs to work with. I'm just
reiterating this point for anyone else who might come across this thread
while looking for similar answers.

On Fri, Jul 24, 2015 at 10:05 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

> For the foreseeable future we are a Scala project but, as with Spark itself,
> Java APIs can often be created for Scala code given the right API design. If
> someone wants to contribute in this area, I think it would be seen
> favorably. Java knowledge is still far easier to find than Scala knowledge.
>
> On Jul 23, 2015, at 2:52 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> PPS. One of the "better" backends, if any comparison really is
> appropriate, is expected to be Apache Flink.
>
> On Thu, Jul 23, 2015 at 2:51 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
>
> > I guess I was a bit vague. By quasi-agnostic I mean that some code, the
> > smaller part of it, may unfortunately include dependencies on a specific
> > backend engine. It should be easily portable, though.
> >
> >
> > On Thu, Jul 23, 2015 at 2:50 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > wrote:
> >
> >> Mahout is moving to be backend-agnostic. It supports the same code on
> >> Spark or H2O.
> >>
> >> (Disclaimer: some code is quasi-agnostic, such as the Spark shell, and I
> >> think some co-occurrence drivers also like Spark more than anything else.
> >> I may be wrong.)
> >>
> >>
> >> On Thu, Jul 23, 2015 at 2:41 PM, Ankit Goel <ankitgoel2004@gmail.com>
> >> wrote:
> >>
> >>> Thanks a lot guys.
> >>> @Pat, is Mahout only going to support Scala in the near future? And will
> >>> all the ML libraries come only from Spark? I did read somewhere that
> >>> Mahout was heading in a direction where it's more of a framework that
> >>> supports multiple ML libraries. Am I right in my understanding?
> >>>
> >>> On Thu, Jul 23, 2015 at 10:03 PM, Pat Ferrel <pat@occamsmachete.com>
> >>> wrote:
> >>>
> >>>> Just to be clear, Mahout runs on AWS just fine. Dmitriy is talking about
> >>>> support and continuance of “MapReduce”, which means Hadoop MapReduce. We
> >>>> have been accepting only more modern engine code for more than a year,
> >>>> so most of the modern Mahout is in Scala and runs on Spark. The
> >>>> MapReduce paradigm is certainly supported there, but it runs on Spark,
> >>>> so any EMR instances you create should have Spark installed.
> >>>>
> >>>> Amazon now supports Spark on EMR:
> >>>> https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/
> >>>>
> >>>> Make sure you use the correct version of Spark with Mahout. Mahout
> >>>> 0.10.0 supports Spark 1.1.1 or less, Mahout 0.10.1 supports Spark 1.2.1
> >>>> or less, and the current master snapshot supports Spark 1.3 and runs on
> >>>> Spark 1.4.
> >>>>
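A minimal sketch of the pairings above, written in Python purely for illustration. The names `MAX_SPARK`, `parse_version`, and `spark_ok` are hypothetical and are not part of Mahout or Spark. It assumes "X or less" means an inclusive upper bound on the Spark version. The master snapshot is left out because the thread gives no fixed ceiling for it.

```python
# Hypothetical helper, not a Mahout API: encodes only the release pairings
# quoted in this thread (Mahout release -> highest supported Spark version).
MAX_SPARK = {
    "0.10.0": (1, 1, 1),
    "0.10.1": (1, 2, 1),
}


def parse_version(text):
    """Turn a dotted version string such as '1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def spark_ok(mahout_version, spark_version):
    """Return True if spark_version is at or below the quoted ceiling.

    Raises KeyError for Mahout versions the thread does not mention.
    """
    ceiling = MAX_SPARK[mahout_version]
    return parse_version(spark_version) <= ceiling
```

For example, `spark_ok("0.10.1", "1.2.1")` is true under this assumption, while `spark_ok("0.10.0", "1.2.0")` is false. A check like this could run before provisioning an EMR cluster, but the pairings should be confirmed against the release notes for the Mahout version actually in use.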
> >>>> On Jul 23, 2015, at 7:28 AM, Ankit Goel <ankitgoel2004@gmail.com>
> >>> wrote:
> >>>>
> >>>> Thanks for the heads-up, Dmitriy. That's exactly the kind of warning I
> >>>> was looking for. I don't have any experience implementing MR yet (I
> >>>> understand the algorithm perfectly), so this is a great heads-up. Any
> >>>> advice or warnings on Hadoop installations and versions?
> >>>>
> >>>> On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> MapReduce things are entering de facto end-of-life. It's not that we
> >>>>> specifically don't want to support them; de facto, nobody bothers to
> >>>>> support them, and the risks are especially high with new versions of
> >>>>> Hadoop and EMR.
> >>>>>
> >>>>> That said, we'd be grateful for any guide about doing this in EMR.
> >>>>>
> >>>>> On Wed, Jul 22, 2015 at 5:53 PM, Ankit Goel <ankitgoel2004@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>> After my runs on my laptop, I'm ready to port my work to the cloud.
> >>>>>> I'm planning to use Amazon. One thing I noticed when I started with
> >>>>>> Mahout was that a lot of things were left unsaid on the site/wiki and
> >>>>>> took me a lot of time to figure out. Pitfalls, if I may call them
> >>>>>> that. I will primarily be using clustering on the cloud, so the code
> >>>>>> to accept new data and run it is what I have for now.
> >>>>>>
> >>>>>> So before I port to the cloud, are there any things I should beware
> >>>>>> of or look out for? Is AWS fine with Mahout? Are there any
> >>>>>> configurations I should remember? Any advice on implementation to
> >>>>>> ease my transition and run Mahout 24 hours a day? Thanks
> >>>>>>
> >>>>>> --
> >>>>>> Regards,
> >>>>>> Ankit Goel
> >>>>>> http://about.me/ankitgoel


-- 
Regards,
Ankit Goel
http://about.me/ankitgoel
