mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Upgrade to Spark 1.1.0?
Date Mon, 20 Oct 2014 18:05:26 GMT
On Mon, Oct 20, 2014 at 10:49 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

> I agree; it’s just that different classes required by Mahout are missing
> from the environment depending on what happens to be in Spark. These deps
> should be supplied in the job.jar assemblies, right?
>

No. They should be physically available as jars somewhere, e.g. in the
compiled Mahout tree.

The "job.xml" assembly in the "spark" module is just a leftover from an
experiment I ran on job jars with Spark long ago. It's still hanging around
there but is not actually being built. Sorry for the confusion. DRM doesn't
use job jars. As far as I have established, Spark does not understand job
jars (it's purely a Hadoop notion -- and even there it has been unsupported
or deprecated for a long time now).

So we can, e.g., create a new assembly for Spark, such as an "optional
dependencies" set of jars, and put it somewhere in the compiled tree
(similar, I guess, to the "managed libraries" notion in SBT).

Then, if you need any of those, your driver code needs to do the following.
The mahoutSparkContext() method accepts an optional SparkConf parameter.
Additional jars can be added to the SparkConf before it is passed on to
mahoutSparkContext. If you don't supply a SparkConf, the method will create
a default one. If you do, it will merge all Mahout-specific settings and the
standard jars into the context information you supply.
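For example (a minimal sketch only -- the thread confirms that
mahoutSparkContext() takes an optional SparkConf, but the parameter names
and the jar path below are my assumptions, not from the thread):

import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Register the extra jars the workers will need; setJars() is standard
// Spark API. The path here is purely hypothetical.
val conf = new SparkConf()
  .setJars(Seq("/opt/mahout/optional-deps/some-optional-dep.jar"))

// mahoutSparkContext() merges Mahout-specific settings and the standard
// Mahout jars into the conf we pass in. Parameter names are assumed.
val sdc = mahoutSparkContext(
  masterUrl = "spark://localhost:7077",
  appName = "my-driver",
  sparkConf = conf)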

As far as I can see, by default the context includes only the math,
math-scala, spark and mrlegacy jars -- no third-party jars (line 212 in the
sparkbindings package). The test that checks this is in
SparkBindingsSuite.scala (yes, you are correct, the one you mentioned).
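In the same spirit as that suite (a hedged sketch, not the actual test
code -- the use of the spark.jars key and the artifact name fragments are
assumptions):

import org.apache.spark.SparkContext

// Sketch: list the jars a default Mahout context registered with Spark
// and check that only Mahout's own artifacts are present.
def onlyMahoutJars(sc: SparkContext): Boolean = {
  val registered = sc.getConf.get("spark.jars", "").split(",").filter(_.nonEmpty)
  val expected = Seq("mahout-math", "mahout-math-scala",
                     "mahout-spark", "mahout-mrlegacy")
  registered.forall(jar => expected.exists(e => jar.contains(e)))
}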

>
> Trying out the
>   test("context jars") {
>   }
>
> findMahoutContextJars(closeables) gets the .jars and seems to explicitly
> filter out the job.jars. The job.jars include the needed dependencies, so
> for a clustered environment shouldn’t these be the only ones used?
>
>
> On Oct 20, 2014, at 10:39 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> Either way, I don't believe there's anything specific to 1.0.1, 1.0.2, or
> 1.1.0 that is causing/not causing the classpath errors. Jars are simply
> picked by an explicitly hardcoded artifact "opt-in" policy, not the other
> way around.
>
> It is not enough just to modify the pom in order for something to appear
> in the task classpath.
>
> On Mon, Oct 20, 2014 at 9:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
>
> > Note that classpaths for the "cluster" environment are tested trivially
> > by starting 1-2 workers and a standalone Spark manager process locally.
> > No need to build anything "real". Workers would not know anything about
> > Mahout, so unless the proper jars are exposed in the context, they would
> > have no way of "faking" access to the classes.
> >
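(A minimal sketch of that setup from the driver side -- the standalone
master URL/port and the mahoutSparkContext parameter names are assumptions,
not from this thread:)

import org.apache.mahout.sparkbindings._

// Run the driver against a locally started standalone master instead of
// local[*]. The workers know nothing about Mahout, so any Mahout class
// they resolve must have been shipped via the context's jars -- which is
// exactly what this smoke-tests.
val sdc = mahoutSparkContext(
  masterUrl = "spark://localhost:7077", // assumed default standalone port
  appName = "classpath-smoke-test")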
> > On Mon, Oct 20, 2014 at 9:28 AM, Pat Ferrel <pat@occamsmachete.com>
> wrote:
> >
> >> Yes, asap.
> >>
> >> To test this right it has to run on a cluster, so I’m upgrading. When
> >> ready it will just be a “mvn clean install” if you already have Spark
> >> 1.1.0 running.
> >>
> >> I would have only expected errors on the CLI drivers, so if anyone else
> >> sees runtime errors please let us know. Some errors are very hard to
> >> unit test since the environment is different for local (unit test) and
> >> cluster execution.
> >>
> >>
> >> On Oct 20, 2014, at 9:14 AM, Mahesh Balija <balijamahesh.mca@gmail.com>
> >> wrote:
> >>
> >> Hi Pat,
> >>
> >> Can you please give detailed steps to build Mahout against Spark 1.1.0?
> >> I built against 1.1.0 but still had class-not-found errors; that’s why
> >> I reverted back to Spark 1.0.2. Even though the first few steps are
> >> successful, I am still facing issues running the Mahout spark-shell
> >> sample commands -- (drmData) throws some errors even on 1.0.2.
> >>
> >> Best,
> >> Mahesh.B.
> >>
> >> On Mon, Oct 20, 2014 at 1:46 AM, peng <pc175@uowmail.edu.au> wrote:
> >>
> >>> From my experience 1.1.0 is quite stable, plus it has some performance
> >>> improvements that are totally worth the effort.
> >>>
> >>>
> >>> On 10/19/2014 06:30 PM, Ted Dunning wrote:
> >>>
> >>>> On Sun, Oct 19, 2014 at 1:49 PM, Pat Ferrel <pat@occamsmachete.com>
> >>>> wrote:
> >>>>
> >>>>> Getting off the dubious Spark 1.0.1 version is turning out to be a
> >>>>> bit of work. Does anyone object to upgrading our Spark dependency?
> >>>>> I’m not sure if Mahout built for Spark 1.1.0 will run on 1.0.1, so
> >>>>> it may mean upgrading your Spark cluster.
> >>>>>
> >>>>
> >>>> It is going to have to happen sooner or later.
> >>>>
> >>>> Sooner may actually be less total pain.
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
