spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1
Date Wed, 06 Aug 2014 00:59:48 GMT
Hi Xiangrui,

I used your idea and kept a cherry-picked version of ALS.scala in my
application, calling it ALSQp.scala...this is an OK workaround for now,
until the changes land in master...

As for the bug with userClassPathFirst, it looks like Koert already hit
this issue in the following JIRA:

https://issues.apache.org/jira/browse/SPARK-1863

By the way, the userClassPathFirst feature is very useful, since the
deployed version of spark on a production cluster will usually be the
latest stable release (core at 1.0.1 in my case), and people will want to
deploy SNAPSHOT versions of the libraries that build on top of spark core
(mllib, streaming, etc.)...

Another way would be a build option that deploys only the core and not the
libraries built on top of it...

Do we have an option like that in the make-distribution script?
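(Something like Maven's reactor flags might already approximate this --
just a sketch, assuming the standard Spark module layout, and I haven't
checked whether make-distribution exposes anything equivalent:)

```shell
# Sketch (untested): build only spark-core plus the modules it depends on,
# leaving mllib/streaming out of the produced artifacts.
# -pl/--projects selects modules; -am/--also-make adds their upstream deps.
mvn -pl core -am -DskipTests package
```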

Thanks.
Deb
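P.S. One quick sanity check before blaming the executor classpath (a
sketch; the jar path is the one from the submit command quoted below):
javap can confirm the new method really made it into the assembly jar.

```shell
# Hypothetical diagnostic: list ALS's private and public members straight
# out of the assembly jar and look for the new setLambdaL1 method.
javap -p -classpath target/ml-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  org.apache.spark.mllib.recommendation.ALS | grep setLambdaL1
```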


On Tue, Aug 5, 2014 at 10:37 AM, Xiangrui Meng <mengxr@gmail.com> wrote:

> If you cannot change the Spark jar deployed on the cluster, an easy
> solution would be renaming ALS in your jar. If userClassPathFirst
> doesn't work, could you create a JIRA and attach the log? Thanks!
> -Xiangrui
>
> On Tue, Aug 5, 2014 at 9:10 AM, Debasish Das <debasish.das83@gmail.com>
> wrote:
> > I created the assembly file, but it still picks up mllib from the
> > cluster:
> >
> > jar tf ./target/ml-0.0.1-SNAPSHOT-jar-with-dependencies.jar | grep
> > QuadraticMinimizer
> >
> > org/apache/spark/mllib/optimization/QuadraticMinimizer$$anon$1.class
> >
> > /Users/v606014/dist-1.0.1/bin/spark-submit --master
> > spark://TUSCA09LMLVT00C.local:7077 --class ALSDriver
> > ./target/ml-0.0.1-SNAPSHOT-jar-with-dependencies.jar inputPath outputPath
> >
> > Exception in thread "main" java.lang.NoSuchMethodError:
> > org.apache.spark.mllib.recommendation.ALS.setLambdaL1(D)Lorg/apache/spark/mllib/recommendation/ALS;
> >
> > Now if I force it to use the jar I supplied via
> > spark.files.userClassPathFirst, it fails with some serialization
> > issues...
> >
> > A simple solution is to cherry-pick the files I need from the spark
> > branch into the application branch, but I am not sure that's the right
> > thing to do...
> >
> > Given the way userClassPathFirst is behaving, there may be bugs in it...
> >
> > Any suggestions would be appreciated...
> >
> > Thanks.
> > Deb
> >
> >
> > On Sat, Aug 2, 2014 at 11:12 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >>
> >> Yes, that should work. spark-mllib-1.1.0 should be compatible with
> >> spark-core-1.0.1.
> >>
> >> On Sat, Aug 2, 2014 at 10:54 AM, Debasish Das <debasish.das83@gmail.com>
> >> wrote:
> >> > Let me try it...
> >> >
> >> > Will this be fixed if I generate an assembly file with the
> >> > mllib-1.1.0-SNAPSHOT jar and the other dependencies alongside the
> >> > rest of the application code?
> >> >
> >> >
> >> >
> >> > On Sat, Aug 2, 2014 at 10:46 AM, Xiangrui Meng <mengxr@gmail.com>
> >> > wrote:
> >> >>
> >> >> You can try enabling "spark.files.userClassPathFirst". But I'm not
> >> >> sure whether it could solve your problem. -Xiangrui
> >> >>
> >> >> On Sat, Aug 2, 2014 at 10:13 AM, Debasish Das
> >> >> <debasish.das83@gmail.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I have deployed spark stable 1.0.1 on the cluster, but I have new
> >> >> > code that I added in mllib-1.1.0-SNAPSHOT.
> >> >> >
> >> >> > I am trying to access the new code using spark-submit as follows:
> >> >> >
> >> >> > spark-job --class com.verizon.bda.mllib.recommendation.ALSDriver
> >> >> > --executor-memory 16g --total-executor-cores 16 --jars
> >> >> > spark-mllib_2.10-1.1.0-SNAPSHOT.jar,scopt_2.10-3.2.0.jar
> >> >> > sag-core-0.0.1-SNAPSHOT.jar --rank 25 --numIterations 10 --lambda 1.0
> >> >> > --qpProblem 2 inputPath outputPath
> >> >> >
> >> >> > I can see the jars are getting added to httpServer as expected:
> >> >> >
> >> >> > 14/08/02 12:50:04 INFO SparkContext: Added JAR
> >> >> > file:/vzhome/v606014/spark-glm/spark-mllib_2.10-1.1.0-SNAPSHOT.jar at
> >> >> > http://10.145.84.20:37798/jars/spark-mllib_2.10-1.1.0-SNAPSHOT.jar
> >> >> > with timestamp 1406998204236
> >> >> >
> >> >> > 14/08/02 12:50:04 INFO SparkContext: Added JAR
> >> >> > file:/vzhome/v606014/spark-glm/scopt_2.10-3.2.0.jar at
> >> >> > http://10.145.84.20:37798/jars/scopt_2.10-3.2.0.jar with timestamp
> >> >> > 1406998204237
> >> >> >
> >> >> > 14/08/02 12:50:04 INFO SparkContext: Added JAR
> >> >> > file:/vzhome/v606014/spark-glm/sag-core-0.0.1-SNAPSHOT.jar at
> >> >> > http://10.145.84.20:37798/jars/sag-core-0.0.1-SNAPSHOT.jar with
> >> >> > timestamp
> >> >> > 1406998204238
> >> >> >
> >> >> > But the job still can't access the code from the
> >> >> > mllib-1.1.0-SNAPSHOT jar...I think it's picking up the mllib from
> >> >> > the cluster, which is at 1.0.1...
> >> >> >
> >> >> > Please help. I will ask for a PR tomorrow, but internally we want
> >> >> > to generate results from the new code.
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > Deb
> >> >
> >> >
> >
> >
>
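For anyone replaying this thread, the userClassPathFirst experiment
described above amounts to roughly the following (a sketch only: the
master URL, class name, and jar path are taken from the messages above,
and if this version of spark-submit predates the --conf flag, the
property can instead be set in conf/spark-defaults.conf):

```shell
# Prefer classes from the user-supplied jar over the cluster's Spark jars
# on the executors. Note SPARK-1863 for known issues with this setting.
spark-submit \
  --master spark://TUSCA09LMLVT00C.local:7077 \
  --class ALSDriver \
  --conf spark.files.userClassPathFirst=true \
  ./target/ml-0.0.1-SNAPSHOT-jar-with-dependencies.jar inputPath outputPath
```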
