mahout-user mailing list archives

From Michael Müller <Michael.Muel...@condat.de>
Subject AW: 0.13.0-RC not fully compatible with Spark 1.6.3?
Date Thu, 23 Mar 2017 08:52:11 GMT
RC3 works for me now. :)


-----Original Message-----
From: Pat Ferrel [mailto:pat@occamsmachete.com]
Sent: Monday, March 6, 2017 17:32
To: user@mahout.apache.org
Cc: Michael Müller <Michael.Mueller@condat.de>
Subject: Re: 0.13.0-RC not fully compatible with Spark 1.6.3?

Thanks for finding this.

It appears the jar passed to Spark with the classes to be serialized was not updated when
some code was refactored. We have a fix under test that will be in the next RC. If you
could test the next RC (maybe ready tomorrow) we’d be very grateful.
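If it helps when verifying the next RC, the missing class can be checked for directly in the distribution's jars. A minimal diagnostic sketch (the distribution path is taken from Michael's environment settings; the helper name and approach are my own, not part of Mahout's tooling). Since zip entry names are stored uncompressed in jar files, a plain grep over the jar bytes is enough to spot the class file:

```shell
# find_class: report which jars under a directory contain a given class.
# Diagnostic sketch only; zip entry names are stored uncompressed, so a
# plain grep over the raw jar bytes is enough to spot the class file.
find_class() {
  local dir="$1" cls="$2"
  find "$dir" -name '*.jar' 2>/dev/null | while read -r j; do
    grep -q "$cls" "$j" 2>/dev/null && echo "$j"
  done
}

# Check the RC distribution for the registrator the executors fail to load.
find_class /home/aml/mahout/apache-mahout-distribution-0.13.0 \
  'org/apache/mahout/sparkbindings/io/MahoutKryoRegistrator'
```

If this prints nothing for the 0.13.0 RC but prints a jar for 0.12.2, that would confirm the class was dropped from the shipped artifact.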


On Mar 3, 2017, at 12:58 PM, Michael Müller <Michael.Mueller@condat.de> wrote:

> So you are downloading the binary and running the Mahout spark-itemsimilarity driver
from that binary?

yes


> You say “using the same Spark cluster”. How is this set up, e.g. an env var like MASTER=?
> Can you supply how you point to the cluster and your CLI for the job?


These are my environment settings for Spark and Mahout:

export MAHOUT_HOME=/home/aml/mahout/apache-mahout-distribution-0.13.0
#export MAHOUT_LOCAL=true
export SPARK_HOME=/home/aml/spark/spark-1.6.3-bin-hadoop2.6
export MASTER=spark://ubuntu:7077
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/jre

I'm starting the job like this:

/home/aml/mahout/apache-mahout-distribution-0.13.0/bin/mahout spark-itemsimilarity --master
spark://ubuntu:7077 --input ~/data/rating_200k.csv --output ~/data/rating_200k_output --itemIDColumn
1 --rowIDColumn 0 --sparkExecutorMem 6g



And when I change MAHOUT_HOME to point to my Mahout 0.12.2 installation (-> /home/aml/mahout/apache-mahout-distribution-0.12.2)
and then start the job like that, it succeeds:

/home/aml/mahout/apache-mahout-distribution-0.12.2/bin/mahout spark-itemsimilarity --master
spark://ubuntu:7077 --input ~/data/rating_200k.csv --output ~/data/rating_200k_output --itemIDColumn
1 --rowIDColumn 0 --sparkExecutorMem 6g




-----Original Message-----
From: Pat Ferrel [mailto:pat@occamsmachete.com]
Sent: Friday, March 3, 2017 20:49
To: Michael Müller
Cc: user@mahout.apache.org
Subject: Re: 0.13.0-RC not fully compatible with Spark 1.6.3?

Thanks, I’ll see if I can reproduce. 

So you are downloading the binary and running the Mahout spark-itemsimilarity driver from
that binary? You say “using the same Spark cluster”. How is this set up, e.g. an env var
like MASTER=? Can you supply how you point to the cluster and your CLI for the job?



On Mar 3, 2017, at 1:26 AM, Michael Müller <Michael.Mueller@condat.de> wrote:

Hi all,

Is Mahout 0.13.0 supposed to work with Spark 1.6.3? I would think so, as the master-pom.xml
explicitly references Spark 1.6.3.
But when I run a spark-itemsimilarity command (on the 0.13.0-RC) against my Spark 1.6.3-standalone
cluster, the command fails with:

17/03/03 10:08:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, reco-master): java.io.IOException:
org.apache.spark.SparkException: Failed to register classes with Kryo
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
	at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
...
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
	at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)

When I run the exact same command on the 0.12.2 release distribution against the same Spark
cluster, the command completes successfully.

My environment is:
* Ubuntu 14.04
* Oracle-JDK 1.8.0_121
* Spark standalone cluster using this distribution: http://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
* Mahout 0.13.0-RC: https://repository.apache.org/content/repositories/orgapachemahout-1034/org/apache/mahout/apache-mahout-distribution/0.13.0/apache-mahout-distribution-0.13.0.tar.gz
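For background on the failure mode: the executors resolve the class named in spark.kryo.registrator from the jars shipped with the job, so a jar that no longer contains that class produces exactly this ClassNotFoundException. A hedged sketch of how the registrator would normally be made available with plain spark-submit on Spark 1.6 (the jar and application paths are placeholders, not the actual RC artifacts; the mahout driver script normally wires this up itself):

```shell
# Hypothetical spark-submit invocation; jar paths are placeholders.
"$SPARK_HOME"/bin/spark-submit \
  --master spark://ubuntu:7077 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator \
  --jars /path/to/mahout-spark-bindings.jar \
  --class my.App /path/to/app.jar
```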


TIA

--
Michael Müller
Condat AG, Berlin



