spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6071) ALS doc example fails randomly in PythonAccumulatorParam
Date Wed, 04 Mar 2015 23:33:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347782#comment-14347782 ]

Joseph K. Bradley commented on SPARK-6071:
------------------------------------------

This is probably just a bug in Python accumulators and has nothing to do with ALS.

> ALS doc example fails randomly in PythonAccumulatorParam
> --------------------------------------------------------
>
>                 Key: SPARK-6071
>                 URL: https://issues.apache.org/jira/browse/SPARK-6071
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, PySpark
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> When running the ALS example in [http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples] on branch-1.3, I got a random failure which I have been unable to reproduce.
> Specifically, I was running on the branch from this PR [https://github.com/apache/spark/pull/4811] at this commit: [https://github.com/mengxr/spark/commit/06140a48ec5bd55b329e9b7cf658bd3e43be4fe2]
> However, that PR should not have affected the bug, so I suspect it is within branch-1.3 itself.
> After a clean build, I ran:
> {code}
> from pyspark.mllib.recommendation import ALS, Rating, MatrixFactorizationModel
> # Load and parse the data
> data = sc.textFile("data/mllib/als/test.data")
> ratings = data.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
> # Build the recommendation model using Alternating Least Squares
> rank = 10
> numIterations = 20
> model = ALS.train(ratings, rank, numIterations)
> {code}
> And I got this error:
> {code}
> >>> model = ALS.train(ratings, rank, numIterations)
> 15/02/27 14:41:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/02/27 14:41:24 WARN LoadSnappy: Snappy native library not loaded
> 15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
> 15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
> 15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
> 15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 2)
> java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
> 	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
> 	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> 	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
> 	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 4)
> java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
> 	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
> 	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
> 	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
> 	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> 	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> 	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
> 	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> However, re-running the same train() call immediately worked, and I have not yet been able to reproduce the bug.
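
The exception message in the trace says that PythonAccumulatorParam.addInPlace received a scala.None where it expected a java.util.List. As a rough illustration of that class of failure only, here is a pure-Python analogy. It is not Spark's code and not a proposed fix, and the root cause was not established in this report. The names add_in_place, add_in_place_unguarded, and Update are hypothetical and exist only for this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for one accumulator update: (accumulator id, delta).
Update = Tuple[int, int]


def add_in_place_unguarded(current: List[Update], incoming) -> List[Update]:
    """Merge a batch of updates, assuming the batch is always a list."""
    # If incoming is None, list.extend raises TypeError. This is the Python
    # counterpart of casting a None-like value to a list type on the JVM.
    current.extend(incoming)
    return current


def add_in_place(current: List[Update],
                 incoming: Optional[List[Update]]) -> List[Update]:
    """Merge a batch of updates, treating a missing batch as empty."""
    if incoming is None:
        # Assumption for this sketch: a missing batch carries no updates.
        return current
    current.extend(incoming)
    return current
```

The unguarded version fails only when a batch happens to be missing, which would be consistent with a failure that appears on one run and not on the next.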




