spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-17424) Dataset job fails from unsound substitution in ScalaReflect
Date Mon, 12 Sep 2016 17:31:22 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apache Spark reassigned SPARK-17424:
------------------------------------

    Assignee: Apache Spark

> Dataset job fails from unsound substitution in ScalaReflect
> -----------------------------------------------------------
>
>                 Key: SPARK-17424
>                 URL: https://issues.apache.org/jira/browse/SPARK-17424
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: Ryan Blue
>            Assignee: Apache Spark
>
> I have a job that uses datasets in 1.6.1 and is failing with this error:
> {code}
> 16/09/02 17:02:56 ERROR Driver ApplicationMaster: User class threw exception: java.lang.AssertionError:
assertion failed: Unsound substitution from List(type T, type U) to List()
> java.lang.AssertionError: assertion failed: Unsound substitution from List(type T, type
U) to List()
>     at scala.reflect.internal.Types$SubstMap.<init>(Types.scala:4644)
>     at scala.reflect.internal.Types$SubstTypeMap.<init>(Types.scala:4761)
>     at scala.reflect.internal.Types$Type.subst(Types.scala:796)
>     at scala.reflect.internal.Types$TypeApiImpl.substituteTypes(Types.scala:321)
>     at scala.reflect.internal.Types$TypeApiImpl.substituteTypes(Types.scala:298)
>     at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$getConstructorParameters$1.apply(ScalaReflection.scala:769)
>     at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$getConstructorParameters$1.apply(ScalaReflection.scala:768)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.sql.catalyst.ScalaReflection$class.getConstructorParameters(ScalaReflection.scala:768)
>     at org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameters(ScalaReflection.scala:30)
>     at org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameters(ScalaReflection.scala:610)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$argNames$lzycompute(TreeNode.scala:418)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$argNames(TreeNode.scala:418)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argsMap$1.apply(TreeNode.scala:415)
>     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$argsMap$1.apply(TreeNode.scala:414)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.toMap(TraversableOnce.scala:279)
>     at scala.collection.AbstractIterator.toMap(Iterator.scala:1157)
>     at org.apache.spark.sql.catalyst.trees.TreeNode.argsMap(TreeNode.scala:416)
>     at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:46)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SparkPlanInfo$$anonfun$2.apply(SparkPlanInfo.scala:44)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.immutable.List.foreach(List.scala:318)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>     at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:44)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:51)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
>     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
>     at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:193)
>     at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:166)
>     at com.netflix.jobs.main(Processing.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:557)
> {code}
> I think this is the same bug as SPARK-13067. It looks like that issue wasn't fixed, there
was just a work-around added to get the test passing.
> The problem is that the reflection code is trying to substitute concrete types for type
parameters of {{MapPartitions[T, U]}}, but the concrete types aren't known. So Spark ends
up calling {{substituteTypes}} to substitute {{T}} and {{U}} with {{Nil}} (which gets shown
as {{List()}}).
> An easy fix that works for me is this:
> {code:lang=scala}
>     // if there are type variables to fill in, do the substitution (SomeClass[T] ->
SomeClass[Int])
>     if (actualTypeArgs.nonEmpty) {
>       params.map { p =>
>         p.name.toString -> p.typeSignature.substituteTypes(formalTypeArgs, actualTypeArgs)
>       }
>     } else {
>       params.map { p =>
>         p.name.toString -> p.typeSignature
>       }
>     }
> {code}
> Does this sound like a reasonable solution?
> Edit: I think this affects 2.0.0 because the call to [{{substituteTypes}} is unchanged|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L788-L790]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message