spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5462) Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()
Date Fri, 30 Jan 2015 00:30:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297956#comment-14297956 ]

Apache Spark commented on SPARK-5462:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/4282

> Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5462
>                 URL: https://issues.apache.org/jira/browse/SPARK-5462
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Josh Rosen
>            Priority: Blocker
>
> When trying to access fields on a Python DataFrame created via inferSchema, I ran into a confusing Catalyst Py4J error.  Here's a reproduction:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
> sc = SparkContext("local", "test")
> sqlContext = SQLContext(sc)
> # Load a text file and convert each line to a Row.
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
> # Infer the schema, and register the SchemaRDD as a table.
> schemaPeople = sqlContext.inferSchema(people)
> schemaPeople.registerTempTable("people")
> # SQL can be run over SchemaRDDs that have been registered as a table.
> teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
> print teenagers.name
> {code}
> This fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/spark/sqltest.py", line 19, in <module>
>     print teenagers.name
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/sql.py", line 2154, in __getattr__
>     return Column(self._jdf.apply(name))
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o66.apply.
> : org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to qualifiers on unresolved object, tree: 'name
> 	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:50)
> 	at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:46)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:143)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:140)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:140)
> 	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:126)
> 	at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:122)
> 	at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:237)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> 	at py4j.Gateway.invoke(Gateway.java:259)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:207)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> This is distinct from the helpful error message that I get when trying to access a non-existent column.  This error didn't occur when I tried the same thing with a DataFrame created via jsonRDD.
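
The stack trace shows LogicalPlan.resolve reaching UnresolvedAttribute.qualifiers for the tree 'name, which suggests that the plan's output still held an unresolved attribute when the column lookup ran. The following is only a toy model of that failure mode in plain Python, not Spark's implementation: the class names echo the trace, but the bodies, the resolve() helper, and the two sample output lists are assumptions made for illustration.

```python
# Toy model only: names echo the stack trace; every body here is an
# illustrative assumption, not Spark's actual Catalyst code.


class UnresolvedException(Exception):
    """Raised when a property that needs analysis is read too early."""


class AttributeReference:
    """A resolved column: it knows its name and its table qualifiers."""

    def __init__(self, name, qualifiers):
        self.name = name
        self.qualifiers = list(qualifiers)


class UnresolvedAttribute:
    """A column name that has not yet been bound to a concrete attribute."""

    def __init__(self, name):
        self.name = name

    @property
    def qualifiers(self):
        raise UnresolvedException(
            "Invalid call to qualifiers on unresolved object, tree: '%s"
            % self.name
        )


def resolve(name, output):
    """Return the attributes in `output` that match `name`.

    `name` may be bare ("name") or qualified ("people.name").
    """
    parts = name.split(".")
    matches = []
    for attr in output:
        # Reading qualifiers is the step that fails for an unresolved attribute.
        quals = attr.qualifiers
        if len(parts) == 1 and attr.name == parts[0]:
            matches.append(attr)
        elif len(parts) == 2 and parts[0] in quals and attr.name == parts[1]:
            matches.append(attr)
    return matches


# Hypothetical outputs: one from an analyzed plan, one from an unanalyzed plan.
analyzed_output = [AttributeReference("name", ["people"])]
unanalyzed_output = [UnresolvedAttribute("name")]

print(resolve("name", analyzed_output)[0].name)

try:
    resolve("name", unanalyzed_output)
except UnresolvedException as exc:
    print(exc)
```

In this sketch the lookup succeeds against the resolved output and raises against the unresolved one, even though the column exists in both. That matches the reported symptom, where an existing column produces an internal error instead of the friendlier message given for a missing column.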



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

