spark-user mailing list archives

From Alex Nastetsky <alex.nastet...@verve.com>
Subject "where" clause able to access fields not in its schema
Date Wed, 13 Feb 2019 22:32:31 GMT
I don't know if this is a bug or a feature, but it's counter-intuitive when reading code.

The "b" DataFrame does not have the field "bar" in its schema, yet it can still filter on
that field.

scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a: org.apache.spark.sql.DataFrame = [foo: int, bar: int]

scala> a.show
+---+---+
|foo|bar|
+---+---+
|  1| 10|
|  2| 20|
+---+---+

scala> val b = a.select($"foo")
b: org.apache.spark.sql.DataFrame = [foo: int]

scala> b.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(foo,IntegerType,false))

scala> b.select($"bar").show
org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];;
[...snip...]

scala> b.where($"bar" === 20).show
+---+
|foo|
+---+
|  2|
+---+
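For what it's worth, inspecting the analyzed plan may show why this works: the Analyzer
seems to resolve the missing attribute against b's child plan (the original "a" with both
columns) and then adds a Project to drop it again, so the filter sees "bar" even though
b's schema does not. I haven't confirmed this against the Catalyst source, but it can be
checked in the same spark-shell session:

scala> b.where($"bar" === 20).explain(true)

If this guess is right, the analyzed plan should contain a Project [foo] above the
Filter, sitting on top of the original relation that still carries "bar".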
