spark-user mailing list archives

From Nicolas Fouché <nicolas.fou...@gmail.com>
Subject "Ambiguous references" to a field set in a partitioned table AND the data
Date Tue, 31 Mar 2015 14:06:47 GMT
  Hi,


I save Parquet files in a partitioned table, i.e. under /path/to/table/myfield=a/ .
But I also kept the field "myfield" in the Parquet data itself. Thus, when I query the field,
I get this error:


df.select("myfield").show(10)
Exception in thread "main" org.apache.spark.sql.AnalysisException: Ambiguous references to myfield
(myfield#2,List()),(myfield#47,List());


Looking at the code, I could not find a way to explicitly specify which column I want. DataFrame#columns
only returns column names as strings, so the two "myfield" columns cannot be told apart there. Even
when loading the data with a schema (StructType), I'm not sure I can do it.


Do I have to make sure that my partition field does not exist in the data before saving?
Or is there a way to declare which column in the schema I want to query?
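
To make the first option concrete, here is a minimal sketch in plain Python (deliberately not Spark API) of stripping the partition field from each record before it is written. The helper names `partition_keys` and `drop_partition_fields` and the extra field "other" are made up for illustration, and the sketch assumes the partition is encoded as key=value directory names as in the path above.

```python
from pathlib import PurePosixPath


def partition_keys(path):
    """Return the key=value pairs encoded in a partition directory path."""
    keys = {}
    for part in PurePosixPath(path).parts:
        if "=" in part:
            # Split only on the first "=", so values may contain "=" too.
            key, _, value = part.partition("=")
            keys[key] = value
    return keys


def drop_partition_fields(record, path):
    """Return a copy of record without the fields already encoded in path."""
    keys = partition_keys(path)
    return {name: value for name, value in record.items() if name not in keys}


# Hypothetical record: "myfield" duplicates the partition key, "other" does not.
record = {"myfield": "a", "other": 1}
cleaned = drop_partition_fields(record, "/path/to/table/myfield=a/")
print(cleaned)
```

With the duplicate removed, "myfield" would exist only once (from the directory name) when the table is read back, which is the situation I am asking whether I have to enforce myself.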


Thanks.




