spark-user mailing list archives

From Nicolas Fouché
Subject "Ambiguous references" to a field set in a partitioned table AND the data
Date Tue, 31 Mar 2015 14:06:47 GMT

I save Parquet files in a partitioned table, so they end up in /path/to/table/myfield=a/.
But I also kept the field "myfield" in the Parquet data. Thus, when I query the field, I get
this error:"myfield").show(10)
"Exception in thread "main" org.apache.spark.sql.AnalysisException: Ambiguous references to myfield 

Looking at the code, I could not find a way to explicitly specify which column I want. DataFrame#columns
returns strings. Even by loading the data with a schema (StructType), I'm not sure I can do

Do I have to make sure that my partition field does not exist in the data before saving?
Or is there a way to declare which column in the schema I want to query?
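
To make the situation concrete, here is a minimal sketch in plain Python. It is not Spark code and does not claim to show how Spark implements partition discovery; it only assumes that a directory segment of the form name=value contributes a column called name, which then collides with a field of the same name stored in the data. The helper names and the extra field "other" are hypothetical, chosen for illustration. The last helper sketches the first option asked about above: removing the partition field from each record before saving.

```python
def partition_columns(path):
    """Return {column: value} for every 'name=value' directory segment in path."""
    cols = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            cols[name] = value
    return cols


def ambiguous_fields(path, data_fields):
    """Names present both as a partition directory and as a field in the data."""
    return sorted(set(partition_columns(path)) & set(data_fields))


def drop_partition_fields(record, path):
    """Copy of record without the fields already encoded in the directory path."""
    parts = partition_columns(path)
    return {k: v for k, v in record.items() if k not in parts}


path = "/path/to/table/myfield=a/"
record = {"myfield": "a", "other": 1}  # "other" is a made-up field

print(ambiguous_fields(path, record.keys()))  # ['myfield']
print(drop_partition_fields(record, path))    # {'other': 1}
```

With the field dropped from the records, the name would come only from the directory layout, so under the assumption above there would be a single candidate column left to resolve.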

