spark-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject Re: DataFrame select non-existing column
Date Sat, 19 Nov 2016 14:56:58 GMT
Thanks. Here's my code example [1] and the printSchema() output [2].

This code still fails with the following message: "No such struct
field mobile in auction, geo"

Looking at the schema, it seems that pass.mobile did not get nested
under pass, which is what my use case needs. Are nested columns not
supported by withColumn()?

[1]

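// Assumes: import static org.apache.spark.sql.functions.lit;
DataFrame df = ctx.read().parquet(localPath).withColumn("pass.mobile", lit(0L));
df.printSchema();
df.select("pass.mobile"); // fails: "No such struct field mobile in auction, geo"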

[2]

root
 |-- pass: struct (nullable = true)
 |    |-- auction: struct (nullable = true)
 |    |    |-- id: integer (nullable = true)
 |    |-- geo: struct (nullable = true)
 |    |    |-- postalCode: string (nullable = true)
 |-- pass.mobile: long (nullable = false)
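
The schema in [2] suggests that withColumn() took "pass.mobile" as one literal top-level name, while select("pass.mobile") reads the dot as struct field access. Below is a minimal plain-Java sketch of that distinction. It is a toy model only: nested maps stand in for the schema, nothing in it calls Spark, and NestedColumnSketch and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: nested maps stand in for a Spark schema. Nothing here
// calls Spark; the class and method names are made up for illustration.
final class NestedColumnSketch {

    // Models what the printed schema shows for withColumn("pass.mobile", ...):
    // the whole string, dot included, becomes a single top-level name.
    public static Map<String, Object> addTopLevel(Map<String, Object> schema, String name, Object type) {
        Map<String, Object> copy = new LinkedHashMap<>(schema);
        copy.put(name, type);
        return copy;
    }

    // Models dotted lookup as in select("pass.mobile"): split on dots and
    // descend struct by struct, failing when a field is missing.
    public static Object resolve(Map<String, Object> schema, String path) {
        Object current = schema;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map) || !((Map<?, ?>) current).containsKey(part)) {
                throw new IllegalArgumentException("No such struct field " + part);
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    // Adds a field inside an existing struct by rebuilding that struct
    // and putting the rebuilt struct back under the same parent name.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> addNested(Map<String, Object> schema, String parent, String field, Object type) {
        Map<String, Object> rebuilt = new LinkedHashMap<>((Map<String, Object>) schema.get(parent));
        rebuilt.put(field, type);
        Map<String, Object> copy = new LinkedHashMap<>(schema);
        copy.put(parent, rebuilt);
        return copy;
    }

    // The schema from [2], without the extra pass.mobile column.
    public static Map<String, Object> sampleSchema() {
        Map<String, Object> auction = new LinkedHashMap<>();
        auction.put("id", "integer");
        Map<String, Object> geo = new LinkedHashMap<>();
        geo.put("postalCode", "string");
        Map<String, Object> pass = new LinkedHashMap<>();
        pass.put("auction", auction);
        pass.put("geo", geo);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pass", pass);
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = addTopLevel(sampleSchema(), "pass.mobile", "long");
        try {
            resolve(flat, "pass.mobile");
        } catch (IllegalArgumentException e) {
            System.out.println("flat: " + e.getMessage());
        }
        Map<String, Object> nested = addNested(sampleSchema(), "pass", "mobile", "long");
        System.out.println("nested: " + resolve(nested, "pass.mobile"));
    }
}
```

In Spark itself, the analogous move would presumably be to rebuild the pass struct rather than add a dotted name, along the lines of df.withColumn("pass", struct(col("pass.auction"), col("pass.geo"), lit(0L).as("mobile"))). That line is an untested assumption about the poster's Spark version, not something confirmed in this thread.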

On Sat, Nov 19, 2016 at 7:45 AM, Mendelson, Assaf
<Assaf.Mendelson@rsa.com> wrote:
> In pyspark for example you would do something like:
>
> df.withColumn("newColName",pyspark.sql.functions.lit(None))
>
> Assaf.
> -----Original Message-----
> From: Kristoffer Sjögren [mailto:stoffe@gmail.com]
> Sent: Friday, November 18, 2016 9:19 PM
> To: Mendelson, Assaf
> Cc: user
> Subject: Re: DataFrame select non-existing column
>
> Thanks for your answer. I have been searching the API for a way to do that, but I could
> not find one.
>
> Could you give me a code snippet?
>
> On Fri, Nov 18, 2016 at 8:03 PM, Mendelson, Assaf <Assaf.Mendelson@rsa.com> wrote:
>> You can always add the columns to the old dataframes, giving them null (or some literal),
>> as a preprocessing step.
>>
>> -----Original Message-----
>> From: Kristoffer Sjögren [mailto:stoffe@gmail.com]
>> Sent: Friday, November 18, 2016 4:32 PM
>> To: user
>> Subject: DataFrame select non-existing column
>>
>> Hi
>>
>> We have evolved a DataFrame by adding a few columns, but we cannot write select statements
>> on these columns for older data that doesn't have them, since they fail with an
>> AnalysisException with the message "No such struct field".
>>
>> We also tried dropping columns but this doesn't work for nested columns.
>>
>> Any non-hacky ways to get around this?
>>
>> Cheers,
>> -Kristoffer
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
