spark-user mailing list archives

From Colin Williams <colin.williams.seat...@gmail.com>
Subject Re: Casting nested columns and updated nested struct fields.
Date Fri, 23 Nov 2018 16:42:43 GMT
This seems worth filing as a bug against withColumn.

On Wed, Nov 21, 2018, 6:25 PM Colin Williams <
colin.williams.seattle@gmail.com> wrote:

> Hello,
>
> I'm currently trying to update the schema for a dataframe with nested
> columns. I would either like to update the schema itself or cast the
> column without having to explicitly select all the columns just to
> cast one.
>
> With regard to updating the schema, it looks like I would probably need
> to write a more complex map over the schema to find the StructFields I
> want to update and update them. I haven't found any examples of this,
> but it seems like there should be a simpler way to do it.
>
> With regard to changing the column on the dataframe itself, using e.g.
>
> val newDF =
> df.withColumn("existing.top.level.FIELD_NAME",df.col("existing.top.level.FIELD_NAME").cast(LongType))
>
> I end up with a new column named "existing.top.level.FIELD_NAME" at
> the root level instead of the nested column being updated to the new
> type. Has anybody worked out how to update a nested column's data type,
> and how to update the field's type in the nested StructType schema?
> Are there any easy ways to do this, or is there a reason it is not trivial?
>

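For reference, a minimal sketch of one possible workaround for both
questions above. It assumes that the dotted path passes only through
struct fields (no arrays or maps), that no field name itself contains a
dot, and that Spark's struct-to-struct cast accepts the change (same
field names and count, castable types). The names NestedCast, retype
and castNested are hypothetical helpers, not part of the Spark API.
The idea is to rewrite the StructType recursively, then cast the whole
top-level column to the rewritten struct type rather than calling
withColumn with a dotted name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object NestedCast {
  // Return a copy of `dt` in which the field reached by `path` has type `newType`.
  // Fields that are not on the path are kept unchanged.
  def retype(dt: DataType, path: List[String], newType: DataType): DataType =
    (dt, path) match {
      case (_, Nil) => newType
      case (st: StructType, head :: tail) =>
        StructType(st.fields.map { f =>
          if (f.name == head) f.copy(dataType = retype(f.dataType, tail, newType))
          else f
        })
      // Assumption: the path only goes through structs; anything else is left as-is.
      case (other, _) => other
    }

  // Cast one nested field by casting its entire top-level struct column
  // to a struct type that differs only in that field's type.
  def castNested(df: DataFrame, path: String, newType: DataType): DataFrame = {
    val top :: rest = path.split('.').toList
    val newTopType = retype(df.schema(top).dataType, rest, newType)
    df.withColumn(top, col(top).cast(newTopType))
  }
}
```

With the column from the question, this would be called as
NestedCast.castNested(df, "existing.top.level.FIELD_NAME", LongType).
Because the top-level column name "existing" already exists, withColumn
replaces it in place instead of adding a new root-level column. On
Spark 3.1 and later, Column.withField may be a more direct option.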