spark-user mailing list archives

From Colin Williams <colin.williams.seat...@gmail.com>
Subject Casting nested columns and updated nested struct fields.
Date Thu, 22 Nov 2018 02:25:49 GMT
Hello,

I'm trying to update the schema of a DataFrame that has nested
columns. I would like either to update the schema itself or to cast a
single nested column, without having to explicitly select every column
just to cast one.

As for updating the schema, it looks like I would need to write a
fairly complex map over the schema to find the StructFields I want to
change and replace them. I haven't found any examples of this, but it
seems like there should be a simpler way to do it.
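
For illustration, the "map over the schema" idea can be sketched as a small recursive function. This is only a sketch under assumptions: `SchemaUpdate` and `updateFieldType` are hypothetical names, not part of Spark, and the helper only descends through nested StructTypes (fields inside arrays or maps are left untouched).

```scala
import org.apache.spark.sql.types._

object SchemaUpdate {
  // Hypothetical helper: returns a copy of `schema` in which the field reached
  // by `path` (e.g. Seq("existing", "top", "level", "FIELD_NAME")) has type
  // `newType`. Fields that are not on the path are returned unchanged.
  def updateFieldType(schema: StructType, path: Seq[String], newType: DataType): StructType =
    StructType(schema.fields.map {
      case f if path.nonEmpty && f.name == path.head =>
        (path.tail, f.dataType) match {
          // Last path element: this is the field whose type is replaced.
          case (Seq(), _) => f.copy(dataType = newType)
          // More path left and the field is a struct: recurse into it.
          case (rest, nested: StructType) =>
            f.copy(dataType = updateFieldType(nested, rest, newType))
          // Path continues but the field is not a struct: leave it alone.
          case _ => f
        }
      case f => f
    })
}
```

With a DataFrame `df`, it would be called as `SchemaUpdate.updateFieldType(df.schema, Seq("existing", "top", "level", "FIELD_NAME"), LongType)`. The result is only a new StructType; it does not change the data by itself.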

As for changing the column on the DataFrame itself, using e.g.

val newDF = df.withColumn("existing.top.level.FIELD_NAME",df.col("existing.top.level.FIELD_NAME").cast(LongType))

I end up with a new column named "existing.top.level.FIELD_NAME" at
the root level instead of the nested column being updated to the new
type. Has anybody worked out how to update a nested column's data type,
and how to update the corresponding field type in the nested schema
StructType? Are there any easy ways to do this, or is there a reason it
is not trivial?
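
The root-level column appears because withColumn treats its first argument as a literal top-level column name rather than as a path into a struct. One possible way around this, sketched below, is to replace the top-level struct column and rewrite the nested field with Column.withField. Note the assumptions: withField only exists in Spark 3.1 and later, so it postdates this message, and `NestedCast`, `castNestedField` and the toy data are hypothetical names made up for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}
import org.apache.spark.sql.types.LongType

object NestedCast {
  // Hypothetical helper: casts existing.top.level.FIELD_NAME to long in place.
  // Requires Spark 3.1+ for Column.withField.
  def castNestedField(df: DataFrame): DataFrame =
    df.withColumn(
      "existing", // replace the top-level struct, not a dotted name
      col("existing").withField(
        "top.level.FIELD_NAME",
        col("existing.top.level.FIELD_NAME").cast(LongType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-cast").getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real DataFrame: FIELD_NAME starts as an int
    // and has a sibling field that should survive the cast unchanged.
    val leaf = struct(col("v").as("FIELD_NAME"), lit("x").as("other"))
    val df = Seq(1, 2).toDF("v")
      .select(struct(struct(leaf.as("level")).as("top")).as("existing"))

    // The printed schema shows FIELD_NAME as long, still nested under existing.top.level.
    castNestedField(df).printSchema()
    spark.stop()
  }
}
```

On versions before 3.1 the same effect would presumably need the struct to be rebuilt with struct(...), or the whole top-level column cast to a modified StructType, which is more verbose for deeply nested fields.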
