spark-user mailing list archives

From ayan guha <>
Subject Re: Dataframe Transformation with Inner fields in Complex Datatypes.
Date Mon, 18 Jul 2016 00:00:45 GMT

withColumn adds the column. If you want a different name, please use .alias().
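
To update an inner field in place, one option is to rebuild the struct and pass it to withColumn under the existing top-level column name, which replaces that column instead of appending a new one. Below is a minimal sketch of that idea, not a definitive implementation. It assumes Spark 2.0+ (SparkSession), a local session, made-up sample data, and a schema in which line2 sits inside address. It also uses the built-in upper function in place of the toUpperCaseUDF from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, upper}

object UpdateInnerField {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-update-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data shaped like the table in the question:
    // (id int, address struct<line1: struct<buildname, stname>, line2>)
    val df = Seq((1, "abc tower", "main st", "suite 4"))
      .toDF("id", "buildname", "stname", "line2")
      .select(
        col("id"),
        struct(
          struct(col("buildname"), col("stname")).as("line1"),
          col("line2")
        ).as("address"))

    // Rebuild the whole struct, changing only buildname, and write it
    // back under the existing top-level name so the column is replaced.
    val updated = df.withColumn("address",
      struct(
        struct(
          upper(col("address.line1.buildname")).as("buildname"),
          col("address.line1.stname").as("stname")
        ).as("line1"),
        col("address.line2").as("line2")))

    updated.printSchema()
    updated.show(false)

    spark.stop()
  }
}
```

Every field of the struct has to be listed again, including the ones that do not change, and each one needs .as(...) so that the rebuilt struct keeps the original field names.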

On Mon, Jul 18, 2016 at 2:16 AM, java bigdata <> wrote:

> Hi Team,
> I am facing a major issue while transforming a dataframe containing complex
> datatype columns. I need to update the inner fields of a complex datatype,
> e.g. converting one inner field to uppercase, and return the
> same dataframe with the new uppercase values in it. Below is my issue
> description. Kindly suggest/guide me a way forward.
> *My suggestion:* can we have a new version of *dataframe.withColumn(<innerfieldreference>,
> udf($innerfieldreference), <reference or colname indicator argument>)*,
> so that when this method gets executed, I get the same dataframe with
> transformed values?
> *Issue Description:*
> Using dataframe.withColumn(<colname>, udf($colname)) for inner fields in a
> struct/complex datatype results in a new dataframe with a new column
> appended to it. "colname" in the above argument is given as the full name with
> dot notation to access the struct/complex fields.
> For example, the Hive table has columns: (id int, address struct<line1: struct<
> buildname:string, stname:string>, line2:string>)
> I need to update the inner field 'buildname'. I can select the inner field
> through the dataframe as $"address.line1.buildname"; however, when
> I use df.withColumn("address.line1.buildname",
> toUpperCaseUDF($"address.line1.buildname")), it results in a new
> dataframe with a new column "address.line1.buildname" appended, with
> toUpperCaseUDF values from the inner field buildname.
> How can I update the inner fields of complex data types? Kindly
> suggest.
> Thanks in anticipation.
> Best Regards,
> Naveen Kumar.

Best Regards,
Ayan Guha
