spark-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Add column value in the dataset on the basis of a condition
Date Tue, 18 Dec 2018 14:55:08 GMT
Have you tried using withColumn? You can add a boolean column based on
whether the age is null, and then drop the old age column. You wouldn't
need a union of DataFrames then.
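
A minimal sketch of that suggestion, assuming the age column is nullable as in the quoted output below. The class name AgeNullFlag and the helpers withAgeNullFlag and sampleData are hypothetical names made up for illustration, and the sample rows simply mirror the quoted ds1 table. Column.isNull() and Dataset.withColumn() are the standard Spark SQL calls.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class AgeNullFlag {

    // Adds a boolean is_age_null column in one pass: true where age is null,
    // false otherwise. No filtering into two datasets and no union.
    static Dataset<Row> withAgeNullFlag(Dataset<?> ds) {
        return ds.withColumn("is_age_null", col("age").isNull());
    }

    // Hypothetical sample data shaped like the ds1 table in the quoted message.
    static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("age", DataTypes.IntegerType, true)
                .add("id", DataTypes.LongType, false)
                .add("name", DataTypes.StringType, true)
                .add("salary", DataTypes.LongType, true);
        List<Row> rows = Arrays.asList(
                RowFactory.create(null, 1L, "dev1", 11000L),
                RowFactory.create(2, 2L, "dev2", 12000L),
                RowFactory.create(null, 3L, "dev3", 13000L),
                RowFactory.create(4, 4L, "dev4", 14000L),
                RowFactory.create(null, 5L, "dev5", 15000L));
        return spark.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[2]")
                .appName("play-with-spark")
                .getOrCreate();

        Dataset<Row> flagged = withAgeNullFlag(sampleData(spark));
        flagged.show();

        spark.stop();
    }
}
```

The same method should accept a Dataset<EmployeeBean>, since withColumn returns a Dataset<Row> regardless of the input type. If the age column is no longer needed afterwards, .drop("age") can be chained onto the result.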

On Tue, Dec 18, 2018 at 8:58 AM Devender Yadav <devender.yadav@impetus.co.in>
wrote:

> Hi All,
>
>
> useful code:
>
> public class EmployeeBean implements Serializable {
>
>     private Long id;
>
>     private String name;
>
>     private Long salary;
>
>     private Integer age;
>
>     // getters and setters
>
> }
>
>
> Relevant spark code:
>
> SparkSession spark = SparkSession.builder()
>         .master("local[2]")
>         .appName("play-with-spark")
>         .getOrCreate();
> List<EmployeeBean> employees1 = populateEmployees(1, 10);
>
> Dataset<EmployeeBean> ds1 = spark.createDataset(employees1,
>         Encoders.bean(EmployeeBean.class));
> ds1.show();
> ds1.printSchema();
>
> Dataset<Row> ds2 = ds1.where("age is null")
>         .withColumn("is_age_null", lit(true));
> Dataset<Row> ds3 = ds1.where("age is not null")
>         .withColumn("is_age_null", lit(false));
>
> Dataset<Row> ds4 = ds2.union(ds3);
> ds4.show();
>
>
> Relevant Output:
>
>
> ds1
>
> +----+---+----+------+
> | age| id|name|salary|
> +----+---+----+------+
> |null|  1|dev1| 11000|
> |   2|  2|dev2| 12000|
> |null|  3|dev3| 13000|
> |   4|  4|dev4| 14000|
> |null|  5|dev5| 15000|
> +----+---+----+------+
>
>
> ds4
>
> +----+---+----+------+-----------+
> | age| id|name|salary|is_age_null|
> +----+---+----+------+-----------+
> |null|  1|dev1| 11000|       true|
> |null|  3|dev3| 13000|       true|
> |null|  5|dev5| 15000|       true|
> |   2|  2|dev2| 12000|      false|
> |   4|  4|dev4| 14000|      false|
> +----+---+----+------+-----------+
>
>
> Is there any better solution to add this column in the dataset rather than
> creating two datasets and performing union?
>
> <https://stackoverflow.com/questions/53834286/add-column-value-in-spark-dataset-on-the-basis-of-the-condition>
>
>
>
> Regards,
> Devender
