spark-user mailing list archives

From Manu Zhang <owenzhang1...@gmail.com>
Subject Re: Split a row into multiple rows Java
Date Wed, 08 Aug 2018 06:16:30 GMT
The following may help, although it is in Scala. The idea is to first
concatenate each value with its column name (the "time"), assemble the
resulting value:time strings into an array and explode it, and finally
split each string back into time and value.

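  import org.apache.spark.sql.functions._

  // Tag each value with its source column name, e.g. "1:val1",
  // then explode the array so each tagged value gets its own row.
  val ndf = df.select(col("name"), col("otherName"),
    explode(
      array(concat_ws(":", col("val1"), lit("val1")),
        concat_ws(":", col("val2"), lit("val2")),
        concat_ws(":", col("val3"), lit("val3")))
    ).alias("temp"))

  // Split "value:time" back into two columns (both come out as strings).
  val fields = split(col("temp"), ":")
  ndf.select(col("name"), col("otherName"),
    fields.getItem(1).alias("time"),
    fields.getItem(0).alias("value"))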
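
Since the question asked for Java, here is a rough sketch of the same three steps (tag, explode, split) on plain Java lists, with no Spark involved. It is only meant to make the logic easy to follow; the class UnpivotSketch, the method unpivot and the row layout {name, otherName, val1, val2, val3} are assumptions made for illustration, not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: mirrors the
// concat -> explode -> split idea on plain Java lists.
class UnpivotSketch {

    // Assumed row layout: {name, otherName, val1, val2, val3}.
    static final String[] VALUE_COLUMNS = {"val1", "val2", "val3"};

    static List<String[]> unpivot(List<String[]> rows) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            // Step 1: tag each value with its column name, e.g. "1:val1".
            List<String> tagged = new ArrayList<>();
            for (int i = 0; i < VALUE_COLUMNS.length; i++) {
                tagged.add(row[2 + i] + ":" + VALUE_COLUMNS[i]);
            }
            // Step 2: "explode", i.e. emit one output row per tagged element.
            for (String temp : tagged) {
                // Step 3: split the tagged string back into value and time.
                String[] fields = temp.split(":");
                out.add(new String[] {row[0], row[1], fields[1], fields[0]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"bob", "b1", "1", "2", "3"},
            new String[] {"eve", "e1", "7", "8", "9"});
        for (String[] r : unpivot(rows)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

As with the Spark version, splitting on ":" assumes the values themselves never contain a colon; if they might, a different separator would be needed.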

Regards,
Manu Zhang

On Wed, Aug 8, 2018 at 11:41 AM nookala <srinookala@gmail.com> wrote:

> +-----+---------+----+----+----+
> | name|otherName|val1|val2|val3|
> +-----+---------+----+----+----+
> |  bob|       b1|   1|   2|   3|
> |alive|       c1|   3|   4|   6|
> |  eve|       e1|   7|   8|   9|
> +-----+---------+----+----+----+
>
> I need this to become
>
> +-----+---------+----+-----+
> | name|otherName|time|value|
> +-----+---------+----+-----+
> |  bob|       b1|val1|    1|
> |  bob|       b1|val2|    2|
> |  bob|       b1|val3|    3|
> |alive|       c1|val1|    3|
> |alive|       c1|val2|    4|
> |alive|       c1|val3|    6|
> |  eve|       e1|val1|    7|
> |  eve|       e1|val2|    8|
> |  eve|       e1|val3|    9|
> +-----+---------+----+-----+
