spark-user mailing list archives

From Anton Puzanov <antonpuzdeve...@gmail.com>
Subject Re: Split a row into multiple rows Java
Date Wed, 01 Aug 2018 20:41:07 GMT
You can always use array + explode. I don't know if it's the most
elegant/optimal solution (I would be happy to hear from the experts).

Code example:
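// Create sample data. InternalData is a JavaBean with the fields
// name, otherName, val1, val2, val3 (definition omitted).
Dataset<Row> test = spark.createDataFrame(Arrays.asList(
        new InternalData("bob", "b1", 1, 2, 3),
        new InternalData("alive", "c1", 3, 4, 6),
        new InternalData("eve", "e1", 7, 8, 9)
), InternalData.class);
test.show();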

+-----+---------+----+----+----+
| name|otherName|val1|val2|val3|
+-----+---------+----+----+----+
|  bob|       b1|   1|   2|   3|
|alive|       c1|   3|   4|   6|
|  eve|       e1|   7|   8|   9|
+-----+---------+----+----+----+

Dataset<Row> expandedTest = test.selectExpr("name", "otherName",
        "explode(array(val1, val2, val3)) as time");
expandedTest.show();
+-----+---------+----+
| name|otherName|time|
+-----+---------+----+
|  bob|       b1|   1|
|  bob|       b1|   2|
|  bob|       b1|   3|
|alive|       c1|   3|
|alive|       c1|   4|
|alive|       c1|   6|
|  eve|       e1|   7|
|  eve|       e1|   8|
|  eve|       e1|   9|
+-----+---------+----+
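For anyone who wants to see the row-splitting logic without a cluster, here is a minimal plain-Java sketch (no Spark involved, Java 16+ assumed for records) of what explode(array(val1, val2, val3)) does to each row: the three columns are packed into one sequence, and each element becomes its own output row with name and otherName copied along. ExplodeSketch and its Input/Output records are hypothetical stand-ins for the InternalData bean and the result columns, not Spark classes.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java model of the array + explode approach; no Spark classes are used.
public class ExplodeSketch {

    // Hypothetical stand-in for the InternalData bean.
    public record Input(String name, String otherName, int val1, int val2, int val3) {}

    // Hypothetical stand-in for one (name, otherName, time) output row.
    public record Output(String name, String otherName, int time) {}

    // Mirrors selectExpr("name", "otherName", "explode(array(val1, val2, val3)) as time").
    public static List<Output> explode(List<Input> rows) {
        return rows.stream()
                // array(val1, val2, val3): pack the three columns into one sequence;
                // explode: emit one output row per element, copying the other columns.
                .flatMap(r -> Stream.of(r.val1(), r.val2(), r.val3())
                        .map(v -> new Output(r.name(), r.otherName(), v)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Input> test = List.of(
                new Input("bob", "b1", 1, 2, 3),
                new Input("alive", "c1", 3, 4, 6),
                new Input("eve", "e1", 7, 8, 9));
        for (Output o : explode(test)) {
            System.out.println(o.name() + " " + o.otherName() + " " + o.time());
        }
    }
}
```

Running main prints the same nine name/otherName/time combinations as expandedTest.show(), one per line. Note that this form drops which column each value came from; if you need that, keep a label next to the value.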


On Wed, Aug 1, 2018 at 11:05 PM, nookala <srinookala@gmail.com> wrote:

> Pivot seems to do the opposite of what I want: it converts rows to columns.
>
> I was able to get this done in Python, but would like to do this in Java:
>
> idfNew = idf.rdd.flatMap(lambda row: [
>     (row.Name, row.Id, row.Date, "0100", row["0100"]),
>     (row.Name, row.Id, row.Date, "0200", row["0200"]),
>     (row.Name, row.Id, row.Date, "0300", row["0300"]),
>     (row.Name, row.Id, row.Date, "0400", row["0400"]),
> ]).toDF()
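Since the question asks how to express the Python flatMap in Java, here is a rough plain-Java sketch of the same reshaping, again modeled with streams rather than Spark. Unlike the array + explode version, it keeps the source column name ("0100" through "0400") as a label next to each value, as the Python tuples do. The class name, the record types, the map-based input and the Integer value type are all assumptions for illustration. Inside Spark itself, the places to look would be Dataset.flatMap (which needs an Encoder for the output rows) or the Spark SQL stack() generator.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-Java model of the Python flatMap: one input row -> one output row per time column.
public class UnpivotSketch {

    // Column names taken from the Python snippet.
    static final List<String> TIME_COLUMNS = List.of("0100", "0200", "0300", "0400");

    // Hypothetical input row: key columns plus a column-name -> value map (Integer values assumed).
    public record Input(String name, String id, String date, Map<String, Integer> values) {}

    // Hypothetical output row: the key columns, the source column label, and its value.
    public record Output(String name, String id, String date, String time, Integer value) {}

    public static List<Output> unpivot(List<Input> rows) {
        return rows.stream()
                // For each input row, emit one output row per time column, in TIME_COLUMNS order.
                .flatMap(r -> TIME_COLUMNS.stream()
                        .map(col -> new Output(r.name(), r.id(), r.date(), col, r.values().get(col))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample row, for illustration only.
        Input row = new Input("n1", "id1", "2018-08-01",
                Map.of("0100", 10, "0200", 20, "0300", 30, "0400", 40));
        unpivot(List.of(row)).forEach(System.out::println);
    }
}
```

The design choice here is that the label travels with the value, so the result can later be pivoted back or grouped by time slot.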
