spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Armbrust <mich...@databricks.com>
Subject Re: adding rows to a DataFrame
Date Fri, 11 Mar 2016 19:49:05 GMT
Or look at explode on DataFrame

On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayotov@msn.com>
wrote:

> Hi,
>
> I have a problem that requires me to go through the rows in a DataFrame
> (or possibly through rows in a JSON file) and conditionally add rows
> depending on a value in one of the columns in each existing row. So, for
> example if I have:
>
>
> +---+---+---+
> | _1| _2| _3|
> +---+---+---+
> |ID1|100|1.1|
> |ID2|200|2.2|
> |ID3|300|3.3|
> |ID4|400|4.4|
> +---+---+---+
>
> I need to be able to get:
>
>
> +---+---+---+--------------------+---+
> | _1| _2| _3|                  _4| _5|
> +---+---+---+--------------------+---+
> |ID1|100|1.1|ID1 add text or d...| 25|
> |id11 ..|21 |
> |id12 ..|22 |
> |ID2|200|2.2|ID2 add text or d...| 50|
> |id21 ..|33 |
> |id22 ..|34 |
> |id23 ..|35 |
> |ID3|300|3.3|ID3 add text or d...| 75|
> |id31 ..|11 |
> |ID4|400|4.4|ID4 add text or d...|100|
> |id41 ..|51 |
> |id42 ..|52 |
> |id43 ..|53 |
> |id44 ..|54 |
> +---+---+---+--------------------+---+
>
> How can I achieve this in Spark without doing DF.collect(), which will get
> everything to the driver and for a big data set I'll get OOM?
> BTW, I know how to use withColumn() to add new columns to the DataFrame. I
> need to also add new rows.
> Any help will be appreciated.
>
> Thanks,
>
>
> *Stefan Panayotov, PhD **Home*: 610-355-0919
> *Cell*: 610-517-5586
> *email*: spanayotov@msn.com
> spanayotov@outlook.com
> spanayotov@comcast.net
>
>

Mime
View raw message