Or look at explode on DataFrame.
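For example, something along these lines generates the extra rows without collecting to the driver. This is only a minimal sketch (Scala), assuming df is the DataFrame shown below; the makeChildren UDF and its output are hypothetical placeholders for whatever per-row logic actually produces the child records.

import org.apache.spark.sql.functions._

// Hypothetical UDF: given the key of a parent row, return the child values
// that should become additional rows. Replace with your real logic.
val makeChildren = udf { (id: String) => Seq(s"${id}1 ..", s"${id}2 ..") }

val exploded = df
  .withColumn("children", makeChildren(col("_1")))        // one array column per parent row
  .select(col("*"), explode(col("children")).as("child")) // one output row per array element
  .drop("children")

There is also an explode method directly on DataFrame that takes a function returning several records per input row, which may fit better if the child rows need to fill several columns.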

On Fri, Mar 11, 2016 at 10:45 AM, Stefan Panayotov <spanayotov@msn.com> wrote:
Hi,
 
I have a problem that requires me to go through the rows in a DataFrame (or possibly through rows in a JSON file) and conditionally add rows depending on a value in one of the columns of each existing row. So, for example, if I have:
 

+---+---+---+
| _1| _2| _3|
+---+---+---+
|ID1|100|1.1|
|ID2|200|2.2|
|ID3|300|3.3|
|ID4|400|4.4|
+---+---+---+

I need to be able to get:
 

+---+---+---+--------------------+---+
| _1| _2| _3|                  _4| _5|
+---+---+---+--------------------+---+
|ID1|100|1.1|ID1 add text or d...| 25|
|id11 ..|21 |
|id12 ..|22 |
|ID2|200|2.2|ID2 add text or d...| 50|
|id21 ..|33 |
|id22 ..|34 |
|id23 ..|35 |
|ID3|300|3.3|ID3 add text or d...| 75|
|id31 ..|11 |
|ID4|400|4.4|ID4 add text or d...|100|
|id41 ..|51 |
|id42 ..|52 |
|id43 ..|53 |
|id44 ..|54 |
+---+---+---+--------------------+---+

How can I achieve this in Spark without doing DF.collect(), which would pull everything to the driver and, for a big data set, cause an OOM?
BTW, I know how to use withColumn() to add new columns to the DataFrame; what I need is to also add new rows.
Any help would be appreciated.

Thanks,