spark-user mailing list archives

From Kanagha Kumar <kpra...@salesforce.com>
Subject Re: Replicating a row n times
Date Fri, 29 Sep 2017 07:21:31 GMT
Thanks for the response.
I can use either row_number() or monotonicallyIncreasingId to generate
unique IDs, as in
https://hadoopist.wordpress.com/2016/05/24/generate-unique-ids-for-each-rows-in-a-spark-dataframe/

I'm looking for a Java example that uses this to replicate a single row n
times, either by appending a ROWNUM column generated as above or by using
the explode function.

Ex:

ds.withColumn("ROWNUM", org.apache.spark.sql.functions.explode(columnEx));

columnEx needs to be of array type in order for explode to work.
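
For illustration, here is a minimal sketch of how such an array column might be built in Java, mirroring the Scala snippet quoted below. The class name ReplicateRow and the helper replicate are hypothetical names chosen for this example, and n is assumed to be at least 1. It relies on functions.lit, functions.array(Column...) and functions.explode from the Spark SQL Java API.

```java
import static org.apache.spark.sql.functions.array;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.lit;

import java.util.stream.IntStream;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class ReplicateRow {

    // Hypothetical helper (not part of Spark): repeats every row of ds n times
    // and adds a ROWNUM column holding 1..n for the copies of each row.
    // Assumes n >= 1.
    public static Dataset<Row> replicate(Dataset<Row> ds, int n) {
        // One literal column per replica index: lit(1), lit(2), ..., lit(n).
        Column[] indices = IntStream.rangeClosed(1, n)
                .mapToObj(i -> lit(i))
                .toArray(Column[]::new);

        // array(...) packs the literals into a single array-typed column;
        // explode(...) then emits one output row per array element.
        return ds.withColumn("ROWNUM", explode(array(indices)));
    }
}
```

Calling ReplicateRow.replicate(ds, 100) on a single-row dataset should yield 100 rows that differ only in ROWNUM. If ROWNUM alone is not suitable as the primary key, it could be combined with an existing key column, for example via functions.concat_ws, though which column to use depends on the schema.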

Any suggestions are helpful.
Thanks


On Thu, Sep 28, 2017 at 7:21 PM, ayan guha <guha.ayan@gmail.com> wrote:

> How about using row number for primary key?
>
> Select row_number() over (), * from table
>
> On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar <kprasad@salesforce.com>
> wrote:
>
>> Hi,
>>
>> I'm trying to replicate a single row from a dataset n times and create a
>> new dataset from it. While replicating, I need one column's value to
>> change for each copy, since it will end up as the primary key when the
>> data is finally stored.
>>
>> I looked at the following reference:
>> https://stackoverflow.com/questions/40397740/replicate-spark-row-n-times
>>
>> import org.apache.spark.sql.functions._
>> val result = singleRowDF
>>   .withColumn("dummy", explode(array((1 until 100).map(lit): _*)))
>>   .selectExpr(singleRowDF.columns: _*)
>>
>> How can I create a column from an array of values in Java and pass it to
>> the explode function? Suggestions are welcome.
>>
>>
>> Thanks
>> Kanagha
>>
> --
> Best Regards,
> Ayan Guha
>
