spark-user mailing list archives

From "Bowden, Chris" <chris.bow...@microfocus.com>
Subject Re: assign one identifier for all rows that have similar value in RDD
Date Fri, 20 Apr 2018 15:56:13 GMT
Just hash the column value


-Chris
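
The hashing suggestion above could look roughly like the sketch below. It is plain Python rather than Spark code, and the sample rows and the name col_to_id are made up for illustration, since the original data was only attached as an image. A digest from hashlib is used instead of Python's built-in hash() because the built-in is randomized per process for strings, which would give different identifiers on different executors. Truncating the digest to 64 bits keeps the identifier compact, at the cost of a small chance of collisions.

```python
import hashlib

def col_to_id(value: str) -> int:
    """Map a column value to a deterministic 64-bit integer identifier."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Keep only the first 8 bytes; collisions are unlikely but possible.
    return int.from_bytes(digest[:8], "big")

# Hypothetical sample rows of (ID_Nr, Col); not taken from the original post.
rows = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]

# Rows with an equal "Col" value receive the same identifier.
with_ids = [(id_nr, col, col_to_id(col)) for id_nr, col in rows]
```

On an RDD of (ID_Nr, Col) pairs, the same function could presumably be applied with something like rdd.map(lambda r: (r[0], r[1], col_to_id(r[1]))), which needs no shuffle or join.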

________________________________
From: Vadim Semenov <vadim@datadoghq.com>
Sent: Friday, April 20, 2018 7:09:51 AM
To: Donni Khan
Cc: user
Subject: Re: assign one identifier for all rows that have similar value in RDD

Create another rdd with one-to-one relations Col -> Id, and then join on it?
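
A minimal sketch of that idea, written in plain Python so the logic is easy to follow. The sample rows and variable names are hypothetical. In Spark the lookup step would roughly correspond to taking the distinct "Col" values and numbering them (for example with distinct() followed by zipWithIndex()), and the second step to a join keyed on "Col".

```python
# Hypothetical sample rows of (ID_Nr, Col); not taken from the original post.
rows = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B")]

# Step 1: build the one-to-one relation Col -> Id from the distinct values.
# Sorting only makes the numbering reproducible in this example.
distinct_cols = sorted({col for _, col in rows})
col_ids = {col: idx for idx, col in enumerate(distinct_cols)}

# Step 2: "join" every row with the lookup on its Col value.
joined = [(id_nr, col, col_ids[col]) for id_nr, col in rows]

print(joined)
# [(1, 'A', 0), (2, 'B', 1), (3, 'A', 0), (4, 'C', 2), (5, 'B', 1)]
```

Compared with hashing, this gives small consecutive identifiers with no collisions, but it costs an extra pass over the data plus a join.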

On Fri, Apr 20, 2018 at 7:19 AM, Donni Khan <prince.donnii@googlemail.com> wrote:
Hi Spark Users,

I want to add one identifier for all rows that have similar values in a specific column.
I have a Spark RDD containing "ID_Nr" and "Col", and I want to assign one identifier
to all rows that have a similar value in "Col".

[inline image attachment, not preserved in the archive]

Does anyone have an idea (or code) for how to do that?

Thank you,




--
Sent from my iPhone
