spark-user mailing list archives

From shahab <>
Subject How to assign consecutive numeric id to each row based on its content?
Date Tue, 18 Nov 2014 12:54:34 GMT

In my Spark application, I am loading some rows from a database into Spark.
Each row has several fields and a string key. Due to my requirements, I
need to work with consecutive numeric ids (from 1 to N, where N is the
number of unique keys) instead of string keys. Several rows can also
have the same string key.

In Spark, how can I map each row to (Numeric_Key, OriginalRow) using
map/reduce tasks, such that rows with the same original string key get the
same consecutive numeric key?
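
To make the desired mapping concrete, here is a minimal plain-Python sketch of
the input and output (not Spark code). The function name
assign_consecutive_ids, the key_of parameter, and the sample rows are
hypothetical, and sorting the keys is only an assumption to make the numbering
deterministic; any stable order would satisfy the requirement.

```python
def assign_consecutive_ids(rows, key_of):
    """Map each row to (numeric_id, row); rows with equal keys share one id in 1..N."""
    # Collect the distinct string keys; sorting is an assumption made here
    # so that the numbering is reproducible between runs.
    unique_keys = sorted({key_of(row) for row in rows})
    # Number the distinct keys consecutively, starting from 1.
    key_to_id = {key: i for i, key in enumerate(unique_keys, start=1)}
    # Pair every original row with the id of its key.
    return [(key_to_id[key_of(row)], row) for row in rows]


# Hypothetical sample rows: (string_key, some_field)
rows = [("apple", 10), ("pear", 20), ("apple", 30), ("fig", 40)]
result = assign_consecutive_ids(rows, key_of=lambda r: r[0])
```

On an RDD, the same shape might be built by taking the distinct keys,
numbering them with zipWithIndex (which is 0-based, so 1 would be added), and
joining the result back to the rows by key. That is one possible approach
under these assumptions, not a confirmed answer.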

Any hints?

