spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <luohui20...@sina.com>
Subject Is there an operation to create multi record for every element in a RDD?
Date Wed, 09 Aug 2017 08:29:17 GMT
hello guys:      I have a simple rdd like :val userIDs = 1 to 10000val rdd1 = sc.parallelize(userIDs
, 16)   //this rdd has 10000 user id      And I have a List[String] like below:scala> listForRule77
res76: List[String] = List(1,1,100.00|1483286400, 1,1,100.00|1483372800, 1,1,100.00|1483459200,
1,1,100.00|1483545600, 1,1,100.00|1483632000, 1,1,100.00|1483718400, 1,1,100.00|1483804800,
1,1,100.00|1483891200, 1,1,100.00|1483977600, 3,1,200.00|1485878400, 1,1,100.00|1485964800,
1,1,100.00|1486051200, 1,1,100.00|1488384000, 1,1,100.00|1488470400, 1,1,100.00|1488556800,
1,1,100.00|1488643200, 1,1,100.00|1488729600, 1,1,100.00|1488816000, 1,1,100.00|1488902400,
1,1,100.00|1488988800, 1,1,100.00|1489075200, 1,1,100.00|1489161600, 1,1,100.00|1489248000,
1,1,100.00|1489334400, 1,1,100.00|1489420800, 1,1,100.00|1489507200, 1,1,100.00|1489593600,
1,1,100.00|1489680000, 1,1,100.00|1489766400)
scala> listForRule77.length
res77: Int = 29
      I need to create a rdd containing  290000 records. for every userid in rdd1 , I need
to create 29 records according to listForRule77, each record start with the userid, for example
1(the userid),1,1,100.00|1483286400.       My idea is like below:1.write a udfto add the userid
to the beginning of every string element of listForRule77.2.use val rdd2 = rdd1.map{x=>
List_udf(x))}.flatmap(), the result rdd2 maybe what I need.
      My question: Are there any problems in my idea? Is there a better way to do this ? 
           

--------------------------------

 

Thanks&amp;Best regards!
San.Luo
Mime
View raw message