spark-user mailing list archives

From Michael Lewis
Subject 'nested' RDD problem, advise needed
Date Sat, 21 Mar 2015 17:26:56 GMT

I wonder if someone can suggest a solution to my problem. I had a simple process working
using Strings and now want to convert it to use RDD[Char]. The problem is that I end up
with a nested RDD call, as follows:

1) Load a text file into an RDD[Char]

	val inputRDD = sc.textFile("myFile.txt").flatMap(_.toIterator)

2) I have a method that takes two parameters:

	object Foo
		def myFunction(inputRDD: RDD[Char], shift: Int) : RDD[Char] ...

3) I have a method ‘bar’ that the driver process calls once it has loaded the inputRDD:

def bar(inputRDD: RDD[Char]) : Int = {

	 val solutionSet = sc.parallelize((1 to alphabetLength).toList).map(shift => (shift, Foo.myFunction(inputRDD, shift)))

What I’m trying to do is take the list 1..26 and generate a set of tuples { (1, RDD(1)), …,
(26, RDD(26)) }, where each RDD is the inputRDD passed through the function above
with a different shift parameter.
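
The intended shape of the result can be sketched with plain Scala collections standing in for RDDs. This is only an illustration: `ShiftSketch`, `shiftChars` and `allShifts` are hypothetical names, and the Caesar-style shift of lowercase letters is an assumption about what myFunction does, since its body is not shown here.

```scala
object ShiftSketch {
  val alphabetLength = 26

  // Hypothetical stand-in for myFunction: rotate lowercase letters by `shift`
  // positions and leave every other character unchanged.
  def shiftChars(input: Seq[Char], shift: Int): Seq[Char] =
    input.map { c =>
      if (c >= 'a' && c <= 'z') ((c - 'a' + shift) % alphabetLength + 'a').toChar
      else c
    }

  // One (shift, shifted text) tuple for every shift in 1..alphabetLength.
  def allShifts(input: Seq[Char]): Seq[(Int, Seq[Char])] =
    (1 to alphabetLength).map(shift => (shift, shiftChars(input, shift)))
}
```

Calling `ShiftSketch.allShifts("abc".toSeq)` yields 26 tuples, one per shift, which is the structure wanted above with Seq[Char] in place of RDD[Char].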

In my original version I could parallelise the algorithm fine, but my input had to be held
in a ‘String’ variable. I’d rather it be an RDD, since the string could be large.
I think the approach above won’t work because it’s a nested RDD call: myFunction would be
using inputRDD inside a map over another RDD.
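
For comparison, a non-nested form might iterate over the shifts on the driver with an ordinary Scala range, so that inputRDD is only ever referenced from driver code. The following is a minimal sketch under assumptions, not a confirmed fix: the body of `myFunction` is a hypothetical Caesar-style shift, and `ShiftDriverSketch`, `solutionsFor` and the `local[*]` master are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object Foo {
  val alphabetLength = 26

  // Hypothetical body: rotate lowercase letters by `shift`, keep other chars.
  def myFunction(inputRDD: RDD[Char], shift: Int): RDD[Char] =
    inputRDD.map { c =>
      if (c >= 'a' && c <= 'z') ((c - 'a' + shift) % alphabetLength + 'a').toChar
      else c
    }
}

object ShiftDriverSketch {
  // Builds the (shift, RDD) tuples on the driver. The range is a local
  // collection, so no RDD is used inside another RDD's closure.
  def solutionsFor(inputRDD: RDD[Char]): Seq[(Int, RDD[Char])] =
    (1 to Foo.alphabetLength).map(shift => (shift, Foo.myFunction(inputRDD, shift)))

  def main(args: Array[String]): Unit = {
    // Assumed local setup; a real job would take its master from spark-submit.
    val conf = new SparkConf().setAppName("shift-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val inputRDD = sc.textFile("myFile.txt").flatMap(_.toIterator)
    inputRDD.cache() // reused by all 26 derived RDDs

    val solutionSet = solutionsFor(inputRDD)
    solutionSet.foreach { case (shift, rdd) =>
      println(s"shift $shift: ${rdd.take(20).mkString}")
    }
    sc.stop()
  }
}
```

In this form the 26 RDDs are defined one after another on the driver, and each is still evaluated in parallel across its partitions when an action runs on it.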

Can anybody suggest a solution?

Mike Lewis
