spark-user mailing list archives

From Laeeq Ahmed <>
Subject Processing multiple columns in parallel
Date Mon, 18 May 2015 13:37:56 GMT
Suppose I have a tab-delimited text file with 10 columns, where each column is a set of text.
I would like to do a word count for each column. In Scala, I would do the following RDD transformations
and action:

val data = sc.textFile("hdfs://namenode/data.txt")
for (i <- 0 until 10) {
  data.map(_.split("\t", -1)(i)).map((_, 1)).reduceByKey(_ + _).saveAsTextFile(i.toString)
}

Within the for loop each job is processed in parallel across the cluster, but the columns
themselves are processed sequentially, from 0 to 9.

Is there any way to process multiple columns in parallel in Spark? I saw a posting
about using Akka, but the RDD machinery itself already uses Akka. Any pointers would be appreciated.
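One common pattern (a sketch, not from this thread): submit each column's action from its own thread, for example with Scala Futures, since Spark schedules jobs submitted from separate threads concurrently. To keep the sketch self-contained and runnable without a cluster, it uses a plain in-memory Seq of rows as a hypothetical stand-in for the RDD; with Spark, the body of each Future would be the map/reduceByKey/saveAsTextFile pipeline above.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelColumns {
  def main(args: Array[String]): Unit = {
    // Toy stand-in for the RDD: tab-delimited rows with 2 columns.
    val rows = Seq("a\tb", "a\tc", "x\tb")

    // Launch one word-count "job" per column concurrently. In Spark, each
    // Future body would instead call an RDD action; actions submitted from
    // separate threads are scheduled concurrently by the Spark scheduler.
    val jobs = (0 until 2).map { i =>
      Future {
        rows.map(_.split("\t", -1)(i))       // extract column i
            .groupBy(identity)               // group equal words
            .map { case (w, ws) => (w, ws.size) }  // count each word
      }
    }

    // Wait for all column jobs to finish and collect the counts.
    val counts = jobs.map(Await.result(_, Duration.Inf))
    println(counts(0)) // word counts for column 0
  }
}
```

Note that all such jobs share the cluster's executors, so concurrent submission helps most when a single job cannot saturate the cluster on its own; Spark's fair scheduler pools can balance resources between the jobs.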
