spark-user mailing list archives

From Pankaj Narang <>
Subject Re: reading a csv dynamically
Date Thu, 22 Jan 2015 02:46:12 GMT
Yes — I think you first need to map each line to the number of comma-separated
values it contains, then group all records that share the same count. That
tells you how many distinct array shapes you will have to handle.

val dataRDD = sc.textFile("file.csv")
val dataLengthRDD = dataRDD.map(line => (line.split(",").length, line))
val groupedData = dataLengthRDD.groupByKey()

Now you can process groupedData: each key collects all the lines with the same
number of fields into one group.

groupByKey([numTasks]) — when called on a dataset of (K, V) pairs, returns a
dataset of (K, Iterable<V>) pairs.
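To make the "process the groupedData" step concrete, here is a minimal sketch
of what that might look like. It assumes the dataRDD/groupedData names from the
snippet above and a plain comma split; it is untested here, since Spark
transformations need a live SparkContext to run.

```scala
// Sketch (assumes the groupedData built above): parse each group of
// same-length lines into fixed-width string arrays.
val parsed = groupedData.map { case (numFields, lines) =>
  // Every line in this group splits into exactly numFields values.
  (numFields, lines.map(_.split(",")))
}

// Example: pull out just the 5-field records as a flat RDD of arrays.
val fiveFieldRows = parsed
  .filter { case (numFields, _) => numFields == 5 }
  .flatMap { case (_, rows) => rows }
```

From there each fixed-width RDD can be handled on its own, since every array
in it has the same known length.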

I hope this helps

Infoshore Software
