spark-user mailing list archives

From Pankaj Narang <pankajnaran...@gmail.com>
Subject Re: reading a csv dynamically
Date Thu, 22 Jan 2015 02:46:12 GMT
Yes, I think you first need to map each line to the number of values it
contains. You can then group all the records that have the same number of
values, which also tells you how many different array lengths you will have.


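val dataRDD = sc.textFile("file.csv")
// Key each line by its number of comma-separated values.
val dataLengthRDD = dataRDD.map(line => (line.split(",").length, line))
// Collect all lines that share the same number of values under one key.
val groupedData = dataLengthRDD.groupByKey()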

Now you can process groupedData: each key x in that RDD maps to all the
lines that have x values.

groupByKey([numTasks]): When called on a dataset of (K, V) pairs, returns a
dataset of (K, Iterable<V>) pairs.
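
As a rough sketch of how the groups might then be handled, the example below
counts how many lines exist for each number of values and builds one RDD per
length. It is only an illustration under assumptions: the object name
GroupCsvByFieldCount, the helper keyByFieldCount, the sample lines and the
local[*] master are all made up for the example, and it uses filter per length
instead of groupByKey so that no single group has to fit in memory as one
Iterable. It also uses split(",", -1) so that trailing empty values are
counted, which differs from the plain split(",") above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark < 1.3
import org.apache.spark.rdd.RDD

// Hypothetical names throughout; not part of the original message.
object GroupCsvByFieldCount {

  // Key each line by its number of comma-separated values and keep the
  // parsed values. limit = -1 keeps trailing empty values ("a,b," -> 3).
  def keyByFieldCount(lines: RDD[String]): RDD[(Int, Array[String])] =
    lines.map { line =>
      val fields = line.split(",", -1)
      (fields.length, fields)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("csv-by-field-count")
      .setMaster("local[*]") // assumption: local run, for illustration only
    val sc = new SparkContext(conf)
    try {
      // Stand-in for sc.textFile("file.csv").
      val lines = sc.parallelize(Seq("a,b", "1,2,3", "c,d", "4,5,6,7"))
      val keyed = keyByFieldCount(lines).cache()

      // Number of lines for each distinct number of values.
      val shapeCounts: scala.collection.Map[Int, Long] =
        keyed.keys.countByValue()

      // One RDD per length, each holding arrays of exactly that length.
      val perShape: Map[Int, RDD[Array[String]]] =
        shapeCounts.keys.map(n => n -> keyed.filter(_._1 == n).values).toMap

      perShape.toSeq.sortBy(_._1).foreach { case (n, rdd) =>
        println(s"$n values: ${rdd.count()} lines")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Each RDD in perShape can then be parsed with logic specific to its number of
values. If the grouped (K, Iterable<V>) form is really needed, groupByKey on
the keyed RDD would work as well, at the cost of shuffling every line.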


I hope this helps.

Regards
Pankaj 
Infoshore Software
India




