spark-user mailing list archives

From Ashok Kumar <>
Subject Re: Splitting columns from a text file
Date Mon, 05 Sep 2016 15:21:38 GMT
Thanks everyone.
I am not skilled like you gentlemen
This is what I did
1) Read the text file
val textFile = sc.textFile("/tmp/myfile.txt")

2) That produces an RDD of String.
3) Create a DF after splitting the file into an Array 
val df = textFile.map(line => line.split(",")).map(x => (x(0).toInt, x(1).toString, x(2).toDouble)).toDF
4) Create a class for column headers
 case class Columns(col1: Int, col2: String, col3: Double)
5) Assign the column headers 
val h = df.map(p => Columns(p(0).toString.toInt, p(1).toString, p(2).toString.toDouble))
6) Only interested in column 3 > 50
 h.filter(col("Col3") > 50.0)
7) Now I just want Col3 only
h.filter(col("Col3") > 50.0).select("col3").show(5)
showing top 5 rows
Does that make sense? Are there shorter ways, gurus? Can I just do all this on an RDD without
Thanking you
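
A shorter RDD-style variant of the steps above might look like the sketch below. It is only a sketch under assumptions, not something posted in this thread: a local `Seq[String]` stands in for the `RDD[String]` returned by `sc.textFile` so that it runs without Spark (the `map`, `flatMap` and `filter` calls it uses also exist on an RDD), and the names `Record`, `parseLine` and `col3Above` are made up for illustration.

```scala
import scala.util.Try

object FilterThirdColumn {
  // Hypothetical row type; the field names follow the Columns case class above.
  final case class Record(col1: Int, col2: String, col3: Double)

  // Parse one comma-separated line; malformed rows yield None instead of throwing.
  def parseLine(line: String): Option[Record] =
    line.split(",").map(_.trim) match {
      case Array(a, b, c) => Try(Record(a.toInt, b, c.toDouble)).toOption
      case _              => None
    }

  // Parse every line, project the third column, keep values above the threshold.
  def col3Above(lines: Seq[String], threshold: Double): Seq[Double] =
    lines.flatMap(parseLine).map(_.col3).filter(_ > threshold)

  def main(args: Array[String]): Unit = {
    // Sample rows in the same layout as the file in this thread (values shortened).
    val lines = Seq(
      "74,20160905-133143,98.11",
      "75,20160905-133143,49.52",
      "76,20160905-133143,56.08"
    )
    col3Above(lines, 50.0).foreach(println)
  }
}
```

On a real RDD the same chain would presumably read `textFile.flatMap(parseLine).map(_.col3).filter(_ > 50.0)`, with no DataFrame involved.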


    On Monday, 5 September 2016, 15:19, ayan guha <> wrote:

 Then you need to refer to the third element of the array, convert it to your desired data type,
and then use filter. 

On Tue, Sep 6, 2016 at 12:14 AM, Ashok Kumar <> wrote:

Hi, I want to filter them for values.
This is what is in array
74,20160905-133143,98.11218069128827594148

I want to filter anything > 50.0 in the third column


    On Monday, 5 September 2016, 15:07, ayan guha <> wrote:

x.split returns an array. So, after first map, you will get RDD of arrays. What is your expected
outcome of 2nd map? 
On Mon, Sep 5, 2016 at 11:30 PM, Ashok Kumar <> wrote:

Thank you sir.
This is what I get
scala> textFile.map(x => x.split(","))
res52: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[27] at map at <console>:27
How can I work on individual columns? I understand they are strings.
scala> textFile.map(x => x.split(",")).map(x => (x.getString(0))
     | )
<console>:27: error: value getString is not a member of Array[String]
       textFile.map(x => x.split(",")).map(x => (x.getString(0))
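
The error above comes up because `getString` is a method on a DataFrame `Row`, not on `Array[String]`; an array element is read with `x(0)`. A minimal sketch of that, assuming a local `Seq[String]` in place of the RDD so it runs without Spark, with the hypothetical helper names `firstColumn` and `thirdColumnAsDouble`:

```scala
object ArrayColumns {
  // Each split line is an Array[String]; x(0) is shorthand for x.apply(0).
  def firstColumn(lines: Seq[String]): Seq[String] =
    lines.map(_.split(",")).map(x => x(0))

  // Columns start out as strings, so convert explicitly before numeric work.
  def thirdColumnAsDouble(lines: Seq[String]): Seq[Double] =
    lines.map(_.split(",")).map(x => x(2).toDouble)

  def main(args: Array[String]): Unit = {
    // Sample rows in the layout used in this thread (values shortened).
    val lines = Seq("74,20160905-133143,98.11", "75,20160905-133143,49.52")
    println(firstColumn(lines))
    println(thirdColumnAsDouble(lines))
  }
}
```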


    On Monday, 5 September 2016, 13:51, Somasundaram Sekar <somasundar.sekar@>

 Basic error: you get back an RDD on transformations like
sc.textFile("filename").map(x => x.split(","))
On 5 Sep 2016 6:19 pm, "Ashok Kumar" <> wrote:

I have a text file as below that I read in
74,20160905-133143,98.11218069128827594148
75,20160905-133143,49.52776998815916807742
76,20160905-133143,56.08029957123980984556
77,20160905-133143,46.63689526544407522777
78,20160905-133143,84.88227141164402181551
79,20160905-133143,68.
val textFile = sc.textFile("/tmp/mytextfile.txt")
Now I want to split the rows separated by ","
scala> textFile.map(x => x.toString).split(",")
<console>:27: error: value split is not a member of org.apache.spark.rdd.RDD[String]
       textFile.map(x => x.toString).split(",")
However, the above throws an error.
Any ideas what is wrong or how I can do this if I can avoid converting it to String?
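
`split` is defined on `String`, not on the collection that holds the strings, so it has to be applied to each line inside `map`. A minimal sketch of that idea, assuming a local `Seq[String]` in place of the `RDD[String]` so that it runs without Spark; `splitLines` is a hypothetical helper name:

```scala
object SplitEachLine {
  // Apply split to every element; the result is one Array[String] per line.
  def splitLines(lines: Seq[String]): Seq[Array[String]] =
    lines.map(line => line.split(","))

  def main(args: Array[String]): Unit = {
    // Sample rows in the layout used in this thread (values shortened).
    val lines = Seq("74,20160905-133143,98.11", "75,20160905-133143,49.52")
    // Arrays print as object references, so render each one with mkString.
    splitLines(lines).foreach(cols => println(cols.mkString(" | ")))
  }
}
```

Since the lines are already strings, no `toString` conversion is needed before splitting.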


Best Regards,
Ayan Guha


