spark-user mailing list archives

From trottdw <>
Subject RDD Manipulation in Scala.
Date Tue, 04 Mar 2014 13:06:12 GMT
Hello, I am using Spark with Scala and I am attempting to understand the
different filtering and mapping capabilities available.  I haven't found an
example of the specific task I would like to do.

I am trying to read in a tab-separated text file and filter specific entries.
I would like the filter to be applied to individual "columns" rather than to
whole lines. I was using the following to split the data, but my attempts to
filter by "column" afterwards are not working.
   val data = sc.textFile("test_data.txt")
   val parsedData = data.map(_.split("\t").map(_.toString))

To try to give a more concrete example of my goal,
Suppose the data file is:
A1    A2     A3     A4
B1    B2     A3     A4
C1    A2     C2     C3

How would I filter the data based on the second column to return only those
entries which have A2 in column two, so that the resulting RDD would contain just:

A1    A2     A3     A4
C1    A2     C2     C3

Is there a convenient way to do this?  Any suggestions or assistance would
be appreciated.
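
For reference, the filter described above might be sketched as follows. This
is only a sketch: it assumes the fields are separated by single tab
characters, and the names ColumnFilterSketch, secondColumnIs, rows and kept
are made up for illustration. It runs on a local Seq so that it does not need
a SparkContext; RDD.filter takes the same kind of predicate, as noted in the
comment.

```scala
// Sketch only: names and sample rows are illustrative, not from any library.
object ColumnFilterSketch {
  // True when the tab-separated line has `value` in its second field (index 1).
  // Lines with fewer than two fields are rejected rather than throwing.
  def secondColumnIs(value: String)(line: String): Boolean = {
    val fields = line.split("\t")
    fields.length > 1 && fields(1) == value
  }

  // The sample rows from the message, written with explicit tab separators.
  val rows: Seq[String] = Seq(
    "A1\tA2\tA3\tA4",
    "B1\tB2\tA3\tA4",
    "C1\tA2\tC2\tC3"
  )

  // Local stand-in for the RDD call. With Spark the same predicate would be
  // passed to the RDD instead:
  //   sc.textFile("test_data.txt").filter(secondColumnIs("A2"))
  val kept: Seq[String] = rows.filter(secondColumnIs("A2"))

  def main(args: Array[String]): Unit = kept.foreach(println)
}
```

Filtering the raw lines keeps the result as an RDD of strings. If the split
fields are needed afterwards, the same test can be applied to the parsed
arrays instead, for example parsedData.filter(_(1) == "A2"), provided every
line has at least two fields.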
