spark-user mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Limit the # of columns in Spark Scala
Date Sun, 14 Dec 2014 16:15:42 GMT
I have a large number of files in HDFS that I would like to run a group-by
over, à la

val table = sc.textFile("hdfs://....")
val tabs = table.map(_.split("\t"))

I'm trying to do something similar to

tabs.map(c => (c._(167), c._(110), c._(200)))

where I create a new RDD that only has those three columns, but that isn't
quite right because I'm not really manipulating sequences.
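
If I have the types right, something like the following may be closer: split("\t")
yields an Array[String], which is indexed with c(i) rather than the tuple-style
c._i accessors. A minimal sketch, assuming every line splits into at least 201
tab-separated fields:

// select just the three columns of interest; c is an Array[String],
// so fields are read with apply, i.e. c(167), not c._(167)
val projected = tabs.map(c => (c(167), c(110), c(200)))
// the group-by then falls out by keying on one of the fields, e.g.
val grouped = projected.groupBy(_._1)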

BTW, I cannot use Spark SQL with case classes right now because my table has
200 columns, and case classes on Scala 2.10.3 are limited to 22 fields.
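
In case it matters for the answer: my understanding is that the 22-field
ceiling only bites the reflection-based (case class) route, and that the
programmatic schema API sidesteps it. A rough sketch against the Spark
1.1/1.2-era API, with made-up column names:

// a sketch, assuming the programmatic-schema API from Spark 1.1/1.2;
// "colA"/"colB"/"colC" are placeholder names for the three fields
import org.apache.spark.sql._
val sqlContext = new SQLContext(sc)
val schema = StructType(Seq(
  StructField("colA", StringType, nullable = true),
  StructField("colB", StringType, nullable = true),
  StructField("colC", StringType, nullable = true)))
val rows = tabs.map(c => Row(c(167), c(110), c(200)))
val schemaRDD = sqlContext.applySchema(rows, schema)
schemaRDD.registerTempTable("mytable")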

Thanks!
Denny
