spark-user mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Limit the # of columns in Spark Scala
Date Sun, 14 Dec 2014 16:56:23 GMT
Hi,

I don't get what the problem is. That map to selected columns looks like
the way to go given the context. What's not working?
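
For reference, a minimal sketch of that map in plain Scala. It assumes each
row is an Array[String] produced by split, so fields are read with c(i)
rather than c._(i). The names SelectColumns, parse and pick are made up for
illustration, and the sample line is synthetic.

```scala
object SelectColumns {
  // Split a tab-separated line. The -1 limit keeps trailing empty fields,
  // which a plain split("\t") would silently drop.
  def parse(line: String): Array[String] = line.split("\t", -1)

  // Arrays are indexed with apply, i.e. c(i). Indexes are zero-based,
  // so reading c(200) needs a row with at least 201 fields.
  def pick(c: Array[String]): (String, String, String) =
    (c(167), c(110), c(200))

  def main(args: Array[String]): Unit = {
    // Synthetic row with 201 fields: v0, v1, ..., v200
    val line = (0 to 200).map(i => s"v$i").mkString("\t")
    println(pick(parse(line))) // prints (v167,v110,v200)
  }
}
```

On the RDD from the question, the same shape would presumably be
tabs.map(c => (c(167), c(110), c(200))), which yields an RDD of 3-tuples.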

Kr, Gerard
On Dec 14, 2014 5:17 PM, "Denny Lee" <denny.g.lee@gmail.com> wrote:

> I have a large number of files within HDFS that I would like to run a
> group-by statement on, along the lines of
>
> val table = sc.textFile("hdfs://....")
> val tabs = table.map(_.split("\t"))
>
> I'm trying to do something similar to
> tabs.map(c => (c._(167), c._(110), c._(200))
>
> where I create a new RDD that only has those columns,
> but that isn't quite right because I'm not really manipulating sequences.
>
> BTW, I cannot use Spark SQL / case classes right now because my table has
> 200 columns (and I'm on Scala 2.10.3)
>
> Thanks!
> Denny
>
>
