spark-user mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Re: Limit the # of columns in Spark Scala
Date Sun, 14 Dec 2014 17:01:03 GMT
Getting a bunch of syntax errors. Let me get back with the full statement
and error later today. Thanks for verifying my thinking wasn't out in left
field.
On Sun, Dec 14, 2014 at 08:56 Gerard Maas <gerard.maas@gmail.com> wrote:

> Hi,
>
> I don't get what the problem is. That map to selected columns looks like
> the way to go given the context. What's not working?
>
> Kr, Gerard
> On Dec 14, 2014 5:17 PM, "Denny Lee" <denny.g.lee@gmail.com> wrote:
>
>> I have a large number of files within HDFS that I would like to run a
>> group by statement on, e.g.
>>
>> val table = sc.textFile("hdfs://....")
>> val tabs = table.map(_.split("\t"))
>>
>> I'm trying to do something similar to
>> tabs.map(c => (c._(167), c._(110), c._(200))
>>
>> where I create a new RDD that only has those three columns, but that isn't
>> quite right because I'm not really manipulating sequences.
>>
>> BTW, I cannot use SparkSQL / case classes right now because my table has
>> 200 columns, and case classes are capped at 22 fields in Scala 2.10 (I'm
>> on 2.10.3).
>>
>> Thanks!
>> Denny
>>
>>

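[Editor's note: the syntax errors in the quoted snippet likely come from the indexing form. After split("\t") each record is an Array[String], and Scala arrays are indexed with apply, i.e. c(167), not c._(167); the snippet was also missing a closing parenthesis. A minimal plain-Scala sketch of the column selection, using a hypothetical 300-column row in place of a line from the HDFS file:]

```scala
// Plain-Scala stand-in for the RDD step in the thread: after split("\t"),
// each record is an Array[String]. Arrays are indexed with apply —
// row(167), not row._(167) — and the original snippet was also missing
// a closing parenthesis.
object SelectColumns {
  def main(args: Array[String]): Unit = {
    // Hypothetical 300-column row standing in for one line of the file.
    val row: Array[String] = (0 until 300).map(i => s"v$i").toArray

    // Same shape as: tabs.map(c => (c(167), c(110), c(200)))
    val selected = (row(167), row(110), row(200))
    println(selected)  // (v167,v110,v200)
  }
}
```

Mapping to a three-element tuple like this is fine on Scala 2.10; the 22-field cap only bites if you try to build a case class or tuple covering all 200 columns.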