spark-user mailing list archives

From Denny Lee <denny.g....@gmail.com>
Subject Re: Limit the # of columns in Spark Scala
Date Mon, 15 Dec 2014 05:07:17 GMT
Oh, just figured it out:

tabs.map(c => Array(c(167), c(110), c(200)))

Thanks for all of the advice, eh?!
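That fix can be checked locally without Spark: the same map over tab-split lines works on a plain Seq. (Indices 0 and 2 below are illustrative stand-ins for the thread's 167, 110, and 200, and the sample lines are invented.)

```scala
// Local stand-in for the RDD: split tab-delimited lines, then
// select a subset of columns into an Array, as in the fix above.
val lines = Seq("a\tb\tc", "d\te\tf")
val tabs = lines.map(_.split("\t"))
val selected = tabs.map(c => Array(c(0), c(2)))
// selected: Seq(Array("a", "c"), Array("d", "f"))
```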

On Sun Dec 14 2014 at 1:14:00 PM Yana Kadiyska <yana.kadiyska@gmail.com>
wrote:

> Denny, I am not sure what exception you're observing but I've had luck
> with 2 things:
>
> val table = sc.textFile("hdfs://....")
>
> You can try calling table.first here and you'll see the first line of the
> file.
> You can also do val debug = table.first.split("\t") which would give you
> an array and you can indeed verify that the array contains what you want in
> positions 167, 110 and 200. In the case of large files with a random bad
> line I find wrapping the call within the map in try/catch very valuable --
> you can dump out the whole line in the catch statement
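A minimal local sketch of that try/catch pattern (the sample data and column indices here are invented for illustration): a too-short line triggers the catch and is logged and dropped rather than failing the whole map.

```scala
// Wrap the per-line parse so one malformed row is logged and skipped
// instead of aborting the job; flatMap over Option drops bad lines.
val lines = Seq("a\tb\tc", "short")
val parsed = lines.flatMap { line =>
  try {
    val c = line.split("\t")
    Some((c(0), c(2))) // throws on lines with too few fields
  } catch {
    case e: ArrayIndexOutOfBoundsException =>
      System.err.println(s"Bad line: $line") // dump the whole line
      None
  }
}
// parsed: Seq(("a", "c"))
```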
>
> Lastly I would guess that you're getting a compile error and not a runtime
> error -- I believe c is an array of values so I think you want
> tabs.map(c => (c(167), c(110), c(200))) instead of tabs.map(c =>
> (c._(167), c._(110), c._(200)))
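The distinction behind that last point, sketched locally: after split, c is an Array[String], so fields are read with c(i); the ._1 accessor style only applies to a tuple built from those fields.

```scala
// c is an Array[String]; c(i) (i.e. apply) indexes it.
val c: Array[String] = "x\ty\tz".split("\t")
val triple = (c(0), c(1), c(2)) // a tuple of the selected fields
// triple._1 is tuple accessor syntax; c._1 would not compile.
// triple: ("x", "y", "z")
```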
>
>
>
> On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee <denny.g.lee@gmail.com> wrote:
>>
>> Yes - that works great! Sorry for implying I couldn't. Was just more
>> flummoxed that I couldn't make the Scala call work on its own. Will
>> continue to debug ;-)
>>
>> On Sun, Dec 14, 2014 at 11:39 Michael Armbrust <michael@databricks.com>
>> wrote:
>>
>>> BTW, I cannot use SparkSQL / case classes right now because my table has 200
>>>> columns (and I'm on Scala 2.10.3)
>>>>
>>>
>>> You can still apply the schema programmatically:
>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
>>>
>>
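A sketch of what that programmatic-schema route might look like for a 200-column tab-delimited table. Hedged: the applySchema call and imports follow the Spark 1.x guide linked above, and the column names col1..col200 are invented here; sc and sqlContext are assumed to exist as usual in a Spark shell of that era.

```scala
import org.apache.spark.sql._

// Build a 200-field schema programmatically instead of via a case
// class (case classes were capped at 22 fields on Scala 2.10).
val schema = StructType(
  (1 to 200).map(i => StructField(s"col$i", StringType, nullable = true)))

val tabs = sc.textFile("hdfs://....").map(_.split("\t"))
val rowRDD = tabs.map(c => Row(c.take(200): _*))
val table = sqlContext.applySchema(rowRDD, schema)
table.registerTempTable("wide_table")
```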
