spark-user mailing list archives

From Yana Kadiyska <>
Subject Re: Limit the # of columns in Spark Scala
Date Sun, 14 Dec 2014 21:14:00 GMT
Denny, I'm not sure what exception you're observing, but I've had luck with
two things:

val table = sc.textFile("hdfs://....")

You can try calling table.first here and you'll see the first line of the
file. You can also do val debug = table.first.split("\t"), which gives you an
array, so you can verify that the array contains what you want at
positions 167, 119, and 200. In the case of large files with a random bad
line, I find wrapping the call inside the map in try/catch very valuable --
you can dump out the whole line in the catch block.
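The debugging pattern above can be sketched end to end. This is a minimal sketch: the sample lines and the low column indices are stand-ins for the real file and for c(167), c(110), c(200), and the same map body would run unchanged on the RDD returned by sc.textFile:

```scala
// Stand-in for: val table = sc.textFile("hdfs://....")
// A local Seq lets you exercise the exact same map body without a cluster.
val lines = Seq("a\tb\tc", "bad-line-no-tabs")

val parsed = lines.map { line =>
  try {
    val c = line.split("\t")
    // Stand-ins for (c(167), c(110), c(200)) on the real 200-column file
    Some((c(0), c(1), c(2)))
  } catch {
    case e: Exception =>
      // Dump the whole offending line so the one bad record is easy to find
      System.err.println(s"Bad line: $line")
      None
  }
}
```

Records that fail to parse come back as None instead of killing the job, and the bad line itself ends up on stderr for inspection.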

Lastly, I would guess that you're getting a compile error and not a runtime
error -- c is an array of values, so I think you want
=> (c(167), c(110), c(200)) instead of => (c._(167), c._(110), c._(200)).
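The distinction can be shown in two lines (low indices used here as stand-ins for the thread's 167/110/200): split returns an Array[String], and array elements are read with apply, i.e. c(i); the tuple-style c._(i) is not valid Scala and won't compile.

```scala
// split("\t") yields an Array[String], not a tuple
val c: Array[String] = "a\tb\tc".split("\t")

// Arrays are indexed with apply: c(i). There is no c._(i) accessor;
// _1, _2, ... exist only on tuples, and even those take no index argument.
val picked = (c(0), c(1), c(2))
```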

On Sun, Dec 14, 2014 at 3:12 PM, Denny Lee <> wrote:
> Yes - that works great! Sorry for implying I couldn't. Was just more
> flummoxed that I couldn't make the Scala call work on its own. Will
> continue to debug ;-)
> On Sun, Dec 14, 2014 at 11:39 Michael Armbrust <>
> wrote:
>>> BTW, I cannot use SparkSQL / case right now because my table has 200
>>> columns (and I'm on Scala 2.10.3)
>> You can still apply the schema programmatically:
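Michael is presumably referring to Spark SQL's programmatic-schema API, which sidesteps the Scala 2.10 22-field case-class limit. A minimal sketch against the Spark 1.2-era API (applySchema was renamed createDataFrame in later releases; table and sqlContext are assumed from the earlier code, and the col$i names are hypothetical):

```scala
import org.apache.spark.sql._

// Build the 200-column schema programmatically instead of via a case class
// (case classes were capped at 22 fields on Scala 2.10).
val schema = StructType(
  (1 to 200).map(i => StructField(s"col$i", StringType, nullable = true)))

// Convert each tab-separated line into a Row of the same arity
val rowRDD = table.map(_.split("\t")).map(c => Row(c: _*))

// Attach the schema and register the result for SQL queries
val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("mytable")
```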
