spark-user mailing list archives

From Daniel Haviv <danielru...@gmail.com>
Subject Remapping columns from a schemaRDD
Date Tue, 25 Nov 2014 15:02:01 GMT
Hi,
I'm selecting columns from a JSON file, transforming some of them, and would
like to store the result as a Parquet file, but I'm failing.

This is what I'm doing:

val jsonFiles=sqlContext.jsonFile("/requests.loading")
jsonFiles.registerTempTable("jRequests")

val clean_jRequests=sqlContext.sql("select c1, c2, c3 ... c55 from jRequests")

and then I run a map:
val jRequests_flat=clean_jRequests.map(line=>{((
  line(1), line(2), line(3), line(4), line(5), line(6), line(7),
  line(8).asInstanceOf[Iterable[String]].mkString(","),
  line(9), line(10), line(11), line(12), line(13), line(14), line(15),
  line(16), line(17), line(18), line(19), line(20), line(21), line(22),
  line(23), line(24), line(25), line(26), line(27), line(28), line(29),
  line(30), line(31), line(32), line(33), line(34), line(35), line(36),
  line(37), line(38), line(39), line(40), line(41), line(42), line(43),
  line(44), line(45), line(46), line(47), line(48), line(49), line(50)))})



1. Is there a smarter way to achieve this (modifying only a certain column
without referring to the others, while keeping all of them)?
2. The last statement fails because the tuple has too many members:
<console>:19: error: object Tuple50 is not a member of package scala
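
To illustrate what question 1 is asking for: Scala 2 tuples stop at Tuple22, so a 50-element tuple cannot be built, but a sequence has no such limit. Below is a minimal sketch of changing one position and keeping the rest, written against a plain Scala Seq[Any] that stands in for a row. The names RemapColumn and flattenColumn are hypothetical, not part of Spark, and turning the result back into something that can be saved as Parquet is not shown.

```scala
// Hypothetical helper (not a Spark API): rewrite a single position of a
// row-like sequence and leave every other position untouched.
object RemapColumn {
  def flattenColumn(row: Seq[Any], idx: Int): Seq[Any] =
    row(idx) match {
      // A collection-valued column is joined into one comma-separated string.
      case items: Iterable[_] => row.updated(idx, items.mkString(","))
      // Anything else (including a plain String) is returned unchanged.
      case _                  => row
    }
}
```

For example, RemapColumn.flattenColumn(Seq("a", 1, Seq("x", "y", "z"), 2.5), 2) returns a sequence of the same length whose third element is the string "x,y,z".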


Thanks for your help,
Daniel
