spark-user mailing list archives

From Roberto Coluccio <roberto.coluc...@gmail.com>
Subject Spark 1.3.1 + Hive: write output to CSV with header on S3
Date Fri, 17 Jul 2015 09:29:08 GMT
Hello community,

I'm currently using Spark 1.3.1 with Hive support for writing processed
data to an external Hive table backed by S3. I'm manually specifying the
field delimiter, but I'd like to know whether there is any "clean" way to
write in CSV format:

val sparkConf = new SparkConf()

val sc = new SparkContext(sparkConf)

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

import hiveContext.implicits._

hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS table_name(field1
STRING, field2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '" + path_on_s3 + "'")

hiveContext.sql(<an INSERT OVERWRITE query to write into the above table>)
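For context, Hive's DELIMITED row format simply joins fields with the delimiter and does no quoting, so a field containing a comma would break the CSV. One option at the RDD level is to format each line yourself before saving. A minimal sketch of RFC 4180-style escaping (the CsvFormat helper and its method names are my own, not part of Spark or Hive):

```scala
// Minimal RFC 4180-style CSV formatting (hypothetical helper,
// not part of Spark or Hive).
object CsvFormat {
  // Quote a field that contains a comma, a quote, or a line break,
  // doubling any embedded quotes.
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Join the escaped fields into one CSV line.
  def toLine(fields: Seq[String]): String =
    fields.map(escape).mkString(",")
}
```

With an RDD of row fields this could be used as something like rdd.map(r => CsvFormat.toLine(r)).saveAsTextFile(path_on_s3), bypassing the Hive table for the write.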


I also need the table header to be written at the top of each output file.
I tried:


hiveContext.sql("set hive.cli.print.header=true")


But it didn't work.
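If I understand correctly, hive.cli.print.header only affects what the Hive CLI prints to the console, not the files an INSERT OVERWRITE produces. One workaround at the RDD level would be to prepend the header inside each partition, since saveAsTextFile writes one part-file per partition. A sketch (addHeader is my own name, assuming rows are already formatted as CSV strings):

```scala
// Sketch: prepend a header line to a partition's rows so that each
// part-file produced by saveAsTextFile starts with the header.
// With a real RDD this would be something like:
//   rdd.mapPartitions(rows => addHeader("field1,field2", rows))
//      .saveAsTextFile(path_on_s3)
def addHeader(header: String, rows: Iterator[String]): Iterator[String] =
  Iterator(header) ++ rows
```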


Any hint?


Thank you.


Best regards,

Roberto
