spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject How to change the compression format when using SequenceFileOutputFormat with Spark
Date Tue, 20 Oct 2015 17:52:42 GMT
My Code:

import org.apache.hadoop.io.Text

val dwsite = sc.sequenceFile("/sys/edw/dw_sites/snapshot/2015/10/18/00/part-r-00000",
  classOf[Text], classOf[Text])

val records = dwsite.filter { case (_, v) => v.toString.contains("Bhutan") }

records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
  classOf[org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat[Text, Text]])

Error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage
2.0 (TID 4, localhost): java.lang.IllegalArgumentException: SequenceFile
doesn't work with GzipCodec without native-hadoop code!


I cannot install any libraries on this machine or on this cluster, as I do
not have any kind of write access.

I am thinking of switching to a different compression codec and re-running
the same program in the hope that it works. Hence I included

sc.getConf.set("mapred.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec")

but I still get the same error, which implies the line above did not affect
the compression codec of the SequenceFile output format.
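One variant I am considering is passing the codec through a Hadoop Configuration handed directly to saveAsNewAPIHadoopFile, instead of setting it on the SparkConf after the context exists. This is only a sketch, assuming the Configuration-taking overload of saveAsNewAPIHadoopFile is the right hook, and that DefaultCodec (pure-Java deflate) avoids the native-library requirement that GzipCodec (and, as far as I know, SnappyCodec) has:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

// Copy the context's Hadoop configuration so the job sees the codec settings.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true")
// DefaultCodec falls back to java.util.zip, so no native hadoop libs needed.
hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
  "org.apache.hadoop.io.compress.DefaultCodec")

records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
  classOf[SequenceFileOutputFormat[Text, Text]], hadoopConf)
```

(The mapreduce.output.fileoutputformat.* keys are the new-API names; my earlier attempt used the deprecated mapred.output.compression.codec key, which may be part of the problem.)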


What is the fix? Any suggestions?

Appreciate your time.


-- 
Deepak
