spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <>
Subject How to change the compression format when using SequenceFileOutputFormat with Spark
Date Tue, 20 Oct 2015 17:52:42 GMT
My code:

val dwsite = sc.sequenceFile("/sys/edw/dw_sites/snapshot/2015/10/18/00/part-r-00000",
  classOf[Text], classOf[Text])
val records = dwsite.filter {
  case (k, v) => v.toString.indexOf("Bhutan") != -1
}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage
2.0 (TID 4, localhost): java.lang.IllegalArgumentException: SequenceFile
doesn't work with GzipCodec without native-hadoop code!

I cannot install any libraries on this machine or on this cluster, as I do
not have write access of any kind.

I am thinking of switching to a different compression codec and re-running
the same program in the hope that it works. Hence I included

but I still get the same error, which implies the line above did not change
the compression codec of the SequenceFile output format.
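For context, this is the kind of explicit codec override I am hoping for, sketched with Spark's saveAsHadoopFile (the output path here is a placeholder, and I picked BZip2Codec only because it has a pure-Java implementation and so should not need the native-hadoop libraries):

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.hadoop.mapred.SequenceFileOutputFormat

// Pass the codec class explicitly instead of relying on the cluster-wide
// default, which resolves to GzipCodec and requires native-hadoop.
records.saveAsHadoopFile(
  "/tmp/dw_sites_bhutan",                  // placeholder output path
  classOf[Text], classOf[Text],
  classOf[SequenceFileOutputFormat[Text, Text]],
  classOf[BZip2Codec])                     // pure-Java codec, no native libs
```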

What is the fix? Any suggestions?

Appreciate your time.
