spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: How to change the compression format when using SequenceFileOutputFormat with Spark
Date Tue, 20 Oct 2015 18:12:48 GMT
Figured it out.

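The fix: change the compression codec by passing a Hadoop Configuration to the save call, and add the Hadoop native library directory to LD_LIBRARY_PATH before starting spark-shell.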



$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apache/hadoop/lib/native/

$ ./bin/spark-shell
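
Before running the job, it may help to confirm from inside spark-shell that the JVM actually picked up the native Hadoop library. This is a small check, assuming Hadoop's NativeCodeLoader class is on the shell's classpath (it ships with hadoop-common). It only inspects the driver JVM, which is also where the tasks run in local mode:

```scala
import org.apache.hadoop.util.NativeCodeLoader

// True when the native hadoop library (libhadoop) was found
// and loaded into this JVM.
val nativeLoaded: Boolean = NativeCodeLoader.isNativeCodeLoaded()
println(s"native-hadoop loaded: $nativeLoaded")
```

If this prints false, the directory added to LD_LIBRARY_PATH probably does not contain a libhadoop build that matches the platform.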


import org.apache.hadoop.io.Text
import org.codehaus.jackson.map.ObjectMapper
import scala.collection.JavaConversions._
import java.net.URLDecoder
import org.apache.hadoop.conf.Configuration


// Read the sequence file as (Text, Text) pairs.
val dwsite = sc.sequenceFile("/path/to/sequcefile/part-r-00000", classOf[Text], classOf[Text])
// Keep only the records whose value contains "Bhutan".
val records = dwsite.filter {
  case (k, v) => v.toString.indexOf("Bhutan") != -1
}
// Set the codec on a Hadoop Configuration and hand it to the save call.
val conf = new Configuration
conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec")
records.saveAsNewAPIHadoopFile("dw_output612", classOf[Text], classOf[Text],
  classOf[org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat[Text, Text]],
  conf)
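
A note on why the earlier attempt with sc.getConf.set had no effect, as far as I understand it: sc.getConf returns a copy of the SparkConf, so changes made to it are not seen by the running context, and a plain Hadoop key set there does not reach the Hadoop configuration used for output. When no Configuration is passed, saveAsNewAPIHadoopFile falls back to sc.hadoopConfiguration, so setting the codec there should also work. This is a minimal sketch, assuming the same spark-shell session (sc and records as defined in the code above) and a loadable Snappy native library. The output path dw_output_alt is hypothetical:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

// Assumes a running spark-shell, where `sc` is the SparkContext and
// `records` is the filtered RDD[(Text, Text)] from the code above.

// Set the codec on the context-wide Hadoop configuration.
sc.hadoopConfiguration.set(
  "mapred.output.compression.codec",
  "org.apache.hadoop.io.compress.SnappyCodec")

// With no explicit Configuration argument, the save call uses
// sc.hadoopConfiguration by default.
records.saveAsNewAPIHadoopFile(
  "dw_output_alt", // hypothetical output path
  classOf[Text],
  classOf[Text],
  classOf[SequenceFileOutputFormat[Text, Text]])
```

The difference from the explicit-Configuration version is scope: a value set on sc.hadoopConfiguration applies to every later Hadoop read or write in the session, while a separate Configuration object affects only the call it is passed to.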



On Tue, Oct 20, 2015 at 10:52 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> My Code:
>
> val dwsite = sc.sequenceFile("/sys/edw/dw_sites/snapshot/2015/10/18/00/part-r-00000", classOf[Text], classOf[Text])
> val records = dwsite.filter {
>   case (k, v) =>
>     if (v.toString.indexOf("Bhutan") != -1) true else false
> }
>
> records.saveAsNewAPIHadoopFile("dw_output12", classOf[Text], classOf[Text],
>   classOf[org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat[Text, Text]])
>
> Error:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage
> 2.0 (TID 4, localhost): java.lang.IllegalArgumentException: SequenceFile
> doesn't work with GzipCodec without native-hadoop code!
>
>
> I cannot install any libraries on this machine or on this cluster, as I do
> not have any kind of write access.
>
> I am thinking of using a different compression codec, re-running the same
> program, and hoping it works. Hence I included
>
>
> sc.getConf.set("mapred.output.compression.codec","org.apache.hadoop.io.compress.SnappyCodec")
>
>
> but I still get the same error, which implies that the above line did not
> affect the compression codec of the sequence file output format.
>
>
> What is the fix? Any suggestions?
>
> Appreciate your time.
>
>
> --
> Deepak
>
>


-- 
Deepak
