spark-user mailing list archives

From sahanbull <>
Subject Using a compression codec in saveAsSequenceFile in Pyspark (Python API)
Date Fri, 14 Nov 2014 04:28:25 GMT

I am trying to save an RDD to an S3 bucket using the
RDD.saveAsSequenceFile(self, path, CompressionCodec) function in Python. I
need to save the RDD with GZIP compression. Can anyone tell me how to pass
the gzip codec class as a parameter to the function?

I tried

but it fails with: *AttributeError: type object 'GzipFile' has no
attribute '_get_object_id'*
I think this is because the JVM can't find a Scala mapping for gzip.

*If you can point me to any method for writing the RDD to disk as gzip
(.gz), that would be much appreciated.*
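
For reference, a minimal sketch of what the call might look like. It
assumes the second argument of saveAsSequenceFile (compressionCodecClass
in PySpark) expects the fully qualified Hadoop codec class name as a
string rather than a Python class such as gzip.GzipFile; a Python class
cannot be converted to a JVM object, which would be consistent with the
'_get_object_id' error. The helper name and the bucket path in the
comments are hypothetical placeholders.

```python
# Fully qualified name of Hadoop's gzip codec. The assumption here is that
# PySpark wants this class name as a plain string, not a Python class.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_gzip_sequence_file(rdd, path):
    """Hypothetical helper: save a key/value RDD as a gzip-compressed SequenceFile."""
    rdd.saveAsSequenceFile(path, compressionCodecClass=GZIP_CODEC)


# Possible usage, given an existing SparkContext `sc`
# (the bucket path is a placeholder, not a real location):
#   pairs = sc.parallelize([(1, "a"), (2, "b")])
#   save_gzip_sequence_file(pairs, "s3n://my-bucket/output")
```

SequenceFiles hold key/value records, so the RDD would need to contain
pairs for this to work.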

Many thanks
