spark-user mailing list archives

From Matei Zaharia <>
Subject Re: Lazyoutput format in spark
Date Sun, 02 Mar 2014 22:56:19 GMT
You can probably use LazyOutputFormat directly. If there's one for the hadoop.mapred API,
you can use it with PairRDDFunctions.saveAsHadoopFile() today; otherwise, there will be
a version of that for the hadoop.mapreduce API as well in Spark 1.0.
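
A minimal sketch of what this could look like, assuming the Scala API and Hadoop's old-API (hadoop.mapred) LazyOutputFormat, which wraps a real output format and only creates a file when a partition actually writes a record (paths and types here are illustrative, not from the original thread):

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
import org.apache.hadoop.mapred.lib.LazyOutputFormat
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // brings PairRDDFunctions into scope

object LazySaveExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "LazySaveExample")

    // Hadoop OutputFormats work on Writable key/value pairs.
    val data = sc.parallelize(Seq(("a", "1"), ("b", "2"))).map {
      case (k, v) => (new Text(k), new Text(v))
    }

    // Tell LazyOutputFormat which real format to delegate to;
    // empty partitions then produce no part-xxxxx files at all.
    val conf = new JobConf(sc.hadoopConfiguration)
    LazyOutputFormat.setOutputFormatClass(conf, classOf[TextOutputFormat[Text, Text]])

    data.saveAsHadoopFile("out", classOf[Text], classOf[Text],
      classOf[LazyOutputFormat[Text, Text]], conf)
  }
}
```

Without the wrapper, every partition (including empty ones) would emit a part file; with it, only partitions that call write() create output.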


On Feb 28, 2014, at 5:18 PM, Mohit Singh <> wrote:

> Hi,
>   Is there an equivalent of LazyOutputFormat in Spark (PySpark)?
> Basically, something where I only save files that have data in them rather than
> saving all the files, since in some cases the majority of your files can be empty?
> Thanks
> -- 
> Mohit
> "When you want success as badly as you want the air, then you will get it. There is no
> other secret of success."
> -Socrates
