spark-user mailing list archives

From <jan.zi...@centrum.cz>
Subject How to not write empty RDD partitions in RDD.saveAsTextFile()
Date Sat, 18 Oct 2014 12:30:30 GMT
Hi,

I am developing a program with Spark, in which I use a filter such as:

cleanedData = distData.map(json_extractor.extract_json).filter(lambda x: x is not None and x != '')
cleanedData.saveAsTextFile(sys.argv[3])

I end up with a lot of empty output files (probably from partitions whose records were all
filtered out). Is there some way to prevent Spark from saving these empty files?
 
Thank you in advance for any help.
 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

