Hi folks!

I'm running a Spark job on a cluster with 9 slaves and 1 master (each with 250 GB of RAM, 32 cores, and 1 TB of storage).

This job generates 1.2 TB of data in an RDD with 1200 partitions. When I call saveAsTextFile("hdfs://..."), Spark creates 1200 files named "part-000*" in the HDFS folder. However, only some of the files have content (~450 files of 2.3 GB each); all the others are empty (0 bytes).

Is there any explanation for this file size (2.3 GB)?
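
To make the numbers concrete, here is a minimal sketch of the arithmetic behind my expectation. It assumes 1 TB = 1000 GB and treats the approximate figures from my listing (~450 files, 2.3 GB each) as exact. The function and variable names are just for illustration.

```python
# Rough sanity check of the figures quoted in this post.
# Assumption: 1 TB = 1000 GB; "~450" and "2.3 GB" are treated as exact.

TOTAL_GB = 1200.0         # total data generated by the job (1.2 TB)
NUM_PARTITIONS = 1200     # partitions in the RDD, one output file each
NON_EMPTY_FILES = 450     # approximate number of files that have content
NON_EMPTY_FILE_GB = 2.3   # approximate size of each non-empty file


def expected_gb_per_file(total_gb, num_partitions):
    """Size of each file if the data were spread evenly over all partitions."""
    return total_gb / num_partitions


def observed_total_gb(non_empty_files, file_gb):
    """Total size actually written, counting only the non-empty files."""
    return non_empty_files * file_gb


expected = expected_gb_per_file(TOTAL_GB, NUM_PARTITIONS)
observed = observed_total_gb(NON_EMPTY_FILES, NON_EMPTY_FILE_GB)

print(f"Expected size per file with an even spread: {expected:.2f} GB")
print(f"Total size of the non-empty files: {observed:.0f} GB")
```

If these rough figures hold, the non-empty files add up to about 1035 GB, which is in the same ballpark as the total, so the data seems to be unevenly spread across the partitions rather than missing.
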
Shouldn't Spark save 1200 files of 1 GB each?

Thanks in advance.

Regards,
Alan Vidotti Prando.