Hello, 

I have started experimenting with a Spark cluster. I have a parallelizable job where I want to walk through several folders, each containing multiple files. For each file, I parse the records, do some processing on them (hashing certain fields in the data file), and write the processed file back out to a corresponding output file. I apply the same operation to every file inside the folder. Simply:

For a directory D,
  read all files inside D;
  for each file F,
    for each line L in F: process L and write the output to F's output file.
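
In Spark terms (spark-shell style), what I am doing currently looks roughly like the sketch below; the paths and the comma-separated hashing are placeholders for my real logic:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val inputDir  = "hdfs:///data/input"    // placeholder paths
    val outputDir = "hdfs:///data/output"

    // List the input files via the Hadoop FileSystem API.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val files = fs.listStatus(new Path(inputDir)).map(_.getPath)

    for (file <- files) {
      val processed = sc.textFile(file.toString).map { line =>
        // Stand-in for my real hashing: hash fields 1 and 3 of a CSV record.
        line.split(",").zipWithIndex
          .map { case (f, i) => if (i == 1 || i == 3) f.hashCode.toString else f }
          .mkString(",")
      }
      // This creates a directory named after the input file, not a single file.
      processed.saveAsTextFile(s"$outputDir/${file.getName}")
    }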

So if there are 200 files inside the input directory, I would like to have 200 files in my output directory. I learned that with the saveAsTextFile(name) API, Spark creates a directory with the name we specify and writes the actual output inside that directory as part-00000, part-00001, etc. files (similar to Hadoop, I assume).
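For example, if I understand the behavior right, processed.saveAsTextFile("out/fileA") leaves me with a layout like this (the _SUCCESS marker appearing once the job completes):

    out/fileA/
        _SUCCESS
        part-00000
        part-00001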
My question: is there a way to specify the output directory such that all of my saveAsTextFile outputs are redirected into a single flat folder, one output file per input file, rather than one directory per input file?
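
One workaround I have been experimenting with (I am not sure it is idiomatic) is to replace the save step in the loop above: write each file's output with a single partition to a temporary directory, then use the Hadoop FileSystem API to rename the lone part file into a flat output folder. The tmpDir naming here is just my own convention:

    val tmpDir = s"$outputDir/_tmp_${file.getName}"
    processed.coalesce(1).saveAsTextFile(tmpDir)   // one partition => a single part-00000
    // Move the part file up into the flat output folder, named after the input file.
    fs.rename(new Path(s"$tmpDir/part-00000"), new Path(s"$outputDir/${file.getName}"))
    fs.delete(new Path(tmpDir), true)

The coalesce(1) forces each file's output through a single partition, though, which presumably hurts parallelism for large files.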

Let me know if there is a way of achieving this natively, or if there is a cleaner workaround than the one above. Thanks!


Regards,

Ramkumar Chokkalingam, 
Masters Student, University of Washington || 206-747-3515