spark-user mailing list archives

From goi cto <>
Subject Beginners Hadoop question
Date Mon, 03 Mar 2014 11:10:53 GMT

I am sorry for the beginner's question, but...
I have Spark Java code which reads a file (c:\my-input.csv), processes it,
and writes an output file (my-output.csv).
Now I want to run it on Hadoop in a distributed environment.
1) Should my input be one big file, or separate smaller files?
2) If we use smaller files, how does my code need to change to
process all of the input files?

Will Hadoop just copy the files to different servers, or will it also split
their content among the servers?
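For reference, my current code is roughly like the sketch below (a minimal sketch only: the HDFS paths are made up, and the upper-casing step is a placeholder for my real processing). As I understand it, `textFile` accepts a single file, a directory, or a glob, so the same code should cover both the one-big-file and many-small-files cases:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class CsvJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CsvJob");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // textFile accepts a local path, an HDFS path, or a directory/glob,
        // so one big file and many small part files are read the same way.
        JavaRDD<String> lines = sc.textFile("hdfs:///user/eran/my-input.csv");

        // Placeholder processing: upper-case each line.
        JavaRDD<String> processed = lines.map(new Function<String, String>() {
            public String call(String line) {
                return line.toUpperCase();
            }
        });

        // Output is written as a directory of part-NNNNN files,
        // one per partition, rather than a single my-output.csv.
        processed.saveAsTextFile("hdfs:///user/eran/my-output");
        sc.stop();
    }
}
```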

Any example will be great!
Eran | CTO
