spark-user mailing list archives

From Alonso Isidoro Roman <>
Subject Re: Beginners Hadoop question
Date Mon, 03 Mar 2014 11:19:30 GMT
Hi, I am a beginner too, but from what I have learned, Hadoop works better
with big files, at least 64 MB, 128 MB or even more. I think you need to
aggregate all the files into one new big file. Then you must copy it to HDFS
using this command:


Hadoop just copies MYFILE into the Hadoop Distributed File System (HDFS).
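
For the aggregation step, here is a minimal sketch in plain Java, assuming the
small inputs are CSV files sitting in one local directory and that none of
them has a header row that would need to be skipped. The class name
MergeSmallFiles and the method merge are made up for illustration; they are
not part of Hadoop or Spark.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: concatenates many small CSV files into one big file.
public class MergeSmallFiles {

    // Appends every *.csv file found directly in inputDir to output,
    // in file-name order, and returns the number of files merged.
    // Assumption: output is not an existing *.csv file inside inputDir.
    static int merge(Path inputDir, Path output) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        Collections.sort(parts); // deterministic order

        try (BufferedWriter out =
                 Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path part : parts) {
                // Each small file is read fully into memory, which is fine
                // only because the inputs are assumed to be small.
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    out.write(line);
                    out.write("\n");
                }
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory with the small files, args[1]: merged output file
        int n = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + n + " files");
    }
}
```

The merged output file is then the single MYFILE that gets copied into HDFS.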

Can I recommend what I did myself? Go to and take
the Hadoop Fundamentals I course. It is free and very well documented.


Alonso Isidoro Roman.

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting ..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"

2014-03-03 12:10 GMT+01:00 goi cto <>:

> Hi,
> I am sorry for the beginner's question, but...
> I have Spark Java code that reads a file (c:\my-input.csv), processes it,
> and writes an output file (my-output.csv).
> Now I want to run it on Hadoop in a distributed environment.
> 1) Should my input be one big file or separate smaller files?
> 2) If we use smaller files, how does my code need to change to
> process all of the input files?
> Will Hadoop just copy the files to different servers, or will it also split
> their content among servers?
> Any example would be great!
> --
> Eran | CTO
