spark-user mailing list archives

From Alonso Isidoro Roman <alons...@gmail.com>
Subject Re: Beginners Hadoop question
Date Mon, 03 Mar 2014 11:19:30 GMT
Hi, I am a beginner too, but from what I have learned, Hadoop works better
with big files, at least 64 MB or 128 MB, or even more. I think you need to
aggregate all the files into one new big file. Then you copy it to HDFS
using this command:

hadoop fs -put MYFILE /YOUR_ROUTE_ON_HDFS/MYFILE

That command just copies MYFILE into the Hadoop Distributed File System
(HDFS). HDFS itself stores the file as blocks spread across the nodes of
the cluster.
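For the aggregation step, here is a minimal sketch in plain Java (chosen only because the original question mentions Java code). The class name MergeCsvFiles and the method merge are made up for illustration, and the sketch assumes the small files are *.csv files in one local directory, have no header rows, and each end with a newline. It only builds the big local file; copying it to HDFS is still done with the hadoop fs -put command above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: concatenates small CSV files into one big file.
final class MergeCsvFiles {

    /**
     * Appends every *.csv file found directly in inputDir (sorted by name)
     * to output and returns the number of files merged.
     * Assumption: the files have no header rows and end with a newline,
     * so plain byte concatenation gives a valid CSV.
     */
    static int merge(Path inputDir, Path output) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path p : stream) {
                // Never read the output file as one of its own inputs.
                if (!p.toAbsolutePath().equals(output.toAbsolutePath())) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // deterministic order, independent of the OS listing

        try (OutputStream out = Files.newOutputStream(output,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                Files.copy(part, out); // streams the file's bytes into the output
            }
        }
        return parts.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: MergeCsvFiles <input-dir> <output-file>");
            return;
        }
        int merged = merge(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Merged " + merged + " files into " + args[1]);
    }
}
```

It could be run as, for example, java MergeCsvFiles.java my-small-files MYFILE, where both paths are placeholders. As far as I know, Spark's textFile also accepts a directory or a wildcard path, so merging may not be strictly required for the code to read all the inputs; it mainly helps avoid many tiny files on HDFS.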

Can I recommend what I did? Go to BigDataUniversity.com and take
the Hadoop Fundamentals I course. It is free and very well documented.

Regards

Alonso Isidoro Roman.

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting ..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"



2014-03-03 12:10 GMT+01:00 goi cto <goi.cto@gmail.com>:

> Hi,
>
> I am sorry for the beginner's question but...
> I have Spark Java code which reads a file (c:\my-input.csv), processes it,
> and writes an output file (my-output.csv).
> Now I want to run it on Hadoop in a distributed environment.
> 1) Should my input be one big file or separate smaller files?
> 2) If we are using smaller files, how does my code need to change to
> process all of the input files?
>
> Will Hadoop just copy the files to different servers or will it also split
> their content among servers?
>
> Any example will be great!
> --
> Eran | CTO
>
