spark-user mailing list archives

From goi cto <goi....@gmail.com>
Subject Re: Beginners Hadoop question
Date Mon, 03 Mar 2014 11:28:28 GMT
Thanks. I will try it!


On Mon, Mar 3, 2014 at 1:19 PM, Alonso Isidoro Roman <alonsoir@gmail.com> wrote:

> Hi, I am a beginner too, but as I have learned, Hadoop works better with
> big files, at least 64 MB or 128 MB (common HDFS block sizes) or even
> larger. I think you need to aggregate all the files into one big file.
> Then copy it to HDFS using this command:
>
> hadoop fs -put MYFILE /YOUR_ROUTE_ON_HDFS/MYFILE
>
> This just copies MYFILE into the Hadoop Distributed File System (HDFS).
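A minimal sketch of the aggregation step, assuming the small inputs are CSV files in one local directory and that each file may start with the same header row. The class name `MergeCsv`, the `*.csv` pattern, and the header handling are illustrative assumptions, not something from the thread; only the Java standard library is used.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: merges many small CSV files into one larger file.
public class MergeCsv {

    /**
     * Concatenates every *.csv file in inputDir, in file-name order, into output.
     * When hasHeader is true, the first line of each file after the first is
     * treated as a repeated header and skipped (an assumption about the input).
     * The output file should live outside inputDir so it is not read back in.
     * Returns the number of lines written.
     */
    public static long merge(Path inputDir, Path output, boolean hasHeader) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.csv")) {
            for (Path p : stream) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs); // directory listing order is unspecified

        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            boolean firstFile = true;
            for (Path in : inputs) {
                // Stream each file line by line instead of loading it whole.
                try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
                    boolean skipHeader = hasHeader && !firstFile;
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (skipHeader) {
                            skipHeader = false;
                            continue;
                        }
                        out.write(line);
                        out.write("\n"); // always "\n", even when run on Windows
                        written++;
                    }
                }
                firstFile = false;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MergeCsv <inputDir> <outputFile>");
            return;
        }
        long n = merge(Paths.get(args[0]), Paths.get(args[1]), true);
        System.out.println("Wrote " + n + " lines to " + args[1]);
    }
}
```

The merged file can then be uploaded with the `hadoop fs -put` command above. Merging is not strictly required for Spark, though: its file-based input methods, including `textFile`, also accept a directory, a wildcard such as `*.csv`, or a comma-separated list of paths, so a job that reads one file can often read many just by changing the input path.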
>
> May I recommend what I did? Go to BigDataUniversity.com and take the
> Hadoop Fundamentals I course. It is free and very well documented.
>
> Regards
>
> Alonso Isidoro Roman.
>
> My favorite quotes (today):
> "If debugging is the process of removing software bugs, then programming
> must be the process of putting ..."
>   - Edsger Dijkstra
>
> "If you pay peanuts you get monkeys"
>
>
>
> 2014-03-03 12:10 GMT+01:00 goi cto <goi.cto@gmail.com>:
>
>> Hi,
>>
>> Sorry for the beginner's question, but...
>> I have Spark Java code that reads a file (c:\my-input.csv), processes it,
>> and writes an output file (my-output.csv).
>> Now I want to run it on Hadoop in a distributed environment.
>> 1) Should my input be one big file or several smaller files?
>> 2) If we use smaller files, how does my code need to change to
>> process all of the input files?
>>
>> Will Hadoop just copy the files to different servers or will it also
>> split their content among servers?
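On the last question: HDFS does split file contents. Each file is stored as a sequence of fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x), and the blocks, with their replicas, are placed on different DataNodes. By default Spark's `textFile` creates one partition per block, so different blocks can be processed on different servers. The sketch below only does the block arithmetic locally and does not contact HDFS; the class name `BlockSplits` and the file sizes used are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows how a file of a given size divides into HDFS-style blocks.
public class BlockSplits {

    /** Returns the sizes (in bytes) of the blocks a file of fileSize bytes occupies. */
    public static List<Long> blockSizes(long fileSize, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be > 0");
        }
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            // The last block holds only the remaining bytes; it is not padded.
            blocks.add(Math.min(blockSize, fileSize - offset));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed block size (Hadoop 2.x default)

        // One 300 MB file.
        List<Long> big = blockSizes(300 * mb, blockSize);
        System.out.println("300 MB file -> " + big.size() + " blocks");

        // The same 300 MB spread over 300 files of 1 MB each.
        int smallBlocks = 0;
        for (int i = 0; i < 300; i++) {
            smallBlocks += blockSizes(mb, blockSize).size();
        }
        System.out.println("300 x 1 MB files -> " + smallBlocks + " blocks");
    }
}
```

Running `main` prints `300 MB file -> 3 blocks` and `300 x 1 MB files -> 300 blocks`: the same amount of data in many small files yields many more blocks, and usually many more small tasks, which is the reasoning behind the advice to use larger files.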
>>
>> Any example would be great!
>> --
>> Eran | CTO
>>
>
>


-- 
Eran | CTO
