hadoop-mapreduce-dev mailing list archives

From Darpan R <darpa...@gmail.com>
Subject Confusion related to NLineInputFormat
Date Mon, 06 May 2013 12:23:29 GMT
Hi guys,
I'm confused about NLineInputFormat.

I have written an MR job using NLineInputFormat, and the output I get is fine,
but only 2 map tasks are running.

According to the documentation of NLineInputFormat:
If you want your mappers to receive a fixed number of lines of input, then
NLineInputFormat is the InputFormat to use. N refers to the number of lines
of input that each mapper receives.

I have a couple of files, each around 1 MB, stored on HDFS.
In the MR job's driver I set
mapreduce.input.lineinputformat.linespermap to 20
(meaning I want each map to process 20 lines).
I've also tried setting this value by calling
 NLineInputFormat.setNumLinesPerSplit(job, 20);

Each of my two input files has exactly 1000 lines, so 2000 lines in total,
and according to the documentation 2000/20 = 100 map tasks should have been
created. But the counters show that only 2 map tasks have run. I'm not sure
whether I've done something wrong.
Can anyone help me understand this better?
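
The expected count above can be checked with a small sketch. It assumes NLineInputFormat computes splits file by file, starting a new split every N lines and giving any leftover lines one final, smaller split of their own. NLineSplitEstimate, splitsForFile and totalSplits are hypothetical helpers for illustration, not Hadoop APIs.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop): estimates how many input splits,
 * and therefore map tasks, NLineInputFormat would produce. Assumes each file
 * is split independently into chunks of at most N lines, with any leftover
 * lines forming one final, smaller split.
 */
public class NLineSplitEstimate {

    // Splits for one file: ceil(lines / linesPerSplit); an empty file gives 0.
    static long splitsForFile(long lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        return (lines + linesPerSplit - 1) / linesPerSplit;
    }

    // Splits never span files, so per-file counts are summed.
    static long totalSplits(List<Long> linesPerFile, int linesPerSplit) {
        long total = 0;
        for (long lines : linesPerFile) {
            total += splitsForFile(lines, linesPerSplit);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two input files with 1000 lines each, N = 20 lines per map.
        long expected = totalSplits(List.of(1000L, 1000L), 20);
        System.out.println("Expected map tasks: " + expected); // prints "Expected map tasks: 100"
    }
}
```

For comparison, assuming the default HDFS block size, each ~1 MB file would fit in a single split under the default TextInputFormat, which would give exactly 2 map tasks.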

Thanks in advance.
