hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: How to process part of a file in Hadoop?
Date Sat, 08 Feb 2014 04:55:06 GMT
You can write a custom InputFormat whose #getSplits(...) returns your
required InputSplit objects (with randomised offsets + lengths, etc.).
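For example, here is a minimal sketch (the class name and the job property are hypothetical, assuming the new org.apache.hadoop.mapreduce API): subclass TextInputFormat, let the parent compute one split per block as usual, then keep only a random subset of those splits.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class RandomSubsetInputFormat extends TextInputFormat {

  // Hypothetical job property: how many of the file's splits to process.
  public static final String NUM_SPLITS_KEY = "randomsubset.num.splits";

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    // Let FileInputFormat compute one split per HDFS block as usual.
    List<InputSplit> all = super.getSplits(job);

    int wanted = job.getConfiguration().getInt(NUM_SPLITS_KEY, all.size());
    if (wanted >= all.size()) {
      return all;
    }

    // Shuffle a copy and keep only the first 'wanted' splits, i.e. a
    // random subset of the file's blocks.
    List<InputSplit> shuffled = new ArrayList<InputSplit>(all);
    Collections.shuffle(shuffled);
    return shuffled.subList(0, wanted);
  }
}

In the driver you would then do something like:

  job.setInputFormatClass(RandomSubsetInputFormat.class);
  job.getConfiguration().setInt(RandomSubsetInputFormat.NUM_SPLITS_KEY, 2);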

On Fri, Feb 7, 2014 at 9:50 PM, Suresh S <sureshhot@gmail.com> wrote:
> Dear Friends,
>
>           I have a very large file in HDFS with 3000+ blocks.
>
> I want to run a job with various input sizes, using the same file as the
> input each time. Usually the number of tasks is equal to the number of
> blocks/splits. Suppose a job with 2 tasks needs to process any two random
> blocks of the given input file.
>
> How can I give a random set of HDFS blocks as the input of a job?
>
> Note: my aim is not to process the input file to produce some output.
> I want to replicate individual blocks based on the load.
>
> *Regards*
> *S.Suresh,*
> *Research Scholar,*
> *Department of Computer Applications,*
> *National Institute of Technology,*
> *Tiruchirappalli - 620015.*
> *+91-9941506562*



-- 
Harsh J
