I think you are right. FileOutputFormat has a default, hard-coded
FileOutputCommitter.
If you want to use DirectOutputCommitter, check the third-party patched
Hadoop package that provides that class for instructions on how to set it
up.
Or you can extend HFileOutputFormat2 and provide a getOutputCommitter()
implementation that returns a DirectOutputCommitter.
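Roughly something like the sketch below. This assumes a DirectOutputCommitter class with a no-arg constructor is available on your classpath (it is not part of stock Hadoop or HBase, so the exact class name and constructor come from whichever patched package you use):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Subclass that swaps the default FileOutputCommitter for a direct committer.
public class DirectHFileOutputFormat extends HFileOutputFormat2 {

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context)
            throws IOException {
        // Assumption: DirectOutputCommitter comes from your third-party
        // package and takes no constructor arguments; adjust as needed.
        return new DirectOutputCommitter();
    }
}
```

Note that configureIncrementalLoad() sets the job's output format class itself, so you would need to override it afterwards, e.g.:

```java
HFileOutputFormat2.configureIncrementalLoad(job, hTable);
job.setOutputFormatClass(DirectHFileOutputFormat.class);
```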
Jerry
On Thu, Mar 16, 2017 at 9:29 AM, Fran O <franobeta@gmail.com> wrote:
> Hi folks,
>
> I would like to hear some thoughts on the following use case:
>
> I use a custom MR job to create HFiles. This MR job writes the HFiles into S3.
>
> I was trying to change the OutputCommitter so that the reducers write the
> HFiles directly to their final destination on S3. After some tests setting
> the OutputCommitter to DirectOutputCommitter, the tasks still always use
> the FileOutputCommitter.
>
> >> HFileOutputFormat2.configureIncrementalLoad(job, hTable);
> >> FileOutputFormat.setOutputPath(job, outputPath);
> >> FileOutputFormat.setCompressOutput(job, true);
> >> FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
>
> Looking at the code of the FileOutputFormat methods
> <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html>
> I see a getOutputCommitter
> <https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext)>
> method but not a set method for the OutputCommitter.
>
> Could someone bring some light on how to change the OutputCommitter for the
> tasks?
>
> Thank you,
> Fran
>