hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Wed, 18 May 2016 17:16:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289361#comment-15289361
] 

Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------

bq. Not sure if it's good to have something like isXOR or isRS, because we'll have more coder
algorithms other than the current both.

That's a fair point.  It seems unlikely that we need an isXOR or isRS method.

bq. OK, we can have \[createRawEncoder\]. Maybe it wouldn't be bad to have two shortcut methods
additionally, createRSRawEncoder and createXORRawEncoder, because the both are the primitive,
essential and most used ones in implementing advanced coders and HDFS side. I want the both
to be outstanding and easily used.

It seems better just to have one function, {{createRawEncoder}}, than to have lots of functions
for every type of encoder.

bq. I discussed about this with Uma Maheswara Rao G quite some ago when introducing these
factories. There isn't a clear way to compose or reduce the full class name of a raw coder
because it should be plugin-able and configurable. In current approach, for each codec, there
could be some coder implementations, and for each, the corresponding coder factory can be
configured.

We discussed this offline and I think the conclusion is that we don't need the factories for
anything.

We can just have a configuration key like {{erasure.coder.algorithm}} and then code like this:

{code}
RawErasureEncoder createRawEncoder(Configuration conf) {
  String classPrefix = conf.get("erasure.coder.algorithm", DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Encoder";
  Constructor<RawErasureEncoder> ctor = 
    classLoader.loadClass(name).getConstructor(Configuration.class);
  return ctor.newInstance(conf);
}

RawErasureDecoder createRawDecoder(Configuration conf) {
  String classPrefix = conf.get("erasure.coder.algorithm", DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Decoder";
  Constructor<RawErasureEncoder> ctor = 
    classLoader.loadClass(name).getConstructor(Configuration.class);
  return ctor.newInstance(conf);
}
{code}

bq. It seems this can simplify the related functions, but am not sure it would make the codes
more readable. The mentioned variables are very specific to encode/decode related calls using
on-heap bytebuffer or byte array buffers. Maybe DecodingState could be kept simple not putting
too many intermediate variables because the codes using of them are not suitable to be moved
to the class.

Reducing the number of function parameters from 8 or 9 to 1 or 2 seems like it would make
the code much more readable.  I don't understand what the rationale is for keeping these parameters
out of DecodingState.  Perhaps we could discuss this in a follow-on JIRA, though.

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch,
HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to rely class
inheritance to reuse the codes, instead they can be moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a state holder
to keep some checking results for later reuse during an encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet for the
moment and also incurs big impact. I do wish the end result by this refactoring will make
all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message