hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Mon, 18 Apr 2016 13:02:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245612#comment-15245612

Kai Zheng commented on HADOOP-13010:

Thanks Colin.
bq.  It sounds like Encoder and Decoder will be stateless once they're created. Basically,
they just reflect an algorithm and some data required to implement that algorithm. That is
reasonable. We should document this in the JavaDoc for the classes.
Yeah, it's good to document this stateless property in JavaDoc. Note by the way it doesn't
mean these encoder/decoder are to support concurrency though it's possible. I would leave
this for future consideration. 

bq. The problem with calling this class DecoderState is that this name suggests that it is
the state of the coder, rather than state manipulated by the coder. Perhaps calling this DecoderData
or DecoderStream is more appropriate. Having this new class will also avoid the need to manually
pass around so many arrays and indices.
Ah yes the names (EncoderState/DecoderState) are bad, actually I meant them to be *EncodingState/DecodingState*.

bq. These methods and fields are configuration methods and configuration fields. They belong
in a class named something like "ErasureEncodingConfiguration" or something like that. ...Why
is having the configuration in a separate object better than having the configuration in a
base class?
Ok, I'm probably convinced by you. Thanks for the lots of insights. I got rid of the base
class anyway, and introduced *ErasureCoderConf* for the variables and methods in it. As you
might check the updated patch, there are some duplicate of small shortcuts between the encoder
base class and decoder base class as they now lack a common base. I suppose it's acceptable.

> Refactor raw erasure coders
> ---------------------------
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch
> This will refactor raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to rely class
inheritance to reuse the codes, instead they can be moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a state holder
to keep some checking results for later reuse during an encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet for the
moment and also incurs big impact. I do wish the end result by this refactoring will make
all the levels more clear and easier to follow.

This message was sent by Atlassian JIRA

View raw message