hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Thu, 05 May 2016 19:43:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272936#comment-15272936
] 

Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------

Thanks, guys.  Sorry for the delay in reviewing.  We've been busy.

{{CodecUtil.java}}: there are a LOT of functions here for creating {{RawErasureEncoder}} objects.
We've got:
{code}
createRSRawEncoder(Configuration conf, int numDataUnits, int numParityUnits, String codec)
createRSRawEncoder(Configuration conf, int numDataUnit, int numParityUnit)
createRSRawEncoder(Configuration conf, String codec, ErasureCoderOptions coderOptions)
createRSRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, int numDataUnits, int numParityUnits)
createRawEncoder(Configuration conf, String rawCoderFactoryKey, ErasureCoderOptions coderOptions)
{code}

Plus a similar number of functions for creating decoders.  Why do we have to have so many
functions?  Surely the codec, numParityUnits, numDataUnits, whether it is XOR or not, etc.
etc. should just be included in ErasureCoderOptions.  Then we could just have one function:
{code}
createRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
{code}

On a related note, why does each particular type of encoder need its own factory?  It seems
like we just need a static function for each encoder type that takes a Configuration and ErasureCoderOptions,
and we're good to go.  We can locate these static functions via reflection.

{code}
  protected void doDecode(DecodingState decodingState, byte[][] inputs,
                          int[] inputOffsets, int[] erasedIndexes,
                          byte[][] outputs, int[] outputOffsets) {
{code}
Can we just include the inputs, inputOffsets, erasedIndexes, outputs, outputOffsets in {{DecodingState}}?

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch,
HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to rely class
inheritance to reuse the codes, instead they can be moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a state holder
to keep some checking results for later reuse during an encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet for the
moment and also incurs big impact. I do wish the end result by this refactoring will make
all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message