hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Sun, 08 May 2016 12:37:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275583#comment-15275583
] 

Kai Zheng commented on HADOOP-13010:
------------------------------------

bq. Sorry for the delay in reviewing. We've been busy.
Actually I'm sorry here for the pushing. It's understood because we're open source and sometimes
out of sync. Even I wish to get all these refactoring and the following ISA-L coder be committed
prior to the first 3.0 cut, I myself got pretty busy with other things recently so this comment
is so late. :)

bq. Why do we have to have so many functions?
Well, if we don't have to feel afraid this change becomes too big and bump into HDFS said,
I thought we can get rid of some ones.

bq. Surely the codec, numParityUnits, numDataUnits, whether it is XOR or not, etc. etc. should
just be included in ErasureCoderOptions.
I agree it's a good idea to merge schema related info like numParityUnits and numDataUnits
into {{ErasureCoderOptions}} from a fresh point of view looking at all of this. Not sure if
it's good to have something like {{isXOR}} or {{isRS}}, because we'll have more coder algorithms
other than the current both.

bq. Then we could just have one function ... {{createRawEncoder}}
OK, we can have it. Maybe it wouldn't be bad to have two shortcut methods additionally, createRSRawEncoder
and createXORRawEncoder, because the both are the primitive, essential and most used ones
in implementing advanced coders and HDFS side. I want the both to be outstanding and easily
used.

bq. why does each particular type of encoder need its own factory? It seems like we just need
a static function for each encoder type that takes a Configuration and ErasureCoderOptions,
and we're good to go. We can locate these static functions via reflection.
I discussed about this with [~umamaheswararao] quite some ago when introducing these factories.
There isn't a clear way to compose or reduce the full class name of a raw coder because it
should be plugin-able and configurable. In current approach, for each codec, there could be
some coder implementations, and for each, the corresponding coder factory can be configured.

bq. Can we just include the inputs, inputOffsets, erasedIndexes, outputs, outputOffsets in
DecodingState?
It seems this can simplify the related functions, but am not sure it would make the codes
more readable. The mentioned variables are very specific to encode/decode related calls using
on-heap bytebuffer or byte array buffers. Maybe DecodingState could be kept simple not putting
too many intermediate variables because the codes using of them are not suitable to be moved
to the class.

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch,
HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to rely class
inheritance to reuse the codes, instead they can be moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a state holder
to keep some checking results for later reuse during an encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet for the
moment and also incurs big impact. I do wish the end result by this refactoring will make
all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message