hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Tue, 24 May 2016 17:42:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298587#comment-15298587 ]

Colin Patrick McCabe commented on HADOOP-13010:

It was nice talking to you, [~drankye].  It's too bad that we didn't have more time (it was
a busy week because I was going out of town).

bq. As I explained above, [the configuration-based] approach might not work in all cases,
because there is more than one codec to be configured, and for each of these codecs there
may be more than one coder implementation to be configured; it's not easy to flatten the
two layers into one dimension (here you used algorithm).

I think these are really configuration questions, not questions about how the code should
be structured.  What does the user actually need to configure?  If the user just configures
a coder implementation, does that fully determine the codec being used?  If so, we
should have only one configuration knob: the coder.  If a coder could be used for multiple codecs,
then we need at least two knobs that the user can configure: one for the codec, and another
for the coder.  Once we know what the configuration knobs are, we probably only need one or two
functions to create the objects we need from a {{Configuration}} object, not a whole mess
of factory objects.
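The following is a minimal sketch of the "one or two functions" idea for the two-knob case, where a coder may serve more than one codec. It is not Hadoop code. It uses `java.util.Properties` as a stand-in for Hadoop's {{Configuration}}, and every key, enum, and class name in it (`example.ec.codec`, `example.ec.coder`, `Codec`, `CoderImpl`, `RawEncoder`) is hypothetical.

```java
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;

public class CoderConfigSketch {
    // Hypothetical values for the codec knob.
    enum Codec { RS, XOR }

    // Hypothetical coder implementations, each declaring which codecs it can serve.
    enum CoderImpl {
        RS_JAVA(EnumSet.of(Codec.RS)),
        RS_NATIVE(EnumSet.of(Codec.RS)),
        XOR_JAVA(EnumSet.of(Codec.XOR));

        final Set<Codec> supported;

        CoderImpl(Set<Codec> supported) {
            this.supported = supported;
        }
    }

    // Hypothetical configuration keys (NOT real Hadoop keys).
    static final String CODEC_KEY = "example.ec.codec";
    static final String CODER_KEY = "example.ec.coder";

    // Placeholder for whatever object the creation function would return.
    static final class RawEncoder {
        final Codec codec;
        final CoderImpl impl;
        final int dataUnits;
        final int parityUnits;

        RawEncoder(Codec codec, CoderImpl impl, int dataUnits, int parityUnits) {
            this.codec = codec;
            this.impl = impl;
            this.dataUnits = dataUnits;
            this.parityUnits = parityUnits;
        }

        @Override
        public String toString() {
            return "RawEncoder[" + codec + ", " + impl + ", " + dataUnits + "+" + parityUnits + "]";
        }
    }

    /** One creation function: read both knobs, check they are compatible, build the coder. */
    static RawEncoder createEncoder(Properties conf, int dataUnits, int parityUnits) {
        Codec codec = Codec.valueOf(conf.getProperty(CODEC_KEY, "RS"));
        CoderImpl impl = CoderImpl.valueOf(conf.getProperty(CODER_KEY, "RS_JAVA"));
        if (!impl.supported.contains(codec)) {
            throw new IllegalArgumentException(impl + " cannot serve codec " + codec);
        }
        return new RawEncoder(codec, impl, dataUnits, parityUnits);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(CODEC_KEY, "RS");
        conf.setProperty(CODER_KEY, "RS_NATIVE");
        System.out.println(createEncoder(conf, 6, 3));
    }
}
```

Suppose it turns out that each coder serves exactly one codec. Then `CODEC_KEY` goes away, and the codec can be read from `impl.supported`. That is the single-knob case described above.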

Anyway, let's talk about refactoring codec configuration and factories in a follow-on JIRA.
I think we've made a lot of good progress here, and it will be helpful to get this patch committed.

> Refactor raw erasure coders
> ---------------------------
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch,
>                      HADOOP-13010-v4.patch, HADOOP-13010-v5.patch, HADOOP-13010-v6.patch
> This will refactor the raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], it is better not to rely on class
> inheritance to reuse code; instead, the shared code can be moved into some utility.
> * As suggested by [~jingzhao] in an earlier discussion some time ago, it is better to have a state holder
> to keep some checking results for later reuse during an encode/decode call.
> This will not get rid of some inheritance levels, because how to do so isn't clear yet and
> doing so would have a big impact. I do hope the end result of this refactoring will make
> all the levels clearer and easier to follow.
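The two bullets in the description favour composition over inheritance. Shared checks move into a static utility. The results of checking the inputs once are kept in a per-call state object, which the rest of that encode call then reuses. The sketch below illustrates this in plain Java. The names `CoderUtil`, `EncodingState`, and `EncodeStateSketch` are hypothetical and do not reproduce the classes in the attached patches. Simple XOR parity stands in for a real erasure code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class EncodeStateSketch {
    /** Shared helpers, reused through static calls instead of through a coder base class. */
    static final class CoderUtil {
        private CoderUtil() {}

        /** Checks that every buffer is non-null and all have the same remaining length;
         *  returns that length, or -1 if the array is empty. */
        static int checkBuffers(ByteBuffer[] buffers) {
            int len = -1;
            for (ByteBuffer b : buffers) {
                if (b == null) {
                    throw new IllegalArgumentException("null buffer");
                }
                if (len < 0) {
                    len = b.remaining();
                } else if (b.remaining() != len) {
                    throw new IllegalArgumentException("buffer lengths differ");
                }
            }
            return len;
        }
    }

    /** Holds the results of validating the inputs once, for reuse within a single encode call. */
    static final class EncodingState {
        final ByteBuffer[] inputs;
        final ByteBuffer[] outputs;
        final int encodeLength;

        EncodingState(ByteBuffer[] inputs, ByteBuffer[] outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
            int inLen = CoderUtil.checkBuffers(inputs);
            int outLen = CoderUtil.checkBuffers(outputs);
            if (inLen < 0 || inLen != outLen) {
                throw new IllegalArgumentException("input/output lengths differ or are empty");
            }
            this.encodeLength = inLen;
        }
    }

    /** XOR parity into outputs[0]; the checks run once, in the state constructor. */
    static void encode(ByteBuffer[] inputs, ByteBuffer[] outputs) {
        EncodingState state = new EncodingState(inputs, outputs);
        ByteBuffer out = state.outputs[0];
        for (int i = 0; i < state.encodeLength; i++) {
            byte parity = 0;
            for (ByteBuffer in : state.inputs) {
                parity ^= in.get(in.position() + i); // absolute get: buffer position is unchanged
            }
            out.put(out.position() + i, parity);     // absolute put: buffer position is unchanged
        }
    }

    public static void main(String[] args) {
        ByteBuffer[] inputs = {
            ByteBuffer.wrap(new byte[] {1, 2, 3}),
            ByteBuffer.wrap(new byte[] {4, 5, 6})
        };
        ByteBuffer[] outputs = {ByteBuffer.allocate(3)};
        encode(inputs, outputs);
        System.out.println(Arrays.toString(outputs[0].array())); // prints [5, 7, 5]
    }
}
```

A real decoder would keep more in its state object, such as erased indexes and which inputs are valid. The idea is the same, though: compute these once per call and pass them along, rather than re-deriving them in each level of a class hierarchy.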

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
