hadoop-common-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13010) Refactor raw erasure coders
Date Wed, 18 May 2016 18:36:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289544#comment-15289544
] 

Kai Zheng commented on HADOOP-13010:
------------------------------------

Hi Colin,

Thanks for the comments. About the factories, let me spell out the real problem in detail,
since our face-to-face discussion could not go into the details for lack of time.

We may have the following codecs in the 1st level:
rs-legacy, rs-default (both belonging to RS)
xor,
hh or hitchhiker,
lrc,
...

Each codec may use one or more raw coders, and each such coder may have different
implementations. For example, for the rs-default codec, we have two coder implementations
(the pure Java one and the ISA-L one). Users may add their own coder implementation for a
codec, maybe for better performance.

So that's why I would have a configuration key like this:
o.a.h.io.erasurecode.codec.(codec-name).rawcoder: (whatever value to be used to create or
load the coder).
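
To make the key shape concrete, here is a minimal sketch of building and reading such a
per-codec key. It uses plain java.util.Properties rather than the Hadoop configuration
classes, the helper class RawCoderKeys is hypothetical, the "o.a.h" prefix is kept in the
abbreviated form written above, and the values "isa-l" and "java" are made-up placeholders.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop: builds the per-codec key described above.
final class RawCoderKeys {
    // Prefix kept exactly as abbreviated in the message ("o.a.h").
    static final String PREFIX = "o.a.h.io.erasurecode.codec.";
    static final String SUFFIX = ".rawcoder";

    private RawCoderKeys() {}

    // One key per codec, e.g. keyFor("xor").
    static String keyFor(String codecName) {
        return PREFIX + codecName + SUFFIX;
    }

    // Returns the configured raw coder value, or the fallback if the key is unset.
    static String rawCoderFor(Properties conf, String codecName, String fallback) {
        return conf.getProperty(keyFor(codecName), fallback);
    }
}

final class RawCoderKeysDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        // "isa-l" and "java" are placeholder values chosen for illustration.
        conf.setProperty(RawCoderKeys.keyFor("rs-default"), "isa-l");
        System.out.println(RawCoderKeys.rawCoderFor(conf, "rs-default", "java"));
        System.out.println(RawCoderKeys.rawCoderFor(conf, "xor", "java"));
    }
}
```

With one key per codec, a user can switch the coder for rs-default without touching the
coder used by xor or any other codec.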

Currently we configure a factory that creates the encoder and decoder for a coder implementation.
I agree there could be a better option here, and after discussing this in detail with
Andrew yesterday in the SF office, I wonder if we could achieve the same effect without the
factories by using the Java service loader.

First, we can add codec-name and coder-name to the raw coder, so each coder will have a codec-name
and coder-name when it's created.

Then the built-in coders will have fixed codec-names and coder-names. Customized coders will
be loaded via the service loader.

Eventually we will have all the raw erasure coders loaded and created, and then we can set up
a mapping from codec-name to coder-name, and from coder-name to the coder class or instance.
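
The three steps above could be sketched roughly as follows. This is only an illustration
under assumptions: the interface NamedRawCoder, the registry class, the stand-in coder
classes and the coder names ("rs-java", "rs-isa-l", "xor-java") are all hypothetical and are
not the existing Hadoop classes. Only java.util.ServiceLoader is a real JDK API; it discovers
providers listed under META-INF/services, which need a public no-argument constructor.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.Set;

// Hypothetical interface: every raw coder reports its codec-name and coder-name.
interface NamedRawCoder {
    String codecName();
    String coderName();
}

// Stand-ins for built-in coders with fixed names (names are illustrative only).
final class RsJavaCoder implements NamedRawCoder {
    public String codecName() { return "rs-default"; }
    public String coderName() { return "rs-java"; }
}

final class RsNativeCoder implements NamedRawCoder {
    public String codecName() { return "rs-default"; }
    public String coderName() { return "rs-isa-l"; }
}

final class XorJavaCoder implements NamedRawCoder {
    public String codecName() { return "xor"; }
    public String coderName() { return "xor-java"; }
}

final class RawCoderRegistry {
    // codec-name -> (coder-name -> coder class)
    private final Map<String, Map<String, Class<? extends NamedRawCoder>>> byCodec =
        new LinkedHashMap<>();

    // The first registration of a (codec-name, coder-name) pair wins.
    void register(NamedRawCoder coder) {
        byCodec.computeIfAbsent(coder.codecName(), k -> new LinkedHashMap<>())
               .putIfAbsent(coder.coderName(), coder.getClass());
    }

    // Registers the built-in coders first, then any customized coders that the
    // service loader finds on the classpath (none if no provider file exists).
    static RawCoderRegistry load() {
        RawCoderRegistry registry = new RawCoderRegistry();
        registry.register(new RsJavaCoder());
        registry.register(new RsNativeCoder());
        registry.register(new XorJavaCoder());
        for (NamedRawCoder custom : ServiceLoader.load(NamedRawCoder.class)) {
            registry.register(custom);
        }
        return registry;
    }

    Set<String> coderNames(String codecName) {
        return byCodec.getOrDefault(codecName, Collections.emptyMap()).keySet();
    }

    Class<? extends NamedRawCoder> coderClass(String codecName, String coderName) {
        return byCodec.getOrDefault(codecName, Collections.emptyMap()).get(coderName);
    }
}
```

In this sketch the built-in coders are registered before the service loader runs, so a
customized coder cannot silently replace a built-in one that has the same names; whether
that is the desired precedence is an open design choice.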

Does this sound good to you? If it works, then we might do this in a follow-on task?

Thanks again!


> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch,
HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to rely on class
inheritance to reuse code; instead the shared code can be moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a state holder
to keep some checking results for later reuse during an encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet for the
moment and also incurs big impact. I do wish the end result by this refactoring will make
all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

