hadoop-common-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Fri, 03 Dec 2010 20:52:22 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966667#action_12966667 ]

Doug Cutting commented on HADOOP-6685:

> We don't have one right now. We have XML and JSON. Neither is user-friendly.

We don't currently use JSON for configuration data.  Today we use Map<String,String>
as the configuration data model.  This is usually serialized as XML and sometimes in other
forms (e.g., inside a SequenceFile).  The simplicity of this model permits differing serializations
without significant loss of transparency or interoperability.  This model interoperates well
with Java properties, including system properties, with environment variables, etc.  Adding
a prefix to keys has been demonstrated to be an effective if inelegant way to implement nesting
in this model.  This model does not easily map to objects, nor does it provide any type support.
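
As a rough illustration of the prefix approach described above, here is a minimal Java sketch.  The class name, method names and keys are all hypothetical and are not part of Hadoop's API.  It treats a shared key prefix as a nested section of a flat Map<String,String>, and copies the same map into java.util.Properties to show the interoperability mentioned above.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper, not a Hadoop class: nesting emulated with key prefixes.
public class PrefixedConfig {

    /** Returns the entries under "prefix." with the prefix stripped from each key. */
    public static Map<String, String> subset(Map<String, String> conf, String prefix) {
        String p = prefix + ".";
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(p)) {
                out.put(e.getKey().substring(p.length()), e.getValue());
            }
        }
        return out;
    }

    /** Copies the flat map into java.util.Properties unchanged. */
    public static Properties toProperties(Map<String, String> conf) {
        Properties props = new Properties();
        props.putAll(conf);
        return props;
    }

    public static void main(String[] args) {
        // Made-up keys, chosen only for illustration.
        Map<String, String> conf = new TreeMap<>();
        conf.put("example.serialization.schema", "string");
        conf.put("example.serialization.reflect", "false");
        conf.put("example.buffer.size", "4096");

        // The "nested" section is just the keys that share a prefix.
        System.out.println(subset(conf, "example.serialization"));
        System.out.println(toProperties(conf).getProperty("example.buffer.size"));
    }
}
```

Note that every value stays a String, so callers must parse "false" or "4096" themselves; that is the lack of type support noted above.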

If we wish to use a more complex data model, that's nestable, that's more strongly typed and
that can be easily mapped to objects, then a standard serialization, like JSON or YAML, is
a good way to still ensure transparency and interoperability.

YAML could work well as a data model.  Nesting YAML requires adjusting indentation, while
JSON permits simple string appends to nest.  But if a Java API like YamlBeans is used, then
indentation would be handled automatically.
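
To make the nesting difference concrete, here is a small Java sketch using plain string handling only.  The class and method names are hypothetical, and it assumes the key needs no escaping and the YAML is in simple block style.  Wrapping serialized JSON is a single concatenation, while wrapping serialized YAML means re-indenting every line.

```java
// Hypothetical sketch, not taken from any library.
public class NestingSketch {

    /** Nests already-serialized JSON under a key with one string append. */
    public static String nestJson(String key, String innerJson) {
        return "{\"" + key + "\":" + innerJson + "}";
    }

    /** Nests already-serialized block-style YAML under a key by indenting each line. */
    public static String nestYaml(String key, String innerYaml) {
        StringBuilder sb = new StringBuilder(key).append(":\n");
        for (String line : innerYaml.split("\n")) {
            sb.append("  ").append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nestJson("avro", "{\"reflect\":false}"));
        System.out.print(nestYaml("avro", "reflect: false\nschema: string\n"));
    }
}
```

A YAML library that builds documents from objects would make the manual indentation step unnecessary; the sketch only shows what happens when working on the serialized text directly.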

If we can read/write YAML, what reason is there to support arbitrary binary configuration?

> Change the generic serialization framework API to use serialization-specific bytes instead
of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch,
> Currently, the generic serialization framework uses Map<String,String> for the
serialization specific configuration. Since this data is really internal to the specific serialization,
I think we should change it to be an opaque binary blob. This will simplify the interface
for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

