hadoop-common-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Sat, 13 Nov 2010 00:32:23 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931595#action_12931595 ]

Doug Cutting commented on HADOOP-6685:

> That support is much easier if the metadata for each serialization is in a separate structure
> and not dumped into the Configuration.

Got it.  Thanks for clarifying.  As I've commented earlier in this issue, I prefer the use
of simple textual formats (properties, XML, JSON, etc.) for metadata and configuration data,
as in HTTP, SMTP, and most config file formats, rather than binary data.  Such textual formats
seem to me to be more natural when bootstrapping interoperable systems.  Metadata and configuration
data are not usually performance- or size-sensitive, and those are the normal motivations for using binary.
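
As an illustration of the textual-versus-binary distinction above, here is a minimal Java sketch. It is not code from the attached patch. The class name MetadataFormats, its helper methods, and the metadata keys ("serialization", "schema.name") are all hypothetical. It encodes the same Map<String,String> once as java.util.Properties text and once as a length-prefixed binary blob.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Hadoop.
public class MetadataFormats {

    // Textual form: key=value lines that a person or another language can read.
    public static String toText(Map<String, String> meta) {
        Properties props = new Properties();
        props.putAll(meta);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // also writes a leading timestamp comment line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parse the textual form back into a sorted map.
    public static Map<String, String> fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> meta = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            meta.put(name, props.getProperty(name));
        }
        return meta;
    }

    // Binary form: an entry count followed by writeUTF-encoded key/value pairs.
    // A reader must know this exact layout to make sense of the bytes.
    public static byte[] toBinary(Map<String, String> meta) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(meta.size());
            for (Map.Entry<String, String> entry : meta.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeUTF(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> meta = new TreeMap<>();
        meta.put("serialization", "avro"); // assumed example values
        meta.put("schema.name", "Record");
        System.out.println(toText(meta));
        System.out.println(toBinary(meta).length + " bytes of binary metadata");
    }
}
```

The textual output can be inspected in any editor and parsed from any language with a properties reader, while the binary blob is more compact but opaque to anything that does not share the writer's layout.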

> Providing customer choice over the serialization is much richer than forcing them into
> a single one.

I agree that we should provide a general-purpose API that does not force a particular serialization,
but we should encourage a primary serialization to provide better interoperability.

> Any file format that only supports one serialization doesn't meet my needs.

I certainly don't think we should mandate a single file format, and we don't at present. 
But I think we should focus our support around a single format.  A format that contains multiple
serializations is harder to support across multiple languages and greatly increases the chance
that you'll have data that cannot be processed by another system.  As an existence proof,
Google seems to get a lot of mileage with a single preferred serialization.

Thanks for responding to my concerns.  I am -0 on this patch as currently implemented: I think
we could do better but I will not block progress.

> Change the generic serialization framework API to use serialization-specific bytes instead
> of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch, SerializationAtSummit.pdf
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific serialization,
> I think we should change it to be an opaque binary blob. This will simplify the interface
> for defining specific serializations for different contexts (MAPREDUCE-1462). It will also
> move us toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
