hadoop-common-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
Date Fri, 19 Nov 2010 22:06:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934009#action_12934009 ]

Arun C Murthy commented on HADOOP-6685:

> So, the patch, as it stands, allows SequenceFiles to use the new serialization framework,
> i.e. adds a feature. Are you against this feature? Can you please explain why?

Yes, I am against this feature. I've explained why several times above, and will try again
now. Creating new concrete data formats that are functionally equivalent to other concrete
formats decreases ecosystem interoperability, flexibility and maintainability. Above I cited
the Dremel paper, whose section 2 outlines a scenario that they argue is only possible because
all of the systems involved share a single common serialization and file format.

Thanks for laying it out again.

Your objections seem very unreasonable to me. 

I understand you prefer to have a single Avro-based data format, but Hadoop is a software
framework used by many people and organizations. People and organizations already have data
in different formats. 

Dremel is implemented and used by a single organization that has a specific technical and
historical context. What they need and use isn't something everyone on the planet can adopt.

Hadoop as a framework should not be in the business of dictating formats. 

We should facilitate and encourage users and organizations to use inter-operable formats,
not necessarily the *one* format.

In any case, this seems like a discussion that belongs elsewhere; I just don't see how blocking
a feature is useful. You can ask for it to be done in a separate JIRA, which we can do, but this
specific objection of yours is very unreasonable, IMO.

> Change the generic serialization framework API to use serialization-specific bytes instead
> of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch,
>
> Currently, the generic serialization framework uses Map<String,String> for the
> serialization-specific configuration. Since this data is really internal to the specific
> serialization, I think we should change it to be an opaque binary blob. This will simplify
> the interface for defining specific serializations for different contexts (MAPREDUCE-1462).
> It will also move us toward having serialized objects for Mappers, Reducers, etc.
> (MAPREDUCE-1183).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
