hadoop-common-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6883) Text.toString violates its abstraction
Date Tue, 27 Jul 2010 15:21:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892820#action_12892820 ]

Owen O'Malley commented on HADOOP-6883:

I should also mention that I'm almost done with a patch that redesigns the generic serialization
framework to include native support for ProtocolBuffers, Thrift, Avro, and Writables. I
already have all of the serializations working with both SequenceFiles and TFiles (TFiles with
serialization layered on top are named OFiles).

> Text.toString violates its abstraction
> --------------------------------------
>                 Key: HADOOP-6883
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6883
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.20.1
>         Environment: Linux
>            Reporter: Gordon Sommers
> I stumbled upon this when encoding a Google protocol buffer in base64 and storing it
> in a Text object for serialization. Compare the following two lines:
> byte[] decoded = b64.decode(val.getBytes());
> // This does not return the same bytes as the line below, and the result, after the base64
> // decodes successfully, is a very mangled protocol buffer.
> byte[] decoded = b64.decode(val.toString().getBytes());
> // YES, toString() FIXES IT
> Elsewhere in my code I also have:
> Text curline = new Text(values.next().toString());
> byte[] raw = base64.decode(curline.getBytes());
> // This does work.
> It looks like the Text object must be toString'd (just once, somewhere, even if it's later
> repacked in a Text) before it will have the proper byte representation. I would classify this
> as a leaky abstraction and ask that the cause be isolated and the API fixed somehow
> so that other developers don't have to spend 3 days figuring out when Text.getBytes() isn't
> returning the right bytes even though Text.toString() prints exactly the right string
> representation and Text.toString().getBytes() does return the right bytes.
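
One possible explanation for the behaviour described above, which nothing in this message confirms: if a reused Text object keeps a backing byte array that only grows, then getBytes() may hand back that whole array, including stale bytes left over from an earlier, longer value, while toString() only reads the first getLength() bytes. The sketch below illustrates that idea with a hypothetical stand-in class, ReusableText, so that it runs without Hadoop on the classpath. ReusableText, validBytes, and the sample strings are assumptions made for illustration and are not Hadoop's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class TextBytesSketch {

    /** Hypothetical stand-in for a reusable text buffer. NOT Hadoop's Text class. */
    public static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies the UTF-8 form of s into the buffer; the buffer grows but never shrinks. */
        public void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        /** Returns the raw backing array; only the first getLength() bytes are valid. */
        public byte[] getBytes() {
            return bytes;
        }

        public int getLength() {
            return length;
        }

        /** Decodes only the valid prefix, which is why the string form looks correct. */
        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    /** Copies just the valid prefix of the backing array. */
    public static byte[] validBytes(ReusableText t) {
        return Arrays.copyOf(t.getBytes(), t.getLength());
    }

    public static void main(String[] args) {
        ReusableText val = new ReusableText();

        // Simulate a framework reusing one object for two records of different sizes.
        val.set(Base64.getEncoder().encodeToString(
                "a longer first record".getBytes(StandardCharsets.UTF_8)));
        val.set(Base64.getEncoder().encodeToString(
                "short".getBytes(StandardCharsets.UTF_8)));

        // The raw array is longer than the valid data, so it carries a stale tail.
        System.out.println("raw array length: " + val.getBytes().length);
        System.out.println("valid length:     " + val.getLength());

        // Trimming to the valid length gives input that decodes cleanly.
        byte[] decoded = Base64.getDecoder().decode(validBytes(val));
        System.out.println(new String(decoded, StandardCharsets.UTF_8));
    }
}
```

If this assumption holds for Text, then trimming the array to getLength() before decoding would avoid the extra String round trip through toString().getBytes(), which produces a fresh, exactly sized array as a side effect.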

