thrift-user mailing list archives

From Leo Kim <leo...@gmail.com>
Subject Re: fastbinary.c utf8 support
Date Fri, 03 Sep 2010 19:10:39 GMT
Okay, reading THRIFT-395 shows me how fraught, and how thoroughly
discussed, this issue has been. I'm wondering if it may be worth
underscoring this in the Thrift tutorial examples or other such
documentation.
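
As a rough idea of what such a tutorial note might show, here is a minimal
sketch of the approach David describes below: encode text to a byte string
before handing it to Thrift, and decode after reading it back. The helper
names to_wire and from_wire are hypothetical and not part of Thrift, and
UTF-8 is only an assumed choice of encoding. The sketch calls no Thrift API.
In the Python 2 of this thread, "bytes" is an alias for str and text is a
unicode object.

```python
def to_wire(text, encoding="utf-8"):
    """Return a byte string suitable for passing into Thrift (hypothetical helper)."""
    if isinstance(text, bytes):  # already encoded; pass through untouched
        return text
    return text.encode(encoding)


def from_wire(raw, encoding="utf-8"):
    """Decode a byte string read back from Thrift into text (hypothetical helper)."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text; nothing to decode


name = u"Zo\u00eb"      # text containing a non-ASCII character
wire = to_wire(name)    # UTF-8 encoded byte string
assert from_wire(wire) == name

# Decoding the same bytes as ASCII raises the kind of error in the original report.
try:
    wire.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed: %s" % exc)
```

Keeping the encode and decode steps at the application boundary means the
accelerator module only ever sees byte strings, so it never has to guess
which type the caller wants back.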

On Fri, Sep 3, 2010 at 12:13 PM, David Reiss <dreiss@facebook.com> wrote:
>> Are users of fastbinary.c expected to use ASCII encoding exclusively?
> No, you can use any encoding you want.  Just pass str objects into
> Thrift, rather than Unicode objects.
>
> I wrote a patch to make it possible to *write* Unicode strings to fastbinary,
> just not read them.  It's at https://issues.apache.org/jira/secure/attachment/12404198/0003-THRIFT-395.-python-Phase-Two-of-support-for-unicode.patch
> Feel free to comment on that issue if you want that feature.
>
> --David
>
> On 09/03/2010 09:03 AM, Leo Kim wrote:
>> Are users of fastbinary.c expected to use ASCII encoding exclusively? The
>> code generator has the option to force UTF8 encoding/decoding for strings,
>> so one could argue that fastbinary.c should have an analog version that
>> works only on UTF8 encoded strings.
>>
>> On Sep 2, 2010, at 11:49 PM, David Reiss <dreiss@facebook.com> wrote:
>>
>>> We don't actually use the "UTF8" field type.  That should probably be
>>> removed entirely.  Unfortunately, there is currently no way for the
>>> accelerator module to determine whether the user wants a given value
>>> to be decoded into a unicode object or returned as a str.
>>>
>>> --David
>>>
>>> On 09/02/2010 07:26 PM, Leo Kim wrote:
>>>> Hello,
>>>>
>>>> I didn't see utf8 support in fastbinary.c in thrift-0.4.0, so I hacked
>>>> something in. I'm not a Python C API expert (nor a unicode expert),
>>>> but the attached patch appears to work when sending utf8 encoded
>>>> strings whereas without the patch I'd encounter the
>>>> "UnicodeDecodeError: 'ascii' codec can't decode byte ..." error.
>>>>
>>>> I offer it to this mailing list for review as I'm interested in
>>>> feedback regarding correctness and general interest in this patch.
>>>>
>>>> thx
>>>> leo
>>
>



-- 
-- My PGP public key can be found by exploring the following link:
-- http://preview.tinyurl.com/34ztb5
--
