thrift-user mailing list archives

From Tom Hesp
Subject Re: Diacritics get garbled when sent from Perl client.
Date Wed, 21 Jan 2015 09:56:35 GMT
Hi Jens,

Thanks for the quick reaction!

I totally agree with you (and with THRIFT-414, for that matter) that the
wire format should always be UTF-8.
But that is exactly what my Perl client is doing: I am passing UTF-8
characters. For some reason, though, the writeString method in the
BinaryProtocol package calls encode_utf8 on the string, which the Encode
manual page describes as follows:
- quote -
.... The characters that comprise $string are encoded in Perl's internal 
format and the result is returned as a sequence of octets.
- unquote -

And it does this after it has done a check on the string using 
utf8::is_utf8() which, according to the utf8 manual page:
- quote -
Test whether STRING is in UTF-8.
- unquote -

So, why is an encode done when the string is already in proper UTF-8?

Just out of pure curiosity I temporarily commented out the encode call 
from the writeString method and then everything works fine! But that is 
not a proper solution of course.
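
To illustrate the symptom (not Thrift's Perl library itself), here is a minimal sketch in Python, assuming the garbling comes from UTF-8 encoding being applied twice. The helper name received_by_server is hypothetical and only stands in for a receiver that decodes the wire bytes once as UTF-8.

```python
# Hypothetical illustration: UTF-8 encoding applied twice to the same text.
# This is an assumption about the cause, not code taken from Thrift.
text = "áöè"

# Correct path: characters are encoded to UTF-8 octets exactly once.
once = text.encode("utf-8")

# Faulty path: the octets are mistaken for Latin-1 characters
# (one character per byte) and then encoded to UTF-8 a second time.
misread = once.decode("latin-1")
twice = misread.encode("utf-8")


def received_by_server(wire_bytes: bytes) -> str:
    """Decode the wire bytes once, as a UTF-8 receiver would."""
    return wire_bytes.decode("utf-8")


print(received_by_server(once))   # the original text
print(received_by_server(twice))  # two garbled characters per diacritic
print(len(once), len(twice))      # 6 12
```

If this is what happens in my client, it would explain why removing the encode call helps: the string would already consist of UTF-8 octets, so a second encode doubles every non-ASCII byte.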

Kind regards,

On 01/21/2015 12:09 AM, Jens Geyer wrote:
> Hi Tom,
> I'm not exactly sure if I understand the issue correctly, but at least 
> I can say that the wire format of string shall be UTF-8. Anything else 
> is suspicious. See also 
> for a discussion of 
> the latter.
> Does that help you any further?
> Have fun,
> JensG
> -----Original Message-----
> From: Tom Hesp
> Sent: Tuesday, January 20, 2015 10:19 AM
> To:
> Subject: Diacritics get garbled when sent from Perl client.
> Hi,
> This question may have been asked before on this list but I have not
> been able to find anything about it.
> I am using Thrift version 0.9.1 and have a C++ Thrift server maintaining
> user records in a database.
> When I send user information containing diacritics (like á, ö, è, etc.)
> to it from a C++ or PHP client everything is fine.
> However, when I do the same from a Perl client, the diacritics become
> garbled. The example characters above are received by the server as
> something like this: áöè
> I am using the BinaryProtocol, so I checked the BinaryProtocol package and saw
> the following construct in writeString:
>     if( utf8::is_utf8($value) ){
>         $value = Encode::encode_utf8($value);
>     }
> Which means that the string is encoded to Perl's internal format.
> I also checked the C++ libraries at the receiving (server) end but I do
> not see the string being decoded again!
> I even tried this with a little Perl server but the results are the
> same, the data gets encoded but is never decoded.
> Am I missing something? Do I need to define something in the IDL so the
> server knows it may have to decode the string?
> Thanks for your time.
> Kind regards,
> Tom Hesp
> -- 


Tom Hesp

Office: +31 (0)20 547 8409 | Mobile: +31 (0)6 538 95236
Stroombaan 6-8, 1181 VX Amstelveen, The Netherlands