thrift-user mailing list archives

From Randy Abernethy <randy.aberne...@gmail.com>
Subject Re: Binary data (de)serializing
Date Wed, 02 Oct 2013 17:55:43 GMT
Hello fogbit,

You mention .proto files; I take it you mean .thrift files.

The binary type implies a sequential block of memory and is probably the
best way to go for your binary buffer. A list does not imply a sequential
block of memory in most environments; rather, it implies a general data
structure (often with fast insert times if it is mutable). So I think from a
semantic standpoint you really do want binary.

From an implementation standpoint I think binary is still the best choice.
The C++ generator uses std::string for binary types, and Python uses its
string type as well. C++03 strings are almost always stored contiguously,
and in C++11 they must be (like vector). An Apache Thrift list will give you
a std::vector in C++, but in Python you will get a list. Strings and vectors
are effectively identical in most C++ implementations, but strings and lists
are quite different in Python.
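
As a rough illustration of the Python side, here is a minimal sketch. It
assumes Python 3, where binary data is normally held in a bytes object
(the Python 2 str discussed in this thread behaves similarly for this
purpose). The payload and all variable names are made up for illustration,
and no Thrift code is involved; it only contrasts the two in-memory shapes.

```python
import sys

# Hypothetical payload: 1024 raw bytes, including embedded NUL bytes.
payload = bytes(range(256)) * 4

# Roughly what a `binary` field looks like: one contiguous bytes object.
as_binary = payload

# Roughly what a `list<byte>` field looks like: a list of separate int
# objects. Thrift's byte is signed, so values fall in -128..127.
as_list = [b - 256 if b > 127 else b for b in payload]

# bytes supports the buffer protocol, so it can be viewed without copying.
view = memoryview(as_binary)
binary_is_contiguous = view.contiguous

# A list has no buffer interface; memoryview() rejects it.
try:
    memoryview(as_list)
    list_has_buffer = True
except TypeError:
    list_has_buffer = False

# Getting a contiguous buffer back from the list takes a pass over
# every element (and undoing the signed representation).
round_trip = bytes(b & 0xFF for b in as_list)

print("bytes object size:", sys.getsizeof(as_binary))
print("list object size (excluding the int objects):", sys.getsizeof(as_list))
```

The list form costs at least one pointer per element on top of the int
objects themselves, and it needs a conversion step before it can be handed
to anything that expects a buffer (file writes, sockets, struct, and so on).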

Hope this helps.

Best,
Randy



On Wed, Oct 2, 2013 at 2:39 AM, fogbit <fogbit@gmail.com> wrote:

> Hi.
>
> I need to (de)serialize a buffer containing binary data, particularly
> between C++ (sender/receiver) and Python (receiver) applications. What data
> type should I use in a .proto file?
>
> There is already a question on Stack Overflow
>
> http://stackoverflow.com/questions/13876233/best-way-to-send-binary-data-with-thrift
> and the advice there is to use the "binary" type. But at the same time the
> manual says "This is currently a specialized form of the string type above,
> added to provide better interoperability with Java. The current
> plan-of-record is to elevate this to a base type at some point." Also, from
> what I read in the generated sources, in C++/Python the "binary" data type
> is transformed into a "string" type, while "list<byte>" becomes
> std::vector<uint*_t>/[].
>
> I haven't checked it yet, but I suspect that in the general case string
> deserialization could be slower than a plain vector due to a locale check
> or something similar.
>
> Also, in my opinion, it's "ideologically" right to use a list of bytes when
> one wants to store... a list of bytes, which is what a binary buffer really
> is.
>
> So, what data type should I use: "binary" or "list<byte>"?
>
> Thanks!
>
