thrift-user mailing list archives

From Mayan Moudgill
Subject Re: heterogeneous collections
Date Tue, 04 May 2010 23:56:57 GMT

Bryan Duxbury wrote:

> I don't mean to discourage you from thinking about more optimal structures,
> but man, this doesn't strike you as overcomplicated? I can tell you from
> experience that managing a stack of metadata (in the compact protocol) is
> non-trivial both in terms of complexity and performance - and that's just
> field IDs.

Not really; it's as complicated as it needs to be to satisfy the goals of:
- allow fields to be present or absent
- allow for incremental encoding
- allow for a type-less representation.
It allows for a very efficient serializer and deserializer; in 
particular it may even make it possible to do in-place/zero-copy 
deserialization, something that is a little difficult to do with the 
types being interspersed with the data.
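
To make the trade-off concrete, here is a minimal sketch of one possible type-less layout. It is an illustration only, not the exact scheme proposed earlier in this thread: a one-byte presence bitmap followed by the packed values of the fields that are present, with no type tags in the stream. SCHEMA, encode and decode are hypothetical names, and both ends are assumed to share the schema.

```python
import struct

# Hypothetical schema known to both sender and receiver:
# (field name, struct format code). No type tags go on the wire.
SCHEMA = [("id", "<i"), ("score", "<d"), ("flags", "<H")]

def encode(record):
    """Emit a one-byte presence bitmap, then only the present values."""
    bitmap = 0
    body = bytearray()
    for bit, (name, fmt) in enumerate(SCHEMA):
        if name in record:
            bitmap |= 1 << bit          # mark the field as present
            body += struct.pack(fmt, record[name])
    return bytes([bitmap]) + bytes(body)

def decode(buf):
    """Read values directly out of the buffer at computed offsets."""
    view = memoryview(buf)              # no copy of the underlying bytes
    bitmap = view[0]
    offset = 1
    out = {}
    for bit, (name, fmt) in enumerate(SCHEMA):
        if bitmap & (1 << bit):
            (out[name],) = struct.unpack_from(fmt, view, offset)
            offset += struct.calcsize(fmt)
    return out
```

Because the types come from the shared schema rather than from the stream, the decoder computes each offset from the bitmap alone, which is what makes reading the values in place plausible.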

[And BTW - do you *REALLY* think this is complicated? I'd estimate that 
this is a few hundred lines of code at most - in fact, I'd call this 
pretty straightforward, if not trivial]

> Additionally, including the type information gives us more than just the
> ability to skip correctly. It also makes the serialized data fully
> described. 

That I agree with - though it's described at the duck-typing level, not 
the strong typing level.

> This makes it easy to debug it when you might think something is
> going wrong, or write a generic tool that can digest serialized Thrift for
> some reason.

Again, I agree - but that is NOT the reason that was given for the 
design with full-typing.

Instead, what was claimed was that it was done this way since "..using 
the type-identifier system keeps the TProtocol interface incredibly flat 
and obvious...". However, it should be obvious that even holding the 
TProtocol interface constant, one can have an alternate serialization 
protocol that might yield better marshalling/demarshalling performance 
whilst sending fewer bits across the wire.

> The bottom line is that the way it works now, nothing is implied, which is
> probably suitable for most applications. There's certainly the possibility
> that for other applications, this doesn't make as much sense, so perhaps we
> should explore those ideas more fully, but *definitely* in another thread
> than this one.

One can get the same debuggability benefits by sending the type string 
out-of-band, either before or after the data. In a strongly typed 
system, in particular, the type string is a constant, so it's zero cost 
to generate and there are no copies - it can be a direct argument to 
writev() [or its equivalent].
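
A minimal sketch of that idea, assuming a Unix-like system where Python's os.writev is available. TYPE_STRING, its contents, send_with_type and demo are hypothetical names chosen for illustration; the pipe merely stands in for a socket.

```python
import os
import struct

# Hypothetical type description for a struct of (i32, double). It is a
# constant per type, so it is built once and reused for every message.
TYPE_STRING = b"struct{i32,double}\n"

def send_with_type(fd, payload):
    """Gather-write the constant type string and the payload in one call."""
    return os.writev(fd, [TYPE_STRING, payload])

def demo():
    """Send one message through a pipe and read it back."""
    r, w = os.pipe()
    try:
        payload = struct.pack("<id", 7, 2.5)   # type-less packed data
        written = send_with_type(w, payload)
        data = os.read(r, written)
    finally:
        os.close(r)
        os.close(w)
    return written, data
```

The type string and the payload are never concatenated in user space; the kernel gathers the two buffers, so the type description adds no per-message copy on the sending side.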
