thrift-user mailing list archives

From Mark Slee <ms...@facebook.com>
Subject RE: heterogeneous collections
Date Mon, 03 May 2010 22:12:09 GMT
>> If, however, you're encoding the data for demarshalling at the server, 
>> it sounds like you want a different RPC framework.

I'm going to slightly hijack the conversation to wax philosophic for a minute here. I think
this statement roughly captures my sentiment here.

One of Thrift's early goals was basically to do just one thing, but do it very simply and
efficiently across lots of platforms. That thing is *strongly-typed* RPC and data-serialization.
All of the components were essentially designed under the assumption that they would always
be strongly-typed, and that they should always map to something efficient and obvious in a
language like C++.

Now, a lot of the things Thrift does are very *similar* to other sorts of interesting mechanisms
for data-serialization, marshalling, containering, and whatnot. I think it can be very tempting
to look at these similarities, analyze the distance between the two things, and decide that,
since the distance looks pretty crossable, we should just build a bridge to connect the two.

My fear is that in the long run this turns a small, neat island into a complicated mess of
bridges. If you find the right viewing angle and it's not a foggy day, you can sometimes still
see the little island underneath the bridges, but this Thrift thing definitely looks like
bridgework, not an island.

In the long term, my personal bias is that this is bad for Thrift. Most people interested
in building these features need them to solve specific problems and only care about one or
two target languages. If we do a lot of this, we end up with a patchwork set of variable feature-lists
that are inconsistent across languages. The Thrift "brand" will invariably move away from
"simple, lightweight, lets you do the same thing in all programming languages" towards "a
bit complicated, does some things in some languages."

Part of the idea of Thrift's modular transport/protocol design was that it would make it easy
for people to implement custom extensions/modifications to the system *outside of the core
project.* Want to sub in your own weird encoding/transport/whatever? No problem, just write
a TProtocol. Think other folks will be into it? Cool, post it online and send an email to
the thrift-user@ list. Turns out lots of folks want to use it? Then maybe we should incorporate
it.

For better or worse, I really think simple things like "how many source files appear to be
in this tarball?" can matter a lot for software adoption. Even if a project is just 10 easy-to-read
files at its core, when you have to locate those 10 files amongst 40 files of extensions and
add-ons, and the default make configuration builds everything, the project starts feeling
like a complicated, awkward thing to deal with, and we engineers start getting that itchy
feeling of "I can't possibly understand this entire thing, surely it is too complicated and
slow, why don't we just write our own from scratch."

I don't expect everyone to agree with this, and the direction of the project is ultimately
at the behest of the developers most actively working on it, but when it comes to things like
dynamic or heterogeneous containers, my opinion is that they just shouldn't be a core part
of a strongly-typed software project with stated simplicity goals.

Cheers,
Mark

-----Original Message-----
From: Mayan Moudgill [mailto:mayan@bestweb.net] 
Sent: Monday, May 03, 2010 10:03 AM
To: thrift-user@incubator.apache.org; alex@bizo.com
Subject: Re: heterogeneous collections


The idea of marshalling to strings seems somewhat counter-productive;
after all, you're marshalling the data using Thrift, and it then gets
sent to a server, which demarshalls it. Now, on top of that you're adding
another layer of marshalling.

A similar thing happens in Cassandra (except that they use binary
instead of strings), but at least in Cassandra the user-marshalled data
is uninterpreted at the server - it only handles the data as an
opaque blob, so the marshalling/demarshalling is confined to
the client [I still wonder about how version control is managed - does
everyone end up rolling their own?].

If, however, you're encoding the data for demarshalling at the server, 
it sounds like you want a different RPC framework. For instance, do you 
really need the version flexibility that is provided by Thrift? Are your 
types fixed at source & destination? Do you need a leaner transport? In 
fact, why did you pick Thrift in the first place?

Apropos the discussion on scalar/string compression in
https://issues.apache.org/jira/browse/THRIFT-110:
I'm curious: if a particular application would tend to compress better
using a different algo than the one(s) provided, what happens?

> On Mon, May 3, 2010 at 7:09 AM, Bryan Duxbury <bryan@rapleaf.com> wrote:
> 
> 
>>There is already a totally viable workaround, though - make a Union of the
>>types you want in your collection, and then make the field list<YourUnion>.
>>You get basically all the capabilities with very few drawbacks, plus the
>>ability to include multiple logical "types" in the collection, not just
>>physical types. Of course, if you literally need "any" possible object to
>>go into the collection, then this won't do it for you.
>>
> 
> 
> Thanks for the suggestion, Bryan.
> 
> I'm experimenting with marshalling my values to strings (I only deal with
> basic types such as int32, int64, strings) right now.   If that doesn't
> work, I'll go with your suggestion.
> 
> alex
> 
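
The union workaround Bryan describes in the quoted message can be sketched roughly as follows. This is plain Python standing in for Thrift-generated code, not the output of the Thrift compiler; the class name Value and the field names i32_value, i64_value and string_value are hypothetical, chosen only to match the basic types alex mentions. The idea is that each element of the collection is a union with exactly one field set, so the list itself stays homogeneous and strongly typed.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a Thrift union: exactly one field is set."""

    i32_value: Optional[int] = None
    i64_value: Optional[int] = None
    string_value: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the union invariant: one and only one field carries a value.
        set_names = [f.name for f in fields(self) if getattr(self, f.name) is not None]
        if len(set_names) != 1:
            raise ValueError(f"exactly one field must be set, got {set_names}")

    def which(self) -> str:
        """Return the name of the single field that is set."""
        return next(f.name for f in fields(self) if getattr(self, f.name) is not None)

    def get(self) -> Union[int, str]:
        """Return the value of the single field that is set."""
        return getattr(self, self.which())


# The "heterogeneous" collection is just a homogeneous list of Value,
# which is the role list<YourUnion> plays in the quoted suggestion.
collection = [
    Value(i32_value=7),
    Value(string_value="seven"),
    Value(i64_value=2**40),
]
```

A reader of the collection dispatches on which() for each element instead of guessing the type, which is why this covers a fixed set of types but not literally "any" object.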

