thrift-user mailing list archives

From Chet Murthy <murthy.c...@gmail.com>
Subject Human-readable wire-format for Thrift?
Date Thu, 28 Sep 2017 01:04:56 GMT
[I hope I'm sending this mail to the right list -- it wasn't clear to me
that it should go to thrift-dev, so I figured I'd send it here first.]

The -one- thing that protobufs has going for it, over Thrift, is that
protobufs has "CompactTextFormat" (and JSON too) as full wire-formats.
This is .... incredibly useful for the following use-case:

You want to write a config-file format, and you want to get the benefits of
version-to-version compatibility.  In your program, you'd like to access a
strongly-typed "config object" with typed fields, and you'd -like- for
marshalling to/from flat-text to be automatically generated.

I have personal experience with using protobufs in exactly this way, and
it's really, really, really nice.

The current Thrift JSON protocol isn't designed for this, and given the
interface of the (C++) TProtocol class, I think it isn't possible.  But
with a small change, it -would- be possible, so I thought I'd describe the
change, and see what you all thought (b/c it would require a change to
generated code, and to the TProtocol base class interfaces, specifically to
the readFieldBegin method):

[I'll describe this for the C++ generated code; I haven't looked carefully
into the rest of the languages, but I'd guess that something could be done.]

(0) Let me first note that the datastructures described below are constant,
and we're talking about passing one extra parameter to the readFieldBegin
method mentioned above.  That's it.

(1) For concreteness, imagine a couple of message types

struct Bar {
  4: required i32 a,
  5: required string b,
}

struct Foo {
  1: required i32 a,
  2: required string b,
  3: required Bar c,
}

Again for concreteness, here's an example of the JSON protocol for a value
of type Foo:

{
    "1": {
        "i32": 1
    },
    "2": {
        "str": "ugh"
    },
    "3": {
        "rec": {
            "4": {
                "i32": 2
            },
            "5": {
                "str": "argh"
            }
        }
    }
}

(2) I'd prefer that it look like:

{
    "a": 1,
    "b": "ugh",
    "c": {
        "a": 2,
        "b": "argh"
    }
}

(3) For each message-type, we need a mapping field-name ->
pair<Thrift-type, field-id>.  So, generate a constant data-structure of type

map<string, pair<Type, int16_t> >

for each message-type.
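
As a rough illustration of what such generated tables might look like for
the Foo and Bar structs above (a sketch only: the Type enum is a
hypothetical stand-in for Thrift's own type tag, and the names FieldMap,
barFields and fooFields are made up for this example, not taken from the
generator):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for Thrift's type tag; not the library's own enum.
enum class Type { I32, STRING, STRUCT };

// field-name -> (Thrift-type, field-id)
using FieldMap = std::map<std::string, std::pair<Type, int16_t>>;

// One constant table per message type, built once on first use.
inline const FieldMap& barFields() {
  static const FieldMap m = {
      {"a", {Type::I32, 4}},
      {"b", {Type::STRING, 5}},
  };
  return m;
}

inline const FieldMap& fooFields() {
  static const FieldMap m = {
      {"a", {Type::I32, 1}},
      {"b", {Type::STRING, 2}},
      {"c", {Type::STRUCT, 3}},
  };
  return m;
}
```

The same field name ("a", "b") can appear in both tables with different
ids, which is why the table has to be per message-type rather than global.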

(4) Marshalling is easy -- all the field-names are known, and we could just
emit those instead of field-ids; similarly, we could skip putting
type-information in the wire-format too.

(5) At demarshalling time, we always know the type of the message we're
demarshalling.  So as we read field-names, we can use the map in #3 to look
up TType and field-id, and then just demarshal in the normal way.  We just
need to pass that map as a constref to readFieldBegin.
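
To make the marshalling and demarshalling steps above a bit more concrete,
here is a minimal hand-written sketch.  Everything in it is an assumption
for illustration: Type is a hypothetical stand-in for Thrift's type tag,
toJson and quote stand in for what a generated write method might emit, and
lookupField shows only the name lookup that a name-aware readFieldBegin
would need, not the real method's signature.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for Thrift's type tag; not the library's own enum.
enum class Type { I32, STRING, STRUCT };
using FieldMap = std::map<std::string, std::pair<Type, int16_t>>;

// Hand-written equivalents of the two example structs.
struct Bar { int32_t a; std::string b; };
struct Foo { int32_t a; std::string b; Bar c; };

// Minimal quoting for the sketch: escapes only double quotes and backslashes.
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char ch : s) {
    if (ch == '"' || ch == '\\') out += '\\';
    out += ch;
  }
  out += '"';
  return out;
}

// Marshalling: emit field names; no field-ids or type tags on the wire.
std::string toJson(const Bar& v) {
  return "{\"a\":" + std::to_string(v.a) + ",\"b\":" + quote(v.b) + "}";
}

std::string toJson(const Foo& v) {
  return "{\"a\":" + std::to_string(v.a) + ",\"b\":" + quote(v.b) +
         ",\"c\":" + toJson(v.c) + "}";
}

// Demarshalling: resolve a field name read from the text into (type, id)
// using the per-message table, passed by const reference.  Returns false
// for an unknown name so the caller can decide to skip or reject it.
bool lookupField(const FieldMap& fields, const std::string& name,
                 Type& fieldType, int16_t& fieldId) {
  auto it = fields.find(name);
  if (it == fields.end()) return false;
  fieldType = it->second.first;
  fieldId = it->second.second;
  return true;
}
```

Once the lookup has produced the type and field-id, the rest of the
generated read code could proceed as it does today; what to do with an
unknown field name is left open here.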

I -think- that that works, and can't find any problems with what I've
described.

I can make this change to the C++ library and code-generator, but before I
start down that path, I figured I should get some input on whether this is
something that the Thrift community (and maintainers) would accept?

I think that a human-readable/writable wire would be immensely valuable,
and not just for the example of config-files.

Your feedback appreciated,
--chet--
