thrift-user mailing list archives

From Jens Geyer <jensge...@hotmail.com>
Subject Re: Human-readable wire-format for Thrift?
Date Thu, 28 Sep 2017 19:49:45 GMT
Hi Chet,

well, Thrift is primarily about efficiency, not human readability. If 
machines and programs talk to each other, nobody really needs human readable 
messages, because there are no humans involved, except maybe for debugging 
(but that's not a real production use case).  If one asked you to pick just 
one single feature about any Serialization and RPC library, potentially 
sacrificing any other requirement if needed, you probably would answer that 
it should be as fast and efficient as possible.

I only wonder if the human readability has something to do with the fact that 
gRPC is often found to be slower than Thrift ...  ;-)

You still want a human-readable format? OK, here's how to do it. Thrift 
indeed offers the ability to achieve that, because it is a framework. For 
example, look at the implementation of the TSimpleJSONProtocol (link below) 
and use this as a starting point to write your own JSON-like TProtocol 
implementation that suits your needs. That's what makes Thrift so flexible - 
even if you have special needs, you need to replace only those parts and it 
still simply works. If you prefer XML or some other format, even that should 
be feasible, but you have to invest some work either way.

https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/protocol/TSimpleJSONProtocol.java

Does that help you?

Have fun,
JensG


-----Original Message-----
From: Chet Murthy
Sent: Thursday, September 28, 2017 3:04 AM
To: user@thrift.apache.org
Subject: Human-readable wire-format for Thrift?

[I hope I'm sending this mail to the right list -- it wasn't clear to me
that it should go to thrift-dev, so I figured I'd send it here first.]

The -one- thing that protobufs has going for it, over Thrift, is that
protobufs has "CompactTextFormat" (and JSON too) as full wire-formats.
This is .... incredibly useful for the following use-case:

You want to write a config-file format, and you want to get the benefits of
version-to-version compatibility.  In your program, you'd like to access a
strongly-typed "config object" with typed fields, and you'd -like- for
marshalling to/from flat-text to be automatically generated.

I have personal experience with using protobufs in exactly this way, and
it's really, really, really nice.

The current Thrift JSON protocol isn't designed for this, and given the
interface of the (C++) TProtocol class, I think it isn't possible.  But
with a small change, it -would- be possible, so I thought I'd describe the
change, and see what you all thought (b/c it would require a change to
generated code, and to the TProtocol base class interfaces, specifically to
the readFieldBegin method):

[I'll describe this for the C++ generated code; I haven't looked carefully
into the rest of the languages, but I'd guess that something could be done.]

(0) Let me first note that these datastructures are constant, and we're
talking about passing an extra parameter to the read method listed above.
That's it.

(1) For concreteness, imagine a couple of message types

struct Bar {
  4: required i32 a ,
  5: required string b,
}

struct Foo {
  1: required i32 a ,
  2: required string b,
  3: required Bar c,
}

Again for concreteness, here's an example of the JSON protocol for a value
of type Foo:

{
    "1": {
        "i32": 1
    },
    "2": {
        "str": "ugh"
    },
    "3": {
        "rec": {
            "4": {
                "i32": 2
            },
            "5": {
                "str": "argh"
            }
        }
    }
}

(2) I'd prefer that that look like:
{
    "a": 1,
    "b": "ugh",
    "c": {
        "a": 2,
        "b": "argh"
    }
}
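
As a rough illustration of that target format, here is a minimal C++ sketch that writes a Foo keyed by field name. The structs are hand-written stand-ins for generated code, and `quote` and `toJson` are hypothetical helpers, not part of the Thrift library; the output is compact rather than pretty-printed.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hand-written stand-ins for the structs Thrift would generate from the IDL.
struct Bar {
  int32_t a;
  std::string b;
};

struct Foo {
  int32_t a;
  std::string b;
  Bar c;
};

// Hypothetical helper: quote a string for JSON, escaping only quotes and
// backslashes (enough for a sketch, not a full JSON escaper).
std::string quote(const std::string& s) {
  std::string out = "\"";
  for (char ch : s) {
    if (ch == '"' || ch == '\\') out += '\\';
    out += ch;
  }
  out += '"';
  return out;
}

// Emit field names instead of field ids, with no type tags on the wire.
std::string toJson(const Bar& v) {
  std::ostringstream os;
  os << "{\"a\":" << v.a << ",\"b\":" << quote(v.b) << "}";
  return os.str();
}

std::string toJson(const Foo& v) {
  std::ostringstream os;
  os << "{\"a\":" << v.a << ",\"b\":" << quote(v.b)
     << ",\"c\":" << toJson(v.c) << "}";
  return os.str();
}
```

Calling `toJson` on a Foo holding the values from the example yields the same object as above, just without whitespace.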

(3) For each message-type, we need a mapping field-name ->
pair<Thrift-type, field-id>.  So, generate a constant data-structure of type

map<string, pair<Type, int16_t> >

for each message-type.
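
For Foo and Bar, such constants might look like the following sketch. The `Type` enum is a simplified stand-in for Thrift's TType, and the names `FieldMap`, `kBarFields` and `kFooFields` are made up for illustration; a real generator could lay this out differently.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for Thrift's TType; only the values needed here.
enum class Type { I32, STRING, STRUCT };

// field-name -> (Thrift type, field id)
using FieldMap = std::map<std::string, std::pair<Type, int16_t>>;

// One constant map per message type, as a code generator might emit it.
const FieldMap kBarFields = {
    {"a", {Type::I32, 4}},
    {"b", {Type::STRING, 5}},
};

const FieldMap kFooFields = {
    {"a", {Type::I32, 1}},
    {"b", {Type::STRING, 2}},
    {"c", {Type::STRUCT, 3}},
};
```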

(4) Marshalling is easy -- all the field-names are known, and we could just
emit those instead of field-ids; similarly, we could skip putting
type-information in the wire-format too.

(5) At demarshalling time, we always know the type of the message we're
demarshalling.  So as we read field-names, we can use the map in #3 to look
up TType and field-id, and then just demarshal in the normal way.  We just
need to pass that map as a constref to readFieldBegin.
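
The lookup step could be sketched as below. `readFieldBeginByName` is a hypothetical free function, not the existing TProtocol method; it assumes the field name has already been read off the wire and only shows how the per-message map recovers the type and field-id. How unknown names are reported (returning false here) is also an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for Thrift's TType.
enum class Type { STOP, I32, STRING, STRUCT };

using FieldMap = std::map<std::string, std::pair<Type, int16_t>>;

// Hypothetical map for struct Foo, as generated code might provide it.
const FieldMap kFooFields = {
    {"a", {Type::I32, 1}},
    {"b", {Type::STRING, 2}},
    {"c", {Type::STRUCT, 3}},
};

// Given a field name read from the wire and the map for the message type
// being demarshalled, fill in the type and field id. Returns false for a
// name the map does not know, so the caller can decide to skip the value.
bool readFieldBeginByName(const std::string& nameFromWire,
                          const FieldMap& fields,
                          Type& fieldType,
                          int16_t& fieldId) {
  auto it = fields.find(nameFromWire);
  if (it == fields.end()) {
    fieldType = Type::STOP;
    fieldId = 0;
    return false;
  }
  fieldType = it->second.first;
  fieldId = it->second.second;
  return true;
}
```

After a successful lookup, the generated read code could switch on the field id exactly as it does today.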

I -think- that that works, and can't find any problems with what I've
described.

I can make this change to the C++ library and code-generator, but before I
start down that path, I figured I should get some input on whether this is
something that the Thrift community (and maintainers) would accept?

I think that a human-readable/writable wire would be immensely valuable,
and not just for the example of config-files.

Your feedback appreciated,
--chet-- 
