drill-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Thrift?
Date Fri, 14 Sep 2012 22:41:36 GMT
I didn't say I was the one making the argument...

Google has put probably > 10^24 bytes of data through protobuf across
multiple implementations (e.g. serialization on disk and on-wire RPC).
That is a low estimate.

I'd be interested in hearing how 20 years of telco protocol traffic
compares to 10 years of Google's usage of protobuf.  Exponential curve
and all of that.





On Fri, Sep 14, 2012 at 3:36 PM, Constantine Peresypkin
<pconstantine@gmail.com> wrote:
> More battle tested than a more-than-20-year-old standard used in almost
> every telecom protocol that exists nowadays?
> I think your statement is a little on the "too bold" side. :)
>
> On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> Funny thing, given how much use protobufs has been put through, I think
>> one could make the argument it's more battle tested than ASN.1 ...
>>
>> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin
>> <constantine@litestack.com> wrote:
>> > Protobuf is an attempt to make ASN.1 more developer friendly (not a bad
>> > attempt).
>> > It's simpler, has far fewer features, is easier to implement, and has a
>> > compact encoding.
>> > But on the other hand it's non-standard, a "reinvented wheel" (they could
>> > just have done a "better than PER" encoding for ASN.1), and AFAIK it has
>> > no support for the new and shiny Google encodings, like "group varint".
>> > All in all, in the current situation it seems a better choice than ASN.1,
>> > not even arguing about something even more vague and non-standard like
>> > Thrift.
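(To make the encoding comparison concrete, here is a rough, illustrative
Java sketch of a protobuf-style varint next to the "group varint" idea
mentioned above. The class and method names are invented for illustration;
this is not code from protobuf or from any Google library.)

    import java.io.ByteArrayOutputStream;

    public class VarintSketch {

        // Protobuf-style varint: 7 payload bits per byte, high bit set on
        // every byte except the last, so small values cost a single byte.
        static byte[] writeVarint(long value) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while ((value & ~0x7FL) != 0) {
                out.write((int) ((value & 0x7F) | 0x80));
                value >>>= 7;
            }
            out.write((int) value);
            return out.toByteArray();
        }

        // "Group varint" as described in Google's talks: four 32-bit values
        // share one tag byte whose 2-bit fields each hold (length - 1) in
        // bytes; the values follow little-endian with no per-byte
        // continuation bits, which makes decoding far less branchy.
        static byte[] writeGroupVarint(int a, int b, int c, int d) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int[] vals = {a, b, c, d};
            byte[][] bodies = new byte[4][];
            int tag = 0;
            for (int i = 0; i < 4; i++) {
                int bits = 32 - Integer.numberOfLeadingZeros(vals[i]);
                int len = Math.max(1, (bits + 7) / 8);
                tag |= (len - 1) << (i * 2);
                bodies[i] = new byte[len];
                for (int j = 0; j < len; j++) {
                    bodies[i][j] = (byte) (vals[i] >>> (8 * j));
                }
            }
            out.write(tag);
            for (byte[] body : bodies) {
                out.write(body, 0, body.length);
            }
            return out.toByteArray();
        }

        public static void main(String[] args) {
            System.out.println(writeVarint(300).length);  // 2 bytes
            // 1 tag byte + 1 + 2 + 3 + 4 value bytes = 11 bytes
            System.out.println(writeGroupVarint(1, 300, 70000, 1000000000).length);
        }
    }

Group varint trades one tag byte per group of four for branch-light
decoding, which is the kind of encoding the message above notes protobuf's
standard wire format does not use.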
>> >
>> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <ryanobjc@gmail.com>
>> > wrote:
>> >
>> >> Thanks for that Ted.
>> >>
>> >> Correct - internal wire format doesn't mean 'drill only supports
>> >> protobuf encoded data'.
>> >>
>> >> Part of the reason to favor protobuf is that a lot of people in the
>> >> broader 'big data' community are building a lot of experience with it.
>> >> Hadoop and HBase have both moved, or are moving, to protobuf on the
>> >> wire.  Being able to leverage this expertise is valuable.
>> >>
>> >> There is a JIRA in Hadoop-land where someone had done a deep dive
>> >> 'bake off' between thrift, protobuf and avro.  The ultimate choice was
>> >> protobuf for a number of reasons.  If people want to re-do the
>> >> analysis, I'd like to see it in the context of THAT analysis (eg: why
>> >> the assumptions there are not the same for Drill)... if anything it'd
>> >> give a concrete form to what can be a mire.
>> >>
>> >> For what it's worth, I've had many discussions along these lines with
>> >> a variety of people, including committers on Thrift, and the consensus
>> >> is that both are good choices.
>> >>
>> >> -ryan
>> >>
>> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com>
>> >> wrote:
>> >> > I think that it is important to ask a few questions leading up to a
>> >> > decision here.
>> >> >
>> >> > The first is a (rhetorical) show of hands about how many people
>> >> > believe that there are no serious performance or expressivity killers
>> >> > when comparing alternative serialization frameworks.  As far as I
>> >> > know, performance differences are not massive (and protobufs is one of
>> >> > the leaders in any case) and the expressivity differences are
>> >> > essentially nil.  If somebody feels that there is a serious
>> >> > show-stopper with any option, they should speak.
>> >> >
>> >> > The second is to ask the sense of the community on whether progress or
>> >> > perfection is more important to the project in this decision.  My
>> >> > guess is that almost everybody would prefer to see progress as long as
>> >> > the technical choice is not subject to some horrid missing bit.
>> >> >
>> >> > The final question is whether it is reasonable to go along with
>> >> > protobufs given that several very experienced engineers prefer it and
>> >> > would like to produce code based on it.  If the first two questions
>> >> > are answered to the effect that protobufs is about as good as we will
>> >> > find and that progress trumps small differences, then it seems that
>> >> > moving to follow this preference of Jason and Ryan for protobufs might
>> >> > be a reasonable thing to do.
>> >> >
>> >> > The question of an internal wire format, btw, does not constrain the
>> >> > project relative to external access.  I think it is important to
>> >> > support JDBC and ODBC and whatever is in common use for querying.  For
>> >> > external access the question is quite different.  Whereas for the
>> >> > internal format consensus around a single choice has large benefits,
>> >> > the external format choice is nearly the opposite.  For an external
>> >> > format, limiting ourselves to a single choice seems like a bad idea,
>> >> > and increasing the audience seems like a better choice.
>> >> >
>> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi folks,
>> >> >>
>> >> >> I just commented on this first JIRA.  Here is my text:
>> >> >>
>> >> >> This issue has been hashed over a lot in the Hadoop projects. There
>> >> >> was work done to compare thrift vs avro vs protobuf, and the
>> >> >> conclusion was to use protobuf.
>> >> >>
>> >> >> Prior to this move, there had been a lot of noise about pluggable RPC
>> >> >> transports, and whatnot. It held up adoption of a backwards compatible
>> >> >> serialization framework for a long time. The problem ended up being
>> >> >> the analysis-paralysis, rather than any specific implementation
>> >> >> problem. In other words, the problem was a LACK of implementation
>> >> >> rather than actual REAL problems.
>> >> >>
>> >> >> Based on this experience, I'd strongly suggest adopting protobuf and
>> >> >> moving on. Forget about pluggable RPC implementations; the complexity
>> >> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>> >> >> format for Hadoop and HBase, which allows Drill to draw on the broad
>> >> >> experience of those communities, which need to implement high
>> >> >> performance, backwards compatible RPC serialization.
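(A small sketch of the wire-format property behind the backwards
compatibility mentioned above: protobuf frames every field with a
(field number, wire type) tag, so an older reader can simply skip field
numbers it does not know. This uses protobuf's CodedOutputStream and
CodedInputStream directly rather than generated message classes, just to
show the mechanism; the field numbers and values are invented.)

    import com.google.protobuf.CodedInputStream;
    import com.google.protobuf.CodedOutputStream;
    import com.google.protobuf.WireFormat;
    import java.io.ByteArrayOutputStream;

    public class WireCompatSketch {
        public static void main(String[] args) throws Exception {
            // "Newer" writer: emits a field (3) the older reader never knew about.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            CodedOutputStream out = CodedOutputStream.newInstance(buf);
            out.writeString(1, "drillbit-7");  // field 1: node name
            out.writeInt32(2, 31010);          // field 2: port
            out.writeInt32(3, 42);             // field 3: added in a later version
            out.flush();

            // "Older" reader: understands fields 1 and 2, skips everything else.
            CodedInputStream in = CodedInputStream.newInstance(buf.toByteArray());
            int tag;
            while ((tag = in.readTag()) != 0) {
                switch (WireFormat.getTagFieldNumber(tag)) {
                    case 1: System.out.println("name = " + in.readString()); break;
                    case 2: System.out.println("port = " + in.readInt32()); break;
                    default: in.skipField(tag);  // unknown field: skip, don't fail
                }
            }
        }
    }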
>> >> >>
>> >> >> ====
>> >> >>
>> >> >> Expanding a bit, I've looked into this issue a lot, and there are
>> >> >> very few significant, concrete reasons to choose protobuf vs thrift.
>> >> >> A tiny percent faster at this or that, etc.  I'd strongly suggest
>> >> >> protobuf for the expanded community.  There is no particular Apache
>> >> >> imperative that Apache projects re-use libraries.  Use what makes
>> >> >> sense for your project.
>> >> >>
>> >> >> As regards Avro, it's a fine serialization format for long term data
>> >> >> retention, but the complexities that exist to enable that make it
>> >> >> non-ideal for RPC.  I know of no one who uses AvroRPC in any form.
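(For the Avro point above, a short sketch of why it suits long-term data
retention: an Avro data file embeds the writer's schema in its header, so
the bytes remain self-describing years later. The record schema and file
name below are made up; the generic-API calls are standard Avro.)

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class AvroRetentionSketch {
        public static void main(String[] args) throws Exception {
            // An illustrative record schema, defined as JSON.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"msg\",\"type\":\"string\"}]}");

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("id", 1L);
            rec.put("msg", "hello");

            // The writer schema goes into the file header, which is what makes
            // the data self-describing for long-term storage.
            DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
                new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, new File("events.avro"));
            writer.append(rec);
            writer.close();
        }
    }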
>> >> >>
>> >> >> -ryan
>> >> >>
>> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com>
>> >> >> wrote:
>> >> >> > We plan to propose the architecture and interfaces in the next
>> >> >> > couple weeks, which will make it easy to divide the project into
>> >> >> > clear building blocks. At that point it will be easier to start
>> >> >> > contributing different data sources, data formats, operators, query
>> >> >> > languages, etc.
>> >> >> >
>> >> >> > The contributions are done in the usual Apache way. It's best to
>> >> >> > open a JIRA and then post a patch so that others can review and
>> >> >> > then a committer can check it in.
>> >> >> >
>> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
>> >> >> > <chandanmadhesia@gmail.com> wrote:
>> >> >> >
>> >> >> >> Hi
>> >> >> >>
>> >> >> >> What is the process to become a contributor to Drill?
>> >> >> >>
>> >> >> >> Regards
>> >> >> >> chandan
>> >> >> >>
>> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> > Suffice it to say that if *you* think it is important enough to
>> >> >> >> > implement and maintain, then the group shouldn't say nay.  The
>> >> >> >> > consensus stuff should only block things that break something
>> >> >> >> > else.  Additive features that are highly maintainable (or which
>> >> >> >> > come with commitments) shouldn't generally be blocked.
>> >> >> >> >
>> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
>> >> >> >> > <michael.hausenblas@gmail.com> wrote:
>> >> >> >> >
>> >> >> >> > > Good. Feel free to put me down for that, if the group as a
>> >> >> >> > > whole thinks that (supporting Thrift) makes sense.
>> >> >> >> > >
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Tomer Shiran
>> >> >> > Director of Product Management | MapR Technologies | 650-804-8657
>> >> >>
>> >>
>>
