drill-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Thrift?
Date Fri, 14 Sep 2012 21:38:15 GMT
Thanks for that, Ted.

Correct - an internal wire format doesn't mean 'Drill only supports
protobuf encoded data'.

Part of the reason to favor protobuf is that a lot of people in the
broader 'big data' community are building a lot of experience with it.
Hadoop and HBase are both moving, or have already moved, to protobuf on the wire.
Being able to leverage this expertise is valuable.

There is a JIRA in Hadoop-land where someone had done a deep dive
'bake off' between thrift, protobuf and avro.  The ultimate choice was
protobuf for a number of reasons.  If people want to re-do the
analysis, I'd like to see it in the context of THAT analysis (e.g., why
the assumptions there are not the same for Drill)... if anything it'd
give a concrete form to what can be a mire.

For what it's worth, I've had many discussions along these lines with
a variety of people, including committers on Thrift, and the consensus
is that both are good choices.


On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> I think that it is important to ask a few questions leading up to a decision
> here.
> The first is a (rhetorical) show of hands about how many people believe
> that there are no serious performance or expressivity killers when
> comparing alternative serialization frameworks.  As far as I know,
> performance differences are not massive (and protobufs is one of the
> leaders in any case) and the expressivity differences are essentially nil.
>  If somebody feels that there is a serious show-stopper with any option,
> they should speak.
> The second is to ask the sense of the community whether they judge progress
> or perfection in this decision to be more important to the project.  My guess
> is that almost everybody would prefer to see progress as long as the
> technical choice is not subject to some horrid missing bit.
> The final question is whether it is reasonable to go along with protobufs
> given that several very experienced engineers prefer it and would like to
> produce code based on it.  If the first two questions are answered to the
> effect that protobufs is about as good as we will find and that progress
> trumps small differences, then it seems that moving to follow this
> preference of Jason and Ryan for protobufs might be a reasonable thing to
> do.
> The question of an internal wire format, btw, does not constrain the
> project relative to external access.  I think it is important to support
> JDBC and ODBC and whatever is in common use for querying.  For external
> access the question is quite different.  Whereas for the internal format
> consensus around a single choice has large benefits, the external format
> choice is nearly the opposite.  For an external format, limiting ourselves
> to a single choice seems like a bad idea and increasing the audience seems
> like a better choice.
> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> Hi folks,
>> I just commented on this first JIRA.  Here is my text:
>> This issue has been hashed over a lot in the Hadoop projects. There
>> was work done to compare thrift vs avro vs protobuf. The conclusion
>> was to use protobuf.
>> Prior to this move, there had been a lot of noise about pluggable RPC
>> transports, and whatnot. It held up adoption of a backwards compatible
>> serialization framework for a long time. The problem ended up being
>> the analysis-paralysis, rather than the specific implementation
>> problem. In other words, the problem was a LACK of implementation rather
>> than actual REAL problems.
>> Based on this experience, I'd strongly suggest adopting protobuf and
>> moving on. Forget about pluggable RPC implementations; the complexity
>> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>> format for Hadoop and HBase, which allows Drill to draw on the broad
>> experience of those communities who need to implement high performance
>> backwards compatible RPC serialization.
>> ====
>> Expanding a bit, I've looked into this issue a lot, and there are very
>> few significant concrete reasons to choose between protobuf and thrift: a
>> tiny percent faster at this or that, etc.  I'd strongly suggest protobuf
>> for the expanded community.  There is no particular Apache imperative
>> that Apache projects re-use libraries.  Use what makes sense for your
>> project.
>> As regards Avro, it's a fine serialization format for long term
>> data retention, but the complexities that exist to enable that make it
>> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
>> -ryan
>> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
>> > We plan to propose the architecture and interfaces in the next couple
>> > weeks, which will make it easy to divide the project into clear building
>> > blocks. At that point it will be easier to start contributing different
>> > data sources, data formats, operators, query languages, etc.
>> >
>> > The contributions are done in the usual Apache way. It's best to open a
>> > JIRA and then post a patch so that others can review and then a committer
>> > can check it in.
>> >
>> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
>> >
>> >> Hi
>> >>
>> >> What is the process to become a contributor to drill ?
>> >>
>> >> Regards
>> >> chandan
>> >>
>> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> >>
>> >> > Suffice it to say that if *you* think it is important enough to implement
>> >> > and maintain, then the group shouldn't say nay.  The consensus stuff
>> >> > should only block things that break something else.  Additive features that
>> >> > are highly maintainable (or which come with commitments) shouldn't
>> >> > generally be blocked.
>> >> >
>> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
>> >> > michael.hausenblas@gmail.com> wrote:
>> >> >
>> >> > > Good. Feel free to put me down for that, if the group as a whole thinks
>> >> > > that (supporting Thrift) makes sense.
>> >> > >
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Tomer Shiran
>> > Director of Product Management | MapR Technologies | 650-804-8657
