drill-dev mailing list archives

From Min Zhou <coderp...@gmail.com>
Subject Re: Thrift?
Date Sat, 15 Sep 2012 11:39:11 GMT
There should be two types of serialization method. One should define its
schema and is meant for RPC and the user-facing wire API; the other need not
define a schema and is typically used for internal data transfer. I think
fastjson or Kryo is quite suitable for the latter purpose.
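To make the distinction concrete, here is a minimal Python sketch, assuming a hypothetical three-field record. The schema-defined encoder packs values positionally with the standard `struct` module, so field names never go on the wire and the reader must know the schema. The schema-less encoder uses Python's standard `json` module as a stand-in for a self-describing format (fastjson and Kryo are Java libraries and are not shown). The record, its field names, and the helper functions are illustrative only.

```python
import json
import struct

# Hypothetical schema for illustration: (field name, struct format code).
# Writer and reader must both hold this exact definition.
SCHEMA = [("query_id", "q"), ("row_count", "i"), ("elapsed_ms", "d")]
FMT = "<" + "".join(code for _, code in SCHEMA)  # little-endian, no padding


def encode_with_schema(record: dict) -> bytes:
    """Pack values in schema order; field names are not transmitted."""
    return struct.pack(FMT, *(record[name] for name, _ in SCHEMA))


def decode_with_schema(data: bytes) -> dict:
    """Decoding only works because the reader already knows SCHEMA."""
    values = struct.unpack(FMT, data)
    return {name: value for (name, _), value in zip(SCHEMA, values)}


def encode_schemaless(record: dict) -> bytes:
    """Self-describing: every message carries its own field names."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def decode_schemaless(data: bytes) -> dict:
    """No schema needed; the keys come from the message itself."""
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    rec = {"query_id": 42, "row_count": 1000, "elapsed_ms": 12.5}
    print(len(encode_with_schema(rec)), len(encode_schemaless(rec)))
```

For this record the schema-defined message is 20 bytes and the JSON one is 50. The price of the smaller message is that writer and reader must agree on the schema exactly, whereas the JSON reader can cope with keys it does not expect.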


Regards,
Min

On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:

>
> Point taken … +1 for protobuf - from my POV we can close ISSUE-1
>
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access.
>
> Sounds sensible.
>
> The only thing I really don't get is: why did you put Avro and JSON
> into the proposal [1] in the first place? Or is this the 'external access'
> from above?
>
> Cheers,
>            Michael
>
> [1] http://wiki.apache.org/incubator/DrillProposal
>
> --
> Michael Hausenblas
> Ireland, Europe
> http://mhausenblas.info/
>
> On 14 Sep 2012, at 22:31, Ted Dunning wrote:
>
> > I think that it is important to ask a few questions leading up to a
> > decision here.
> >
> > The first is a (rhetorical) show of hands about how many people believe
> > that there are no serious performance or expressivity killers when
> > comparing alternative serialization frameworks.  As far as I know,
> > performance differences are not massive (and protobufs is one of the
> > leaders in any case) and the expressivity differences are essentially
> > nil.
> > If somebody feels that there is a serious show-stopper with any option,
> > they should speak.
> >
> > The second is to ask the sense of the community whether they judge
> > progress or perfection in this decision to be more important to the
> > project.  My guess is that almost everybody would prefer to see progress
> > as long as the technical choice is not subject to some horrid missing bit.
> >
> > The final question is whether it is reasonable to go along with protobufs
> > given that several very experienced engineers prefer it and would like to
> > produce code based on it.  If the first two questions are answered to the
> > effect that protobufs is about as good as we will find and that progress
> > trumps small differences, then it seems that following Jason and Ryan's
> > preference for protobufs might be a reasonable thing to do.
> >
> > The question of an internal wire format, btw, does not constrain the
> > project relative to external access.  I think it is important to support
> > JDBC and ODBC and whatever is in common use for querying.  For external
> > access the question is quite different.  Whereas for the internal format
> > consensus around a single choice has large benefits, the external format
> > choice is nearly the opposite.  For an external format, limiting ourselves
> > to a single choice seems like a bad idea and increasing the audience seems
> > like a better choice.
> >
> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Hi folks,
> >>
> >> I just commented on this first JIRA.  Here is my text:
> >>
> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> was that protobuf should be used.
> >>
> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> transports, and whatnot. It held up adoption of a backwards compatible
> >> serialization framework for a long time. The problem ended up being
> >> analysis paralysis rather than any specific implementation problem.
> >> In other words, the problem was a LACK of implementation rather than
> >> actual REAL problems.
> >>
> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> moving on. Forget about pluggable RPC implementations; the complexity
> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> >> experience of those communities, who need to implement high performance,
> >> backwards compatible RPC serialization.
> >>
> >> ====
> >>
> >> Expanding a bit, I've looked into this issue a lot, and there are very
> >> few significant concrete reasons to choose protobuf vs thrift: a tiny
> >> percent faster at this and that, etc.  I'd strongly suggest protobuf
> >> for the expanded community.  There is no particular Apache imperative
> >> that Apache projects re-use libraries.  Use what makes sense for your
> >> project.
> >>
> >> As for Avro, it's a fine serialization format for long term
> >> data retention, but the complexities that exist to enable that make it
> >> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
> >>
> >> -ryan
> >>
> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
> >>> We plan to propose the architecture and interfaces in the next couple
> >>> of weeks, which will make it easy to divide the project into clear
> >>> building blocks. At that point it will be easier to start contributing
> >>> different data sources, data formats, operators, query languages, etc.
> >>>
> >>> The contributions are done in the usual Apache way. It's best to open a
> >>> JIRA and then post a patch so that others can review and then a
> >>> committer can check it in.
> >>>
> >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
> >>>
> >>>> Hi
> >>>>
> >>>> What is the process to become a contributor to Drill?
> >>>>
> >>>> Regards
> >>>> chandan
> >>>>
> >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >>>>
> >>>>> Suffice it to say that if *you* think it is important enough to implement
> >>>>> and maintain, then the group shouldn't say nay.  The consensus stuff
> >>>>> should only block things that break something else.  Additive features
> >>>>> that are highly maintainable (or which come with commitments) shouldn't
> >>>>> generally be blocked.
> >>>>>
> >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
> >>>>> michael.hausenblas@gmail.com> wrote:
> >>>>>
> >>>>>> Good. Feel free to put me down for that, if the group as a whole thinks
> >>>>>> that (supporting Thrift) makes sense.
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Tomer Shiran
> >>> Director of Product Management | MapR Technologies | 650-804-8657
> >>
>
>
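Ryan's case for protobuf rests on backwards-compatible RPC serialization. A minimal Python sketch of the mechanism behind that, assuming only the documented protobuf wire format (each field is prefixed by a varint key `(field_number << 3) | wire_type`, with wire type 0 for varints and 2 for length-delimited data): a reader built against an older message definition can skip field numbers it does not know. This is a hand-rolled illustration, not the protobuf library's API, and it handles only those two wire types; the field numbers and names in the usage are hypothetical.

```python
# Wire types from the protobuf encoding documentation (only two handled here).
VARINT = 0
LEN = 2


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(field_number: int, value) -> bytes:
    """Encode an int as a varint field or a str as a length-delimited field."""
    if isinstance(value, int):
        key = (field_number << 3) | VARINT
        return encode_varint(key) + encode_varint(value)
    data = value.encode("utf-8")
    key = (field_number << 3) | LEN
    return encode_varint(key) + encode_varint(len(data)) + data


def decode(buf: bytes, known: dict[int, str]) -> dict:
    """Decode the fields listed in `known` and skip every other field number."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(buf, pos)
            raw = buf[pos:pos + length]
            pos += length
            value = raw.decode("utf-8") if field_number in known else raw
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if field_number in known:  # unknown fields are consumed but dropped
            out[known[field_number]] = value
    return out


if __name__ == "__main__":
    # Hypothetical "v2" message: field 3 was added after older readers shipped.
    msg = encode_field(1, "SELECT 1") + encode_field(2, 100) + encode_field(3, 7)
    old_reader = {1: "sql", 2: "limit"}
    print(decode(msg, old_reader))  # field 3 is skipped, not an error
```

As a check against the wire-format documentation, `encode_field(1, 150)` yields the bytes `08 96 01`. The older reader in the usage simply consumes and drops field 3, which is the property that lets a message definition grow without breaking deployed clients.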


-- 
My research interests are distributed systems, parallel computing and
bytecode-based virtual machines.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
