drill-dev mailing list archives

From Min Zhou <coderp...@gmail.com>
Subject Re: Thrift?
Date Sat, 15 Sep 2012 12:26:54 GMT
YarnRPC -1

That's quite inefficient in my experience, and it doesn't currently support
multiple languages.

On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <hyunsik.choi@gmail.com> wrote:

> +1 for json as initial data format
>
> In addition, I recommend YarnRPC with Protocol Buffers for internal RPC and
> API RPC. Protocol Buffers is portable to other languages. If we use another
> RPC system, we have to additionally consider the security aspect of Hadoop.
>
> --
> Hyunsik Choi
>
> On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <coderplay@gmail.com> wrote:
>
> > There should be two types of serialization method. One should define its
> > schema, for use in RPC and the user wire API; the other need not define a
> > schema and is typically used for internal data transfer. I think fastjson
> > or kryo is quite suitable for the latter purpose.
> >
> >
> > Regards,
> > Min
> >
> > On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <
> > michael.hausenblas@gmail.com> wrote:
> >
> > >
> > > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> > >
> > > > The question of an internal wire format, btw, does not constrain the
> > > > project relative to external access.
> > >
> > > Sounds sensible.
> > >
> > > The one thing I really don't get is: why did you put Avro and JSON
> > > into the proposal [1] in the first place? Or is this the 'external
> > > access' from above?
> > >
> > > Cheers,
> > >            Michael
> > >
> > > [1] http://wiki.apache.org/incubator/DrillProposal
> > >
> > > --
> > > Michael Hausenblas
> > > Ireland, Europe
> > > http://mhausenblas.info/
> > >
> > > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> > >
> > > > I think that it is important to ask a few questions leading up to a
> > > > decision here.
> > > >
> > > > The first is a (rhetorical) show of hands about how many people believe
> > > > that there are no serious performance or expressivity killers when
> > > > comparing alternative serialization frameworks.  As far as I know,
> > > > performance differences are not massive (and protobufs is one of the
> > > > leaders in any case) and the expressivity differences are essentially
> > > > nil.  If somebody feels that there is a serious show-stopper with any
> > > > option, they should speak.
> > > >
> > > > The second is to ask the sense of the community on whether progress or
> > > > perfection in this decision is more important to the project.  My guess
> > > > is that almost everybody would prefer to see progress as long as the
> > > > technical choice is not subject to some horrid missing bit.
> > > >
> > > > The final question is whether it is reasonable to go along with
> > > > protobufs given that several very experienced engineers prefer it and
> > > > would like to produce code based on it.  If the first two questions are
> > > > answered to the effect that protobufs is about as good as we will find
> > > > and that progress trumps small differences, then it seems that following
> > > > this preference of Jason and Ryan for protobufs might be a reasonable
> > > > thing to do.
> > > >
> > > > The question of an internal wire format, btw, does not constrain the
> > > > project relative to external access.  I think it is important to
> > > > support JDBC and ODBC and whatever is in common use for querying.  For
> > > > external access the question is quite different.  Whereas for the
> > > > internal format consensus around a single choice has large benefits, the
> > > > external format choice is nearly the opposite.  For an external format,
> > > > limiting ourselves to a single choice seems like a bad idea and
> > > > increasing the audience seems like a better choice.
> > > >
> > > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > >> I just commented on this first JIRA.  Here is my text:
> > > >>
> > > > >> This issue has been hashed over a lot in the Hadoop projects. There
> > > > >> was work done to compare thrift vs avro vs protobuf. The conclusion
> > > > >> was to use protobuf.
> > > >>
> > > > >> Prior to this move, there had been a lot of noise about pluggable RPC
> > > > >> transports, and whatnot. It held up adoption of a backwards-compatible
> > > > >> serialization framework for a long time. The problem ended up being
> > > > >> the analysis-paralysis, rather than the specific implementation
> > > > >> problem. In other words, the problem was a LACK of implementation
> > > > >> rather than actual REAL problems.
> > > >>
> > > > >> Based on this experience, I'd strongly suggest adopting protobuf and
> > > > >> moving on. Forget about pluggable RPC implementations; the complexity
> > > > >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> > > > >> format for Hadoop and HBase, which allows Drill to draw on the broad
> > > > >> experience of those communities, who need to implement high-performance,
> > > > >> backwards-compatible RPC serialization.
> > > >>
> > > >> ====
> > > >>
> > > > >> Expanding a bit, I've looked into this issue a lot, and there are very
> > > > >> few significant concrete reasons to choose protobuf vs thrift.  A tiny
> > > > >> percent faster at this, and that, etc.  I'd strongly suggest protobuf
> > > > >> for the expanded community.  There is no particular Apache imperative
> > > > >> that Apache projects re-use libraries.  Use what makes sense for your
> > > > >> project.
> > > >>
> > > > >> As regards Avro, it's a fine serialization format for long term
> > > > >> data retention, but the complexities that exist to enable that make it
> > > > >> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
> > > >>
> > > >> -ryan
> > > >>
> > > > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com>
> > > > >> wrote:
> > > > >>> We plan to propose the architecture and interfaces in the next couple
> > > > >>> weeks, which will make it easy to divide the project into clear
> > > > >>> building blocks. At that point it will be easier to start contributing
> > > > >>> different data sources, data formats, operators, query languages, etc.
> > > >>>
> > > > >>> The contributions are done in the usual Apache way. It's best to open a
> > > > >>> JIRA and then post a patch so that others can review and then a
> > > > >>> committer can check it in.
> > > >>>
> > > > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <
> > > > >>> chandanmadhesia@gmail.com> wrote:
> > > >>>
> > > >>>> Hi
> > > >>>>
> > > > >>>> What is the process to become a contributor to Drill?
> > > >>>>
> > > >>>> Regards
> > > >>>> chandan
> > > >>>>
> > > > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > >>>> wrote:
> > > >>>>
> > > > >>>>> Suffice it to say that if *you* think it is important enough to
> > > > >>>>> implement and maintain, then the group shouldn't say nay.  The
> > > > >>>>> consensus stuff should only block things that break something else.
> > > > >>>>> Additive features that are highly maintainable (or which come with
> > > > >>>>> commitments) shouldn't generally be blocked.
> > > >>>>>
> > > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <
> > > >>>>> michael.hausenblas@gmail.com> wrote:
> > > >>>>>
> > > > >>>>>> Good. Feel free to put me down for that, if the group as a whole
> > > > >>>>>> thinks that (supporting Thrift) makes sense.
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Tomer Shiran
> > > >>> Director of Product Management | MapR Technologies | 650-804-8657
> > > >>
> > >
> > >
> >
> >
> > --
> > My research interests are distributed systems, parallel computing and
> > bytecode based virtual machine.
> >
> > My profile:
> > http://www.linkedin.com/in/coderplay
> > My blog:
> > http://coderplay.javaeye.com
> >
>



-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
