drill-dev mailing list archives

From Hyunsik Choi <hyunsik.c...@gmail.com>
Subject Re: Thrift?
Date Sun, 16 Sep 2012 04:50:28 GMT
Min,

Thank you for your comments.

On Sun, Sep 16, 2012 at 12:59 PM, Min Zhou <coderplay@gmail.com> wrote:

> Hi, Hyunsik,
>
> Hadoop was born before Java NIO became popular. Initially, Hadoop's IPC
> was written with blocking I/O. Although recent versions of Hadoop and
> YARN have changed the IPC implementation to use NIO, for historical
> reasons it is not a typical use of Java NIO. We ran a benchmark on YarnRPC:
> the throughput was no more than 50,000 ops, and worse, to get a good
> result you have to increase the number of RPC handlers.
> We've developed another RPC layer following best practices for Java NIO.
> Built on mina2/netty/grizzly, with a good I/O thread model, good memory
> management, and careful avoidance of memory copies and system context
> switches, we pushed the throughput up to 168,000 ops.
> See http://code.google.com/p/nfs-rpc/
>
> Thanks,
> Min
>
> On Sun, Sep 16, 2012 at 11:15 AM, Hyunsik Choi <hyunsik.choi@gmail.com>
> wrote:
>
> > Ted,
> >
> > Thank you for the detailed description. I agree with that.
> > In terms of productivity, I mentioned YarnRPC,
> >
> > --
> > Hyunsik Choi
> >
> > On Sun, Sep 16, 2012 at 6:16 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > YarnRPC is based on the original Hadoop RPC.  The major change is the
> > > change from Jute serialization to protobufs.
> > >
> > > Some of the limitations that I know of are:
> > >
> > > - only has a Java implementation
> > >
> > > - uses lots of synchronization instead of using lockless structures
> > > where possible
> > >
> > > - it is only client/server, not peer to peer
> > >
> > > - it doesn't support actor-style messages
> > >
> > > I have to admit that I haven't read the details for some time.
> > >
> > > On Sat, Sep 15, 2012 at 6:59 AM, Hyunsik Choi <hyunsik.choi@gmail.com>
> > > wrote:
> > >
> > > > You are right. I missed that ProtobufRpcEngine transforms the proto
> > > > message into a RpcResponseWritable object. Currently, it is not
> > > > portable to other languages.
> > > >
> > > > However, why do you think YarnRPC is inefficient? I'm just curious =)
> > > >
> > > > --
> > > > Hyunsik Choi
> > > >
> > > > On Sat, Sep 15, 2012 at 9:26 PM, Min Zhou <coderplay@gmail.com>
> > > > wrote:
> > > >
> > > > > YarnRPC -1
> > > > >
> > > > > That's quite inefficient in my experience and doesn't currently
> > > > > support multiple languages.
> > > > >
> > > > > On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <hyunsik.choi@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 for json as initial data format
> > > > > >
> > > > > > In addition, I recommend YarnRPC with protocol buffers for internal
> > > > > > RPC and API RPC. Protocol buffers are portable to other languages.
> > > > > > If we use another RPC system, we have to additionally consider the
> > > > > > security aspect of Hadoop.
> > > > > >
> > > > > > --
> > > > > > Hyunsik Choi
> > > > > >
> > > > > > On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <coderplay@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > There should be two types of serialization method. One should
> > > > > > > define its schema, for use in RPC and the user wire API; the other
> > > > > > > need not define a schema and is typically for internal data
> > > > > > > transfer. I think fastjson or kryo is quite suitable for the
> > > > > > > latter purpose.
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Min
> > > > > > >
> > > > > > > On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas
> > > > > > > <michael.hausenblas@gmail.com> wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> > > > > > > >
> > > > > > > > > The question of an internal wire format, btw, does not constrain
> > > > > > > > > the project relative to external access.
> > > > > > > >
> > > > > > > > Sounds sensible.
> > > > > > > >
> > > > > > > > The only one thing I really don't get is: why did you put Avro and
> > > > > > > > JSON into the proposal [1] in the first place? Or is this the
> > > > > > > > 'external access' from above?
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > >            Michael
> > > > > > > >
> > > > > > > > [1] http://wiki.apache.org/incubator/DrillProposal
> > > > > > > >
> > > > > > > > --
> > > > > > > > Michael Hausenblas
> > > > > > > > Ireland, Europe
> > > > > > > > http://mhausenblas.info/
> > > > > > > >
> > > > > > > > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> > > > > > > >
> > > > > > > > > I think that it is important to ask a few questions leading up to
> > > > > > > > > a decision here.
> > > > > > > > >
> > > > > > > > > The first is a (rhetorical) show of hands about how many people
> > > > > > > > > believe that there are no serious performance or expressivity
> > > > > > > > > killers when comparing alternative serialization frameworks.  As
> > > > > > > > > far as I know, performance differences are not massive (and
> > > > > > > > > protobufs is one of the leaders in any case) and the expressivity
> > > > > > > > > differences are essentially nil.  If somebody feels that there is
> > > > > > > > > a serious show-stopper with any option, they should speak.
> > > > > > > > >
> > > > > > > > > The second is to ask the sense of the community on whether they
> > > > > > > > > judge progress or perfection in this decision to be most important
> > > > > > > > > to the project.  My guess is that almost everybody would prefer to
> > > > > > > > > see progress as long as the technical choice is not subject to
> > > > > > > > > some horrid missing bit.
> > > > > > > > >
> > > > > > > > > The final question is whether it is reasonable to go along with
> > > > > > > > > protobufs given that several very experienced engineers prefer it
> > > > > > > > > and would like to produce code based on it.  If the first two
> > > > > > > > > questions are answered to the effect that protobufs is about as
> > > > > > > > > good as we will find and that progress trumps small differences,
> > > > > > > > > then it seems that moving to follow this preference of Jason and
> > > > > > > > > Ryan for protobufs might be a reasonable thing to do.
> > > > > > > > >
> > > > > > > > > The question of an internal wire format, btw, does not constrain
> > > > > > > > > the project relative to external access.  I think it is important
> > > > > > > > > to support JDBC and ODBC and whatever is in common use for
> > > > > > > > > querying.  For external access the question is quite different.
> > > > > > > > > Whereas for the internal format consensus around a single choice
> > > > > > > > > has large benefits, the external format choice is nearly the
> > > > > > > > > opposite.  For an external format, limiting ourselves to a single
> > > > > > > > > choice seems like a bad idea and increasing the audience seems
> > > > > > > > > like a better choice.
> > > > > > > > >
> > > > > > > > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi folks,
> > > > > > > > >>
> > > > > > > > >> I just commented on this first JIRA.  Here is my text:
> > > > > > > > >>
> > > > > > > > >> This issue has been hashed over a lot in the Hadoop projects.
> > > > > > > > >> There was work done to compare thrift vs avro vs protobuf. The
> > > > > > > > >> conclusion was to use protobuf.
> > > > > > > > >>
> > > > > > > > >> Prior to this move, there had been a lot of noise about pluggable
> > > > > > > > >> RPC transports, and whatnot. It held up adoption of a backwards
> > > > > > > > >> compatible serialization framework for a long time. The problem
> > > > > > > > >> ended up being the analysis-paralysis, rather than the specific
> > > > > > > > >> implementation problem. In other words, the problem was a LACK of
> > > > > > > > >> implementation rather than actual REAL problems.
> > > > > > > > >>
> > > > > > > > >> Based on this experience, I'd strongly suggest adopting protobuf
> > > > > > > > >> and moving on. Forget about pluggable RPC implementations, the
> > > > > > > > >> complexity doesn't deliver benefits. The benefit of protobuf is
> > > > > > > > >> that it's the RPC format for Hadoop and HBase, which allows Drill
> > > > > > > > >> to draw on the broad experience of those communities who need to
> > > > > > > > >> implement high performance backwards compatible RPC serialization.
> > > > > > > > >>
> > > > > > > > >> ====
> > > > > > > > >>
> > > > > > > > >> Expanding a bit, I've looked into this issue a lot, and there are
> > > > > > > > >> very few significant concrete reasons to choose protobuf vs
> > > > > > > > >> thrift.  Tiny percent faster of this, and that, etc.  I'd strongly
> > > > > > > > >> suggest protobuf for the expanded community.  There is no
> > > > > > > > >> particular Apache imperative that Apache projects re-use
> > > > > > > > >> libraries.  Use what makes sense for your project.
> > > > > > > > >>
> > > > > > > > >> As regards Avro, it's a fine serialization format for long term
> > > > > > > > >> data retention, but the complexities that exist to enable that
> > > > > > > > >> make it non-ideal for an RPC.  I know of no one who uses AvroRPC
> > > > > > > > >> in any form.
> > > > > > > > >>
> > > > > > > > >> -ryan
> > > > > > > > >>
> > > > > > > > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com>
> > > > > > > > >> wrote:
> > > > > > > > >>> We plan to propose the architecture and interfaces in the next
> > > > > > > > >>> couple weeks, which will make it easy to divide the project into
> > > > > > > > >>> clear building blocks. At that point it will be easier to start
> > > > > > > > >>> contributing different data sources, data formats, operators,
> > > > > > > > >>> query languages, etc.
> > > > > > > > >>>
> > > > > > > > >>> The contributions are done in the usual Apache way. It's best to
> > > > > > > > >>> open a JIRA and then post a patch so that others can review and
> > > > > > > > >>> then a committer can check it in.
> > > > > > > > >>>
> > > > > > > > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> > > > > > > > >>> <chandanmadhesia@gmail.com> wrote:
> > > > > > > > >>>
> > > > > > > > >>>> Hi
> > > > > > > > >>>>
> > > > > > > > >>>> What is the process to become a contributor to Drill?
> > > > > > > > >>>>
> > > > > > > > >>>> Regards
> > > > > > > > >>>> chandan
> > > > > > > > >>>>
> > > > > > > > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > > > >>>> wrote:
> > > > > > > > >>>>
> > > > > > > > >>>>> Suffice it to say that if *you* think it is important enough to
> > > > > > > > >>>>> implement and maintain, then the group shouldn't say nay.  The
> > > > > > > > >>>>> consensus stuff should only block things that break something
> > > > > > > > >>>>> else.  Additive features that are highly maintainable (or which
> > > > > > > > >>>>> come with commitments) shouldn't generally be blocked.
> > > > > > > > >>>>>
> > > > > > > > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
> > > > > > > > >>>>> <michael.hausenblas@gmail.com> wrote:
> > > > > > > > >>>>>
> > > > > > > > >>>>>> Good. Feel free to put me down for that, if the group as a whole
> > > > > > > > >>>>>> thinks that (supporting Thrift) makes sense.
> > > > > > > > >>>>>>
> > > > > > > > >>>>>
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> Tomer Shiran
> > > > > > > > >>> Director of Product Management | MapR Technologies | 650-804-8657
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > My research interests are distributed systems, parallel computing
> > > > > > > and bytecode based virtual machine.
> > > > > > >
> > > > > > > My profile:
> > > > > > > http://www.linkedin.com/in/coderplay
> > > > > > > My blog:
> > > > > > > http://coderplay.javaeye.com
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
>
