drill-dev mailing list archives

From Prasanth J <buckeye.prasa...@gmail.com>
Subject Re: Thrift?
Date Sat, 15 Sep 2012 20:17:36 GMT
Hello everyone

I saw this nice video a while back and would like to share it with everyone. eBay presented
a comparison of various serialization techniques, covering their performance for different
payloads, serialized size, etc.

Presentation:
http://qconsf.com/dl/qcon-sanfran-2011/slides/SastryMalladi_DealingWithPerformanceChallengesOptimizedSerializationTechniques.pdf
Video:
http://www.infoq.com/presentations/Dealing-with-Performance-Challenges-Optimized-Data-Formats

Protobuf performs well on all criteria, especially with large payloads, and gives better
size reduction, both of which are critical for Drill.

Thanks
-- Prasanth

On Sep 15, 2012, at 9:42 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Yarn is also strictly client/server which leads to all kinds of problems in
> Hadoop.
> 
> YarnRPC -1
> 
> On Sat, Sep 15, 2012 at 5:26 AM, Min Zhou <coderplay@gmail.com> wrote:
> 
>> YarnRPC -1
>> 
>> That's quite inefficient in my experience and doesn't currently support
>> multiple languages.
>> 
>> On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <hyunsik.choi@gmail.com> wrote:
>> 
>>> +1 for json as initial data format
>>> 
>>> In addition, I recommend YarnRPC with protocol buffer for internal RPC and
>>> API RPC. Protocol buffer is portable to other languages. If we use another
>>> RPC system, we have to additionally consider the security aspect of Hadoop.
>>> 
>>> --
>>> Hyunsik Choi
>>> 
>>> On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <coderplay@gmail.com> wrote:
>>> 
>>>> There should be two types of serialization method. One defines its schema
>>>> and is used for RPC and the user wire API; the other need not define a
>>>> schema and is typically used for internal data transfer. I think fastjson
>>>> or kryo is quite suitable for the latter purpose.
>>>> 
>>>> 
>>>> Regards,
>>>> Min
>>>> 
>>>> On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
>>>> 
>>>>> 
>>>>> Point taken … +1 for protobuf - from my POV we can close ISSUE-1
>>>>> 
>>>>>> The question of an internal wire format, btw, does not constrain the
>>>>>> project relative to external access.
>>>>> 
>>>>> Sounds sensible.
>>>>> 
>>>>> The only thing I really don't get is: why did you put Avro and JSON
>>>>> into the proposal [1] in the first place? Or is this the 'external access'
>>>>> from above?
>>>>> 
>>>>> Cheers,
>>>>>           Michael
>>>>> 
>>>>> [1] http://wiki.apache.org/incubator/DrillProposal
>>>>> 
>>>>> --
>>>>> Michael Hausenblas
>>>>> Ireland, Europe
>>>>> http://mhausenblas.info/
>>>>> 
>>>>> On 14 Sep 2012, at 22:31, Ted Dunning wrote:
>>>>> 
>>>>>> I think that it is important to ask a few questions leading up to a
>>>>>> decision here.
>>>>>> 
>>>>>> The first is a (rhetorical) show of hands about how many people believe
>>>>>> that there are no serious performance or expressivity killers when
>>>>>> comparing alternative serialization frameworks.  As far as I know,
>>>>>> performance differences are not massive (and protobufs is one of the
>>>>>> leaders in any case) and the expressivity differences are essentially nil.
>>>>>> If somebody feels that there is a serious show-stopper with any option,
>>>>>> they should speak.
>>>>>> 
>>>>>> The second is to ask the sense of the community whether they judge progress
>>>>>> or perfection in this decision to be more important to the project.  My guess
>>>>>> is that almost everybody would prefer to see progress as long as the
>>>>>> technical choice is not subject to some horrid missing bit.
>>>>>> 
>>>>>> The final question is whether it is reasonable to go along with protobufs
>>>>>> given that several very experienced engineers prefer it and would like to
>>>>>> produce code based on it.  If the first two questions are answered to the
>>>>>> effect that protobufs is about as good as we will find and that progress
>>>>>> trumps small differences, then it seems that moving to follow this
>>>>>> preference of Jason and Ryan for protobufs might be a reasonable thing to
>>>>>> do.
>>>>>> 
>>>>>> The question of an internal wire format, btw, does not constrain the
>>>>>> project relative to external access.  I think it is important to support
>>>>>> JDBC and ODBC and whatever is in common use for querying.  For external
>>>>>> access the question is quite different.  Whereas for the internal format
>>>>>> consensus around a single choice has large benefits, the external format
>>>>>> choice is nearly the opposite.  For an external format, limiting ourselves
>>>>>> to a single choice seems like a bad idea and increasing the audience seems
>>>>>> like a better choice.
>>>>>> 
>>>>>> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> I just commented on this first JIRA.  Here is my text:
>>>>>>> 
>>>>>>> This issue has been hashed over a lot in the Hadoop projects. There
>>>>>>> was work done to compare thrift vs avro vs protobuf. The conclusion
>>>>>>> was to use protobuf.
>>>>>>> 
>>>>>>> Prior to this move, there had been a lot of noise about pluggable RPC
>>>>>>> transports, and whatnot. It held up adoption of a backwards compatible
>>>>>>> serialization framework for a long time. The problem ended up being
>>>>>>> the analysis-paralysis, rather than the specific implementation
>>>>>>> problem. In other words, the problem was a LACK of implementation rather
>>>>>>> than actual REAL problems.
>>>>>>> 
>>>>>>> Based on this experience, I'd strongly suggest adopting protobuf and
>>>>>>> moving on. Forget about pluggable RPC implementations; the complexity
>>>>>>> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
>>>>>>> format for Hadoop and HBase, which allows Drill to draw on the broad
>>>>>>> experience of those communities who need to implement high performance
>>>>>>> backwards compatible RPC serialization.
>>>>>>> 
>>>>>>> ====
>>>>>>> 
>>>>>>> Expanding a bit, I've looked into this issue a lot, and there are very
>>>>>>> few significant concrete reasons to choose protobuf vs thrift. A tiny
>>>>>>> percent faster at this or that, etc.  I'd strongly suggest protobuf
>>>>>>> for the expanded community.  There is no particular Apache imperative
>>>>>>> that Apache projects re-use libraries.  Use what makes sense for your
>>>>>>> project.
>>>>>>> 
>>>>>>> As regards Avro, it's a fine serialization format for long term
>>>>>>> data retention, but the complexities that exist to enable that make it
>>>>>>> non-ideal for an RPC.  I know of no one who uses AvroRPC in any form.
>>>>>>> 
>>>>>>> -ryan
>>>>>>> 
>>>>>>> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <tshiran@maprtech.com> wrote:
>>>>>>>> We plan to propose the architecture and interfaces in the next couple
>>>>>>>> weeks, which will make it easy to divide the project into clear building
>>>>>>>> blocks. At that point it will be easier to start contributing different
>>>>>>>> data sources, data formats, operators, query languages, etc.
>>>>>>>> 
>>>>>>>> The contributions are done in the usual Apache way. It's best to open a
>>>>>>>> JIRA and then post a patch so that others can review and then a committer
>>>>>>>> can check it in.
>>>>>>>> 
>>>>>>>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <chandanmadhesia@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi
>>>>>>>>> 
>>>>>>>>> What is the process to become a contributor to Drill?
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> chandan
>>>>>>>>> 
>>>>>>>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Suffice it to say that if *you* think it is important enough to implement
>>>>>>>>>> and maintain, then the group shouldn't say nay.  The consensus stuff
>>>>>>>>>> should only block things that break something else.  Additive features that
>>>>>>>>>> are highly maintainable (or which come with commitments) shouldn't
>>>>>>>>>> generally be blocked.
>>>>>>>>>> 
>>>>>>>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Good. Feel free to put me down for that, if the group as a whole thinks
>>>>>>>>>>> that (supporting Thrift) makes sense.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Tomer Shiran
>>>>>>>> Director of Product Management | MapR Technologies | 650-804-8657
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> My research interests are distributed systems, parallel computing and
>>>> bytecode based virtual machine.
>>>> 
>>>> My profile:
>>>> http://www.linkedin.com/in/coderplay
>>>> My blog:
>>>> http://coderplay.javaeye.com
>>>> 
>>> 
>> 

