tez-dev mailing list archives

From Gopal Vijayaraghavan <go...@hortonworks.com>
Subject Re: What is PipelinedSorter for?
Date Sat, 25 Jan 2014 03:57:14 GMT

The alternate sorter is a pet project of mine. It dates from Nov 2012.

When I joined HWX I was told to "Optimize MR" & handed Terasort. I
suspect it could've been a joke at my expense :)

I wrote up a spec for what I thought would be a next-gen sort buffer
impl for MR & took 3 weeks to implement it (MAPREDUCE-4755).

But this was such a huge patch that it would never really make it into
trunk without an insane amount of testing & it never actually went
into Patch Available (PA).

That changed when Tez came along and made it possible to plug in a
whole new MapOutputBuffer, not just a sort class, & switch between them.

PipelinedSorter has been sitting in the Tez repo without any way to
enable it for more than a year now (was in the initial import of Tez
into incubator).

At this point, I have forgotten how most of it works - which is why
I'm adding a config switch to enable it, so I can test it out.

The con is that it's not tested beyond my 3-node clusters running
Terasort & Hive's TPC-DS queries.

You can poke around in the spec I wrote in 2012 (some details might be
unimplemented) - http://people.apache.org/~gopalv/PipelinedSorter.pdf

The document has most of the arguments for this sorter.


On Fri, Jan 24, 2014 at 3:43 PM, Rohini Palaniswamy
<rohini.aditya@gmail.com> wrote:
> Hi,
>    Looks like PipelinedSorter uses multiple threads to do the sort. Can
> someone explain its use, pros and cons?
> Regards,
> Rohini
> ---------- Forwarded message ----------
> From: Gopal V (JIRA) <jira@apache.org>
> Date: Fri, Jan 24, 2014 at 3:07 PM
> Subject: [jira] [Created] (TEZ-765) Allow tez.runtime.sort.threads > 1 to
> turn on PipelinedSorter
> To: issues@tez.incubator.apache.org
> Gopal V created TEZ-765:
> ---------------------------
>              Summary: Allow tez.runtime.sort.threads > 1 to turn on
> PipelinedSorter
>                  Key: TEZ-765
>                  URL: https://issues.apache.org/jira/browse/TEZ-765
>              Project: Apache Tez
>           Issue Type: Bug
>     Affects Versions: 0.3.0
>             Reporter: Gopal V
>             Assignee: Gopal V
>             Priority: Trivial
>              Fix For: 0.3.0
> The Tez pipelined sorter cannot be turned on without a rebuild.
> Allow the sorter to be turned on via already existing config key
> "tez.runtime.sort.threads".
> --
> This message was sent by Atlassian JIRA
> (v6.1.5#6160)
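
To make the switch in the TEZ-765 description quoted above concrete, here is
a minimal sketch of the selection rule it proposes: a value greater than 1
for "tez.runtime.sort.threads" picks the PipelinedSorter. Only the config key
and the "> 1" rule come from the JIRA text. The class SorterChoice, the
method pick, the default of 1 thread, the "default sorter" label and the use
of java.util.Properties in place of the real configuration object are all
assumptions for illustration, not Tez's actual code.

```java
import java.util.Properties;

// Hypothetical helper, not part of Tez: illustrates the "threads > 1" rule only.
class SorterChoice {
    // Config key quoted in TEZ-765.
    static final String SORT_THREADS_KEY = "tez.runtime.sort.threads";

    // Returns a label for the sorter that would be chosen.
    // The labels are placeholders, and the default of 1 thread is an assumption.
    static String pick(Properties conf) {
        int threads = Integer.parseInt(conf.getProperty(SORT_THREADS_KEY, "1"));
        return threads > 1 ? "PipelinedSorter" : "default sorter";
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(pick(conf)); // key unset: prints "default sorter"

        conf.setProperty(SORT_THREADS_KEY, "2");
        System.out.println(pick(conf)); // prints "PipelinedSorter"
    }
}
```

Keying the choice off an existing setting means no rebuild and no new config
key is needed to try the sorter, which is the point of the JIRA.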