tez-dev mailing list archives

From "Zhiyuan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-3708) Arbitrary parallelism for unpartitioned cartesian product regardless of # src tasks
Date Wed, 03 May 2017 15:54:04 GMT
Zhiyuan Yang created TEZ-3708:
---------------------------------

             Summary: Arbitrary parallelism for unpartitioned cartesian product regardless of # src tasks
                 Key: TEZ-3708
                 URL: https://issues.apache.org/jira/browse/TEZ-3708
             Project: Apache Tez
          Issue Type: Sub-task
            Reporter: Zhiyuan Yang
            Assignee: Zhiyuan Yang


The current unpartitioned cartesian product has a few limitations:
1. Parallelism can be too low when splits are large and the number of source tasks is small.
2. Parallelism can be too high when the number of source tasks is large.
3. Workload is not evenly distributed across workers. Even with auto-grouping, grouping by size may be inaccurate, because the same data size can correspond to different numbers of records and therefore to a different number of cartesian product operations.
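The three limitations can be made concrete with a simplified Java model. This is not Tez's CartesianProductVertexManager. It assumes that each destination task processes one combination of "chunks", taking one chunk from each source vertex, so destination parallelism is the product of the per-source chunk counts. The helpers `chunksForTarget` and `cartesianOps` are hypothetical and only illustrate the idea of decoupling parallelism from the number of source tasks.

```java
import java.util.Arrays;

// Simplified model, NOT Tez code. ASSUMPTION: each destination task handles
// one combination of chunks (one chunk per source vertex), so destination
// parallelism is the product of the per-source chunk counts.
public class CartesianParallelismSketch {

    // Current behavior (as modeled here): each source task is one chunk,
    // so parallelism is fixed by the number of source tasks.
    static long parallelismFromSrcTasks(int[] numSrcTasks) {
        long parallelism = 1;
        for (int n : numSrcTasks) {
            parallelism *= n;
        }
        return parallelism;
    }

    // Hypothetical helper: pick the same chunk count k for every source,
    // the largest k with k^numSources <= targetParallelism, so parallelism
    // no longer depends on how many tasks each source has.
    static int[] chunksForTarget(int numSources, long targetParallelism) {
        int k = 1;
        while (power(k + 1, numSources) <= targetParallelism) {
            k++;
        }
        int[] chunks = new int[numSources];
        Arrays.fill(chunks, k);
        return chunks;
    }

    // Integer power, avoiding floating-point rounding from Math.pow.
    static long power(long base, int exp) {
        long result = 1;
        for (int i = 0; i < exp; i++) {
            result *= base;
        }
        return result;
    }

    // Work for one destination task is the product of the record counts
    // of the chunks it crosses, regardless of their byte sizes.
    static long cartesianOps(long... recordsPerChunk) {
        long ops = 1;
        for (long r : recordsPerChunk) {
            ops *= r;
        }
        return ops;
    }

    public static void main(String[] args) {
        // Limitation 1: two sources with one large split each -> 1 task.
        System.out.println(parallelismFromSrcTasks(new int[] {1, 1}));
        // Limitation 2: two sources with 1000 tasks each -> 1,000,000 tasks.
        System.out.println(parallelismFromSrcTasks(new int[] {1000, 1000}));
        // Decoupled: aim for about 100 destination tasks over two sources.
        System.out.println(Arrays.toString(chunksForTarget(2, 100)));
        // Limitation 3: equal-size chunks holding 1M vs 10M records, each
        // crossed with a 1000-record chunk, differ 10x in work.
        System.out.println(cartesianOps(1_000_000L, 1_000L));
        System.out.println(cartesianOps(10_000_000L, 1_000L));
    }
}
```

Using the same chunk count for every source is the simplest choice and only hits targets that are perfect powers; a fuller approach could weight chunk counts by source size, or balance chunks by record count rather than bytes to address the third limitation.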



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
