tez-dev mailing list archives

From Rohini Palaniswamy <rohini.adi...@gmail.com>
Subject Automatic Reducer Parallelism
Date Sun, 16 Mar 2014 19:10:06 GMT
Hi,
   I was looking at configuring Automatic Reducer Parallelism (ARP) for Pig
on Tez. My understanding of what is currently available is:

  ShuffleVertexManager is the one that currently supports auto parallelism.
If TEZ_AM_SHUFFLE_VERTEX_MANAGER_ENABLE_AUTO_PARALLEL is set to true, then
parallelism is computed from
TEZ_AM_SHUFFLE_VERTEX_MANAGER_DESIRED_TASK_INPUT_SIZE and
TEZ_AM_SHUFFLE_VERTEX_MANAGER_MIN_TASK_PARALLELISM, using stats from the map
tasks that have completed by the time the slow-start threshold for reducers
kicks in and reducer tasks are started.
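
To make that understanding concrete, here is a rough sketch of the arithmetic
as I read it. It is not taken from the Tez source: the class name, method
name, and parameters are all hypothetical, and it assumes the total map output
is extrapolated linearly from the completed tasks and that ARP only ever
lowers the configured parallelism.

```java
// Hypothetical sketch; not the actual ShuffleVertexManager implementation.
final class ArpSketch {
    private ArpSketch() {}

    /**
     * Estimates reducer parallelism from partial map-side statistics.
     * Assumption: output size scales linearly with the number of source tasks.
     */
    static int estimateParallelism(long completedOutputBytes,
                                   int completedTasks,
                                   int totalSourceTasks,
                                   long desiredTaskInputBytes,
                                   int minParallelism,
                                   int currentParallelism) {
        // Without usable stats, leave the configured parallelism alone.
        if (completedTasks <= 0 || totalSourceTasks <= 0 || desiredTaskInputBytes <= 0) {
            return currentParallelism;
        }
        // Extrapolate the total output from the tasks finished so far.
        double expectedTotalBytes =
                (double) completedOutputBytes * totalSourceTasks / completedTasks;
        // One reducer per "desired task input size" worth of data.
        int desired = (int) Math.ceil(expectedTotalBytes / desiredTaskInputBytes);
        // Respect the minimum, and (assumed) never exceed the configured value.
        desired = Math.max(desired, minParallelism);
        return Math.min(desired, currentParallelism);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 25 of 100 maps done with 2.5 GiB written, so about 10 GiB expected.
        System.out.println(estimateParallelism(gib * 5 / 2, 25, 100, gib, 1, 200)); // prints 10
    }
}
```

With a 1 GiB desired input size, the example would shrink a vertex configured
with 200 reducers down to 10.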


Questions:
    1) Since it is an AM-level setting, it looks like it is possible to say
"do not apply auto parallelism for this vertex". Is that correct? Pig has a
PARALLEL clause which allows users to set the parallelism for a particular
operation such as JOIN, GROUP BY or ORDER BY. We would like to honor that and
use automatic parallelism only for operations where the user has not
specified PARALLEL. Also, when a custom partitioner is involved (such as
range partitioning in the case of ORDER BY), we do not want ARP to kick in.
Is it possible to turn ARP on or off per vertex?
    2) How is ARP used in Hive?
    3) Any other things we need to know about ARP? Any new optimizations or
changes planned?
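
For question 1, the per-vertex decision we have in mind on the Pig side would
look roughly like the sketch below. The names are hypothetical and this is not
existing Pig or Tez code; it assumes a non-positive value means the user did
not specify PARALLEL.

```java
// Hypothetical sketch of the Pig-side policy; not an existing Pig or Tez API.
final class ArpPolicySketch {
    private ArpPolicySketch() {}

    /**
     * Decides whether a vertex should opt in to auto parallelism.
     * Assumption: requestedParallelism <= 0 means PARALLEL was not specified.
     */
    static boolean useAutoParallelism(int requestedParallelism,
                                      boolean hasCustomPartitioner) {
        // Honor an explicit PARALLEL clause.
        if (requestedParallelism > 0) {
            return false;
        }
        // A custom partitioner (e.g. range partitioning for ORDER BY) is
        // built for a fixed number of partitions, so leave it untouched.
        return !hasCustomPartitioner;
    }
}
```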

Regards,
Rohini
