crunch-dev mailing list archives

From Clément MATHIEU (JIRA) <j...@apache.org>
Subject [jira] [Created] (CRUNCH-637) crunch.bytes.per.reduce.task cannot be used with GroupingOptions
Date Fri, 17 Feb 2017 12:09:41 GMT
Clément MATHIEU created CRUNCH-637:
--------------------------------------

             Summary: crunch.bytes.per.reduce.task cannot be used with GroupingOptions
                 Key: CRUNCH-637
                 URL: https://issues.apache.org/jira/browse/CRUNCH-637
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.14.0
            Reporter: Clément MATHIEU
            Assignee: Josh Wills


I had expected to be able to set {{crunch.bytes.per.reduce.task}} through {{GroupingOptions}} to
fine-tune job parallelism:
           
{code:java}
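// table: the PTable being grouped
table.groupByKey(
    GroupingOptions.builder()
        // Aim for roughly 50 MB of input per reduce task for this grouping only
        .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
        .partitionerClass(RoundRobinPartitioner.class)
        .build());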
{code}
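To make the expected effect concrete, here is a hypothetical, self-contained model of how a bytes-per-reduce-task budget could translate into a reducer count. It assumes roughly one reducer per full budget of estimated input bytes, plus one; the real {{PartitionUtils.getRecommendedPartitions}} may differ in detail (for example, by capping the result). {{ReducerEstimate}} and {{recommendedReducers}} are illustrative names, not Crunch APIs.

```java
/**
 * Hypothetical model (not Crunch code) of turning a bytes-per-reduce-task
 * budget into a reducer count: one reducer per full budget of estimated
 * input bytes, plus one. The real implementation may differ, e.g. by capping.
 */
class ReducerEstimate {

    // Hypothetical helper; not a Crunch API.
    static int recommendedReducers(long estimatedInputBytes, long bytesPerReduceTask) {
        if (bytesPerReduceTask <= 0) {
            throw new IllegalArgumentException("bytesPerReduceTask must be positive");
        }
        return 1 + (int) (estimatedInputBytes / bytesPerReduceTask);
    }

    public static void main(String[] args) {
        long input = 2_000_000_000L; // 2 GB of estimated input (illustrative)
        System.out.println(recommendedReducers(input, 1_000_000_000L)); // 3
        System.out.println(recommendedReducers(input, 50_000_000L));    // 41
    }
}
```

Under this model, with 2 GB of estimated input, lowering the budget from 1 GB to the 50 MB used in the snippet above raises the count from 3 to 41 reducers, which is the kind of per-grouping tuning the snippet is after.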

However, {{PGroupedTableImpl}} ignores {{GroupingOptions.extraConf}} here and reads
{{crunch.bytes.per.reduce.task}} from the pipeline configuration instead:

{code:java}
public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {

    public void configureShuffle(Job job) {
        this.ptype.configureShuffle(job, this.groupingOptions);
        if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
            // crunch.bytes.per.reduce.task comes from the pipeline configuration,
            // not from GroupingOptions.extraConf
            int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
            if (numReduceTasks > 0) {
                // [...]
            }
        }
    }
    // [...]
}
{code}

Is there any reason not to give {{GroupingOptions.extraConf}} a chance to override this setting?
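One possible lookup order, sketched under assumptions: a value set through {{GroupingOptions}} (modeled here as a plain {{extraConf}} map) takes precedence over the pipeline configuration, which in turn takes precedence over a default. Plain {{Map}}s stand in for Hadoop's {{Configuration}}; {{BytesPerReduceTaskLookup}} and {{resolve}} are hypothetical names, not Crunch APIs.

```java
import java.util.Map;

/**
 * Hypothetical sketch (not Crunch code): per-grouping settings override
 * pipeline-wide settings, which override a default.
 */
class BytesPerReduceTaskLookup {

    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";

    static long resolve(Map<String, String> extraConf,
                        Map<String, String> pipelineConf,
                        long defaultValue) {
        // 1. Value set on this grouping, if any
        String value = extraConf.get(BYTES_PER_REDUCE_TASK);
        // 2. Otherwise fall back to the pipeline-wide value
        if (value == null) {
            value = pipelineConf.get(BYTES_PER_REDUCE_TASK);
        }
        // 3. Otherwise use the default
        return value == null ? defaultValue : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Map<String, String> pipeline = Map.of(BYTES_PER_REDUCE_TASK, "1000000000");
        Map<String, String> grouping = Map.of(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));
        System.out.println(resolve(grouping, pipeline, 1_000_000_000L)); // 50000000
        System.out.println(resolve(Map.of(), pipeline, 1_000_000_000L)); // 1000000000
    }
}
```

Inside {{configureShuffle}}, the equivalent would be to pass {{getRecommendedPartitions}} a configuration with the grouping options' settings layered over the pipeline's. Whether {{job.getConfiguration()}} already carries those settings at that point depends on what {{ptype.configureShuffle}} does, which the excerpt does not show.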



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
