crunch-dev mailing list archives

From Clément MATHIEU (JIRA)
Subject [jira] [Created] (CRUNCH-637) crunch.bytes.per.reduce.task cannot be used with GroupingOptions
Date Fri, 17 Feb 2017 12:09:41 GMT
Clément MATHIEU created CRUNCH-637:

             Summary: crunch.bytes.per.reduce.task cannot be used with GroupingOptions
                 Key: CRUNCH-637
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.14.0
            Reporter: Clément MATHIEU
            Assignee: Josh Wills

I expected to be able to set {{crunch.bytes.per.reduce.task}} through {{GroupingOptions}} to
fine-tune the parallelism of a single grouping, for example:
    .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))

However, {{PGroupedTableImpl}} ignores {{GroupingOptions.extraConf}} and reads
{{crunch.bytes.per.reduce.task}} only from the pipeline configuration.

public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {

    public void configureShuffle(Job job) {
        this.ptype.configureShuffle(job, this.groupingOptions);
        if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
            // The reducer estimate is computed from the pipeline-wide configuration only.
            int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
            if (numReduceTasks > 0) {
                // [...]

Is there any reason not to take {{GroupingOptions.extraConf}} into account here?
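
To make the question concrete, here is a minimal, self-contained sketch of the lookup order being asked about: the per-grouping settings are layered over the pipeline settings before the reducer count is estimated. It does not use the Crunch or Hadoop classes. Plain maps stand in for the configuration objects, and the class name, both helper methods, the 1 GB default, and the "size / bytes per task + 1" estimate are assumptions made for illustration, not the actual Crunch implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class ReducerEstimateSketch {
    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";
    // Assumed default for this sketch only.
    static final long DEFAULT_BYTES_PER_REDUCE_TASK = 1_000_000_000L;

    // Hypothetical helper: copy the pipeline settings, then let the
    // per-grouping settings (extraConf) override them.
    static Map<String, String> effectiveConf(Map<String, String> pipelineConf,
                                             Map<String, String> extraConf) {
        Map<String, String> merged = new HashMap<>(pipelineConf);
        if (extraConf != null) {
            merged.putAll(extraConf);
        }
        return merged;
    }

    // Hypothetical stand-in for the reducer estimate: one reducer per
    // bytesPerTask of input, plus one.
    static int recommendedPartitions(long inputBytes, Map<String, String> conf) {
        long bytesPerTask = Long.parseLong(
            conf.getOrDefault(BYTES_PER_REDUCE_TASK,
                              Long.toString(DEFAULT_BYTES_PER_REDUCE_TASK)));
        return (int) (inputBytes / bytesPerTask) + 1;
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = new HashMap<>();
        Map<String, String> extraConf = new HashMap<>();
        extraConf.put(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));

        long inputBytes = 500_000_000L;
        // Pipeline configuration only: the per-grouping value is not seen.
        System.out.println(recommendedPartitions(inputBytes, pipelineConf));
        // Per-grouping value layered on top of the pipeline configuration.
        System.out.println(
            recommendedPartitions(inputBytes, effectiveConf(pipelineConf, extraConf)));
    }
}
```

Merging into a copy leaves the pipeline-wide settings untouched, so an override on one grouping would not affect other groupings in the same pipeline.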

This message was sent by Atlassian JIRA