Clément MATHIEU created CRUNCH-637:
--------------------------------------
Summary: crunch.bytes.per.reduce.task cannot be used with GroupingOptions
Key: CRUNCH-637
URL: https://issues.apache.org/jira/browse/CRUNCH-637
Project: Crunch
Issue Type: Improvement
Components: Core
Affects Versions: 0.14.0
Reporter: Clément MATHIEU
Assignee: Josh Wills
I had expected to be able to set {{crunch.bytes.per.reduce.task}} through {{GroupingOptions}} to
fine-tune the parallelism of an individual job.
{code:java}
.groupByKey(
    GroupingOptions.builder()
        .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
        .partitionerClass(RoundRobinPartitioner.class)
        .build())
{code}
However, {{PGroupedTableImpl}} ignores {{GroupingOptions.extraConf}} and reads
{{crunch.bytes.per.reduce.task}} from the pipeline configuration only.
{code:java}
public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {
  public void configureShuffle(Job job) {
    this.ptype.configureShuffle(job, this.groupingOptions);
    if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
      int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
      if (numReduceTasks > 0) {
        // [...]
{code}
Is there any reason not to take {{GroupingOptions.extraConf}} into account when computing the number of reducers?
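To illustrate the lookup order being asked for, here is a minimal, self-contained sketch. It is not Crunch code: plain maps stand in for the Hadoop configuration and for {{GroupingOptions.extraConf}}, and the class name, method names, default value and rounding rule are all assumptions made for the example. The only point is that a per-grouping value, when present, would win over the pipeline-wide one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Crunch.
final class ReducerSizingSketch {
    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";
    // Assumed fallback for this sketch, used when neither configuration sets the key.
    static final long ASSUMED_DEFAULT_BYTES = 1_000_000_000L;

    // Per-grouping settings take precedence over pipeline-wide settings.
    static long bytesPerReduceTask(Map<String, String> pipelineConf,
                                   Map<String, String> extraConf) {
        String value = extraConf.get(BYTES_PER_REDUCE_TASK);
        if (value == null) {
            value = pipelineConf.get(BYTES_PER_REDUCE_TASK);
        }
        return value == null ? ASSUMED_DEFAULT_BYTES : Long.parseLong(value);
    }

    // Assumed sizing rule: ceiling of inputBytes / bytesPerTask, at least 1.
    static int recommendedPartitions(long inputBytes,
                                     Map<String, String> pipelineConf,
                                     Map<String, String> extraConf) {
        long perTask = bytesPerReduceTask(pipelineConf, extraConf);
        long partitions = (inputBytes + perTask - 1) / perTask;
        return (int) Math.max(1L, partitions);
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = new HashMap<>();
        Map<String, String> extraConf = new HashMap<>();
        extraConf.put(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));
        // 1 GB of input at 50 MB per reducer: prints 20
        System.out.println(recommendedPartitions(1_000_000_000L, pipelineConf, extraConf));
    }
}
```

With this ordering, a grouping that sets nothing behaves as it does today, since the pipeline value (or the default) is still used.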
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)