crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-636) Make replication factor for temporary files configurable
Date Wed, 22 Feb 2017 06:14:44 GMT


Josh Wills commented on CRUNCH-636:

Yeah, I'm not wild about this one to be honest. I see the appeal for certain use cases, but
we also have ways of configuring custom per-output settings via the conf/outputConf methods
on the Target API, and we should always let folks have enough control over how things are
configured on a per-job basis to be able to do what they want. Like, I think the Crunch philosophy
should be that anything is possible (i.e., there's nothing you can do in vanilla MR that isn't
possible in Crunch), but that sane/stable defaults are also good, so let's not make it all
that easy to do something that is going to yield a bad/unreliable user experience.

> Make replication factor for temporary files configurable
> --------------------------------------------------------
>                 Key: CRUNCH-636
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>            Assignee: Attila Sasvari
> As of now, Crunch does not allow different replication factors for temporary files
> and non-temporary files (e.g. final output data of leaf nodes) at the same time. If a user
> has a large amount of data (say hundreds of gigabytes) to process, they might want a
> lower replication factor for large temporary files between Crunch jobs.
> We could make this configurable via a new setting (e.g. {{crunch.tmp.dir.replication}}).
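
To make the proposal above concrete, here is a minimal sketch in plain Java (not Crunch code) of how a lookup for such a setting might behave. It assumes the proposed key crunch.tmp.dir.replication from the issue description, which is only a suggestion in this thread, and falls back to the standard HDFS key dfs.replication (default 3) when the temporary-file key is unset. The class name, method name, and the use of a plain Map in place of a Hadoop Configuration are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TmpReplicationSketch {
    // Key proposed in CRUNCH-636; treated here as hypothetical.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";
    // Standard HDFS key for the default block replication factor.
    static final String DFS_REPLICATION_KEY = "dfs.replication";

    // Returns the replication factor to use for a file.
    // Temporary files use the proposed key if present; everything else,
    // and temporary files without an override, use the cluster default.
    static int resolveReplication(Map<String, String> conf, boolean temporary) {
        String clusterDefault = conf.getOrDefault(DFS_REPLICATION_KEY, "3");
        String value = temporary
                ? conf.getOrDefault(TMP_REPLICATION_KEY, clusterDefault)
                : clusterDefault;
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFS_REPLICATION_KEY, "3");
        conf.put(TMP_REPLICATION_KEY, "1");

        System.out.println("temporary: " + resolveReplication(conf, true));
        System.out.println("final:     " + resolveReplication(conf, false));
    }
}
```

Falling back to the cluster default when the override is absent keeps the current behaviour unchanged for anyone who does not opt in, which fits the preference for stable defaults expressed in the comment.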

