spark-dev mailing list archives

From Sean Owen
Subject Re: Should spark-ec2 get its own repo?
Date Sun, 12 Jul 2015 08:34:31 GMT
I agree with these points. The ec2 support is substantially a separate
project, and would likely be better managed as one. People can much
more rapidly iterate on it and release it.

I suggest:

1. Pick a new repo location. amplab/spark-ec2 ? spark-ec2/spark-ec2 ?
2. Add interested parties as owners/contributors
3. Reassemble a working clone of the current code from spark/ec2 and
mesos/spark-ec2 and check it in
4. Announce the new location on user@, dev@
5. Triage open JIRAs to the new repo's issue tracker and close them elsewhere
6. Remove the old copies of the code and leave a pointer to the new
location in their place

I'd also like to hear a few more nods before pulling the trigger though.

On Sat, Jul 11, 2015 at 7:07 PM, Matt Goodman wrote:
> I wanted to revive the conversation about the spark-ec2 tools, as it seems
> to have been lost in the 1.4.1 release voting spree.
> I think that splitting it into its own repository is a really good move, and
> I would also be happy to help with this transition, as well as help maintain
> the resulting repository.  Here is my justification for why we ought to do
> this split.
> User Facing:
> - The spark-ec2 launcher doesn't use anything in the parent spark
>   repository.
> - The spark-ec2 version is disjoint from the parent repo.  I consider it
>   confusing that the spark-ec2 script doesn't launch the version of spark
>   it is checked out with.
> - Someone interested in setting up spark-ec2 with anything but the default
>   configuration will have to clone at least 2 repositories at present, and
>   probably fork and push changes to 1.
> - spark-ec2 has mismatched dependencies with respect to spark itself.  This
>   includes a confusing shim in the spark-ec2 script to install boto, which
>   frankly should just be a dependency of the script.
> Developer Facing:
> - Support across 2 repos will be worse than across 1.  It's unclear where
>   to file issues/PRs, and it requires extra communication for even fairly
>   trivial stuff.
> - spark-ec2 also depends on a number of binary blobs being in the right
>   place.  Currently the responsibility for these is decentralized, and
>   likely prone to various flavors of dumb.
> - The current flow of booting a spark-ec2 cluster is _complicated_.  I spent
>   the better part of a couple of days figuring out how to integrate our
>   custom tools into this stack.  This is very hard to fix when commits/PRs
>   need to span groups/repositories/buckets-o-binary.  I am sure there are
>   several other problems that are languishing under similar roadblocks.
> - It makes testing possible.  The spark-ec2 script is a great case for CI
>   given the number of permutations of launch criteria there are.  I suspect
>   AWS would be happy to foot the bill on spark-ec2 testing (probably ~20
>   bucks a month based on some envelope sketches), as it is a piece of
>   software that directly impacts other people giving them money.  I have
>   some contacts there, and I am pretty sure this would be an easy
>   conversation, particularly if the repo is directly concerned with ec2.
>   Think also of being able to assemble the binary blobs into an s3 bucket
>   dedicated to spark-ec2.
> Any other thoughts/voices appreciated here.  spark-ec2 is a super-power tool
> and deserves a fair bit of attention!
> --Matthew Goodman
