metron-dev mailing list archives

From Otto Fowler <ottobackwa...@gmail.com>
Subject Re: [DISCUSS] Build Times are getting out of hand
Date Tue, 07 Feb 2017 20:12:38 GMT
This PR gets a star just for the commit messages. It isn’t even Friday, Casey.


On February 7, 2017 at 14:49:22, Casey Stella (cestella@gmail.com) wrote:

I spent a minute or two looking at how we might use travis
configuration alone to drop the wall-clock time of the build and put it up
for review at https://github.com/apache/incubator-metron/pull/444

It does 3 things:

- Separates the build, the unit tests and the integration tests
- Parallelizes the unit tests and the build and runs the integration
tests within the travis container
- Runs the unit tests and integration tests in separate travis
containers using travis' build matrix

This ultimately cuts the wall-clock time down to 24 minutes for me on travis
and should give us some time where we're not constantly bouncing builds to
act on the suggestions here.


On Tue, Feb 7, 2017 at 1:03 PM, Michael Miklavcic <
michael.miklavcic@gmail.com> wrote:

> FYI, found this for Docker - https://docs.travis-ci.com/user/docker/
>
> On Tue, Feb 7, 2017 at 9:09 AM, David Lyle <dlyle65535@gmail.com> wrote:
>
> > Absolutely agree. I also think we'd want both once we've done that. Travis
> > is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> > of medium duration tests and would be great for automating our distributed
> > testing if we found infrastructure to support it. I've seen them used in
> > concert to provide a good solution.
> >
> > But, initially, I'd like to see us get our in-process stuff replaced with
> > docker where (if) it makes sense, refactored to run in parallel, the poms
> > refactored to handle our dependencies better and our uber jars removed
> > where they can be and minimized where they cannot be.
> >
> > Which, I think, is a long-winded way of saying "I'd like to see us do what
> > Casey suggested." :)
> >
> > -D...
> >
> >
> > On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> > michael.miklavcic@gmail.com> wrote:
> >
> > > I agree with this. I don't think we should switch to an alternate system
> > > until we find that we are absolutely incapable of eking out any further
> > > efficiency from the current setup.
> > >
> > > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella <cestella@gmail.com> wrote:
> > >
> > > > I believe that some people use travis and some people request Jenkins
> > > > from Apache Infra. That being said, personally, I think we should take the
> > > > opportunity to correct the underlying issues. 50 minutes for a build seems
> > > > excessive to me.
> > > >
> > > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <ottobackwards@gmail.com>
> > > > wrote:
> > > >
> > > > > Is there an alternative to Travis? Do other like sized apache projects
> > > > > have these problems? Do they use travis?
> > > > >
> > > > >
> > > > > > On February 6, 2017 at 17:02:37, Casey Stella (cestella@gmail.com) wrote:
> > > > >
> > > > > > For those with pending/building pull requests, it will come as no surprise
> > > > > > that our build times are increasing at a pace that is worrisome. In fact,
> > > > > > we have hit a fundamental limit associated with Travis over the weekend.
> > > > > > We have crept up into the 40+ minute build territory and travis seems to
> > > > > > error out at around 49 minutes.
> > > > >
> > > > > > Taking the current build (
> > > > > > https://travis-ci.org/apache/incubator-metron/jobs/198929446), looking at
> > > > > > just job times, we're spending about 19 - 20 minutes (1176.53 seconds) in
> > > > > > tests out of 44 minutes and 42 seconds to do the build. This places the
> > > > > > unit tests at around 44% of the build time. I say all of this to point out
> > > > > > that while unit tests are a portion of the build, they are not even the
> > > > > > majority of the build time. We need an approach that addresses the whole
> > > > > > build performance holistically and we need it soonest.
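
The arithmetic in the paragraph above is easy to re-check. The following is a minimal Python sketch that recomputes the test share from the two figures quoted in the message (1176.53 seconds of tests, 44 minutes 42 seconds of total build). The function name `share_of_build` is made up for illustration and is not part of any Metron tooling.

```python
def share_of_build(test_seconds: float, build_minutes: int, build_seconds: int) -> float:
    """Return the fraction of the total build time that was spent in tests."""
    total_seconds = build_minutes * 60 + build_seconds
    return test_seconds / total_seconds


# Figures quoted in the message: 1176.53 s of tests, 44 min 42 s total build.
share = share_of_build(1176.53, 44, 42)
print(f"tests: {1176.53 / 60:.1f} min, share of build: {share:.1%}")
# prints "tests: 19.6 min, share of build: 43.9%"
```

The remaining time, a bit over half of the build, is what the packaging and dependency points further down are aimed at.
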
> > > > >
> > > > > > To seed the discussion, I will point to a few things that come to mind
> > > > > > that fit into three broad categories:
> > > > >
> > > > > *Tests are Slow*
> > > > >
> > > > >
> > > > > > - *Tactical*: We have around 13 tests that take more than 30 seconds and
> > > > > > make up 14 minutes of the build. Speeding up those tests may be a
> > > > > > worthwhile tactical approach.
> > > > > > - We are spinning up the same services (e.g. kafka, storm) for multiple
> > > > > > tests; instead, use the docker infrastructure to spin them up once and
> > > > > > then use them throughout the tests.
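
The "spin them up once and reuse them" idea can be sketched in a few lines. This is only an illustration in Python, not Metron's test code (which is Java): `FakeService`, `shared_service` and the two test functions are hypothetical names, and the constructor stands in for the expensive step of starting a real service or container.

```python
import functools


class FakeService:
    """Hypothetical stand-in for an expensive service such as Kafka or Storm."""

    starts = 0  # counts how many times any service was actually started

    def __init__(self, name: str) -> None:
        self.name = name
        FakeService.starts += 1  # real code would launch a container here


@functools.lru_cache(maxsize=None)
def shared_service(name: str) -> FakeService:
    """Start the named service on first use, then return the same instance."""
    return FakeService(name)


def test_enrichment() -> None:
    assert shared_service("kafka").name == "kafka"


def test_indexing() -> None:
    assert shared_service("kafka").name == "kafka"


test_enrichment()
test_indexing()
print(FakeService.starts)  # prints 1: both tests shared a single start
```

The trade-off is that tests sharing a service must clean up after themselves (for example by using distinct topic names), since they no longer get a fresh instance each.
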
> > > > >
> > > > >
> > > > > *Tests aren't parallel*
> > > > >
> > > > > > Currently we cannot run the build in parallel due to the integration test
> > > > > > infrastructure spinning up its own services that bind to the same ports.
> > > > > > If we correct this, we can run the builds in parallel with mvn -T
> > > > >
> > > > > > - Correct this by decoupling the infrastructure from the tests and
> > > > > > refactoring the tests to run in parallel.
> > > > > > - Make the integration testing infrastructure bind intelligently to
> > > > > > whatever port is available.
> > > > > > - Move the integration tests to their own project. This will let us run
> > > > > > the build in parallel since an individual project's tests will be run
> > > > > > serially.
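
One common way to "bind to whatever port is available" is to ask the operating system for a free port by binding to port 0. The sketch below shows the idea in Python for brevity. The helper name `bind_to_free_port` is an assumption for illustration, and how a given in-process component would be told which port to use is not covered here.

```python
import socket


def bind_to_free_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening TCP socket to a port chosen by the operating system."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any currently free port
    sock.listen()
    return sock


# Two "services" started side by side get different ports, so they cannot collide.
first = bind_to_free_port()
second = bind_to_free_port()
first_port = first.getsockname()[1]
second_port = second.getsockname()[1]
print(first_port, second_port)
first.close()
second.close()
```

Closing the socket and then handing the port number to another process leaves a small window in which something else can take the port, so where possible the component itself should do the port-0 bind and report the port it received.
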
> > > > >
> > > > > *Packaging is Painful*
> > > > >
> > > > > > We have a sensitive environment in terms of dependencies. As such, we are
> > > > > > careful to shade and relocate dependencies that we want to isolate from
> > > > > > our transitive dependencies. The consequence of this is that we spend a lot
> > > > > > of time in the build shading and relocating maven module output.
> > > > >
> > > > > > - Do the hard work to walk our transitive dependencies and ensure that
> > > > > > we are including only one copy of every library by using exclusions
> > > > > > effectively. This will not only bring down build times, it will make sure
> > > > > > we know what we're including.
> > > > > > - Try to devise a strategy where we only shade once at the end. This
> > > > > > could look like some combination of
> > > > > >   - standardizing on the lowest common denominator of a troublesome
> > > > > >   library
> > > > > >   - We shade in dependencies so they can use different versions of
> > > > > >   libraries (e.g. metron-common with a modern version of guava) than the
> > > > > >   final jars.
> > > > > >   - exclusions
> > > > > >   - externalizing infrastructure out to not necessitate spinning up
> > > > > >   hadoop components in-process for integration tests (i.e. hbase server
> > > > > >   conflicts with storm in a few dependencies)
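
Checking for "only one copy of every library" amounts to grouping resolved dependencies by library and flagging any that appear with more than one version. The following is a rough Python sketch of that check. The input is assumed to be simplified `group:artifact:version` strings (real Maven tooling reports more fields), the function name `find_version_conflicts` is made up, and the sample coordinates are hypothetical rather than taken from the Metron poms.

```python
from collections import defaultdict


def find_version_conflicts(coordinates: list[str]) -> dict[str, list[str]]:
    """Map each group:artifact seen with more than one version to its versions."""
    versions: dict[str, set[str]] = defaultdict(set)
    for coordinate in coordinates:
        group, artifact, version = coordinate.split(":")
        versions[f"{group}:{artifact}"].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# Hypothetical resolved dependencies, not taken from the actual Metron poms.
resolved = [
    "com.google.guava:guava:12.0.1",
    "com.google.guava:guava:17.0",
    "org.slf4j:slf4j-api:1.7.7",
]
conflicts = find_version_conflicts(resolved)
print(conflicts)  # prints {'com.google.guava:guava': ['12.0.1', '17.0']}
```

Each entry in the result is a candidate for an exclusion or for agreeing on a single version.
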
> > > > >
> > > > > *Final Thoughts*
> > > > >
> > > > > If I had three to pick, I'd pick
> > > > >
> > > > > > - moving off of the in-memory component infrastructure to docker images
> > > > > - fixing the maven poms to exclude correctly
> > > > > - ensuring the resulting tests are parallelizable
> > > > >
> > > > > > I will point out that fixing the maven poms to exclude correctly (i.e. we
> > > > > > choose the version of every jar that we depend on transitively) ticks
> > > > > > multiple boxes, not just making things faster.
> > > > >
> > > > > > What are your thoughts? What did I miss? We need a plan and we need to
> > > > > > execute on it soon, otherwise travis is going to keep smacking us hard. It
> > > > > > may be worthwhile constructing a tactical plan and then a more strategic
> > > > > > plan that we can work toward. I was heartened at how much some of these
> > > > > > suggestions dovetail with the discussion around the future of the docker
> > > > > > infrastructure.
> > > > >
> > > > > Best,
> > > > >
> > > > > Casey
> > > > >
> > > > >
> > > >
> > >
> >
>
