metron-dev mailing list archives

From Michael Miklavcic <michael.miklav...@gmail.com>
Subject Re: [DISCUSS] Build Times are getting out of hand
Date Tue, 07 Feb 2017 18:03:41 GMT
FYI, found this for Docker - https://docs.travis-ci.com/user/docker/

On Tue, Feb 7, 2017 at 9:09 AM, David Lyle <dlyle65535@gmail.com> wrote:

> Absolutely agree. I also think we'd want both once we've done that. Travis
> is good for smoke testing PRs and Commits. Jenkins is good for nightly runs
> of medium duration tests and would be great for automating our distributed
> testing if we found infrastructure to support it. I've seen them used in
> concert to provide a good solution.
>
> But, initially, I'd like to see us get our in-process stuff replaced with
> docker where (if) it makes sense, refactored to run in parallel, the poms
> refactored to handle our dependencies better and our uber jars removed
> where they can be and minimized where they cannot be.
>
> Which, I think, is a long-winded way of saying "I'd like to see us do what
> Casey suggested." :)
>
> -D...
>
>
> On Tue, Feb 7, 2017 at 10:45 AM, Michael Miklavcic <
> michael.miklavcic@gmail.com> wrote:
>
> > I agree with this. I don't think we should switch to an alternate system
> > until we find that we are absolutely incapable of eking out any further
> > efficiency from the current setup.
> >
> > On Tue, Feb 7, 2017 at 8:04 AM, Casey Stella <cestella@gmail.com> wrote:
> >
> > > I believe that some people use Travis and some people request Jenkins
> > > from Apache Infra. That being said, personally, I think we should take
> > > the opportunity to correct the underlying issues. 50 minutes for a build
> > > seems excessive to me.
> > >
> > > On Mon, Feb 6, 2017 at 10:07 PM, Otto Fowler <ottobackwards@gmail.com>
> > > wrote:
> > >
> > > > Is there an alternative to Travis? Do other similarly sized Apache
> > > > projects have these problems? Do they use Travis?
> > > >
> > > >
> > > > On February 6, 2017 at 17:02:37, Casey Stella (cestella@gmail.com) wrote:
> > > >
> > > > For those with pending/building pull requests, it will come as no
> > > > surprise that our build times are increasing at a worrisome pace. In
> > > > fact, we hit a fundamental limit associated with Travis over the
> > > > weekend. We have crept up into 40+ minute build territory, and Travis
> > > > seems to error out at around 49 minutes.
> > > >
> > > > Taking the current build
> > > > (https://travis-ci.org/apache/incubator-metron/jobs/198929446) and
> > > > looking at just job times, we're spending about 19-20 minutes (1176.53
> > > > seconds) in tests out of the 44 minutes and 42 seconds it takes to do
> > > > the build. This places the unit tests at around 43% of the build time.
> > > > I say all of this to point out that while unit tests are a portion of
> > > > the build, they are not even the majority of the build time. We need an
> > > > approach that addresses build performance holistically, and we need it
> > > > soonest.
> > > >
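A quick check of the timing figures above, using only the numbers given in the message, shows where the "around 43%" share comes from and how much of the build is spent outside the tests:

```latex
\begin{aligned}
t_{\text{build}} &= 44 \times 60 + 42 = 2682\ \text{s} \\
t_{\text{tests}} &= 1176.53\ \text{s} \approx 19.6\ \text{min} \\
\frac{t_{\text{tests}}}{t_{\text{build}}} &= \frac{1176.53}{2682} \approx 0.439 \\
t_{\text{build}} - t_{\text{tests}} &= 1505.47\ \text{s} \approx 25.1\ \text{min}
\end{aligned}
```

So roughly 25 of the ~45 minutes go to work other than running tests (compilation, packaging, shading, and so on), which is why speeding up tests alone cannot fix the build time.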
> > > > To seed the discussion, I will point to a few things that come to mind,
> > > > which fit into three broad categories:
> > > >
> > > > *Tests are Slow*
> > > >
> > > >
> > > > - *Tactical*: We have around 13 tests that take more than 30 seconds
> > > >   each and together make up 14 minutes of the build. Speeding up those
> > > >   tests may be worth considering as a tactical approach.
> > > > - We are spinning up the same services (e.g. Kafka, Storm) for multiple
> > > >   tests. Instead, we could use the Docker infrastructure to spin them up
> > > >   once and then use them throughout the tests.
> > > >
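A minimal sketch of "spin them up once and then use them throughout the tests", assuming a hypothetical `InfraComponent` wrapper around whatever actually runs the service (an in-process component or a Docker container). `InfraComponent` and `SharedComponents` are illustrative names, not existing Metron classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical lifecycle wrapper for an external service such as Kafka or Storm. */
interface InfraComponent {
    void start();
    void stop();
}

/**
 * Hypothetical registry: starts each named component at most once per JVM
 * and hands the same running instance to every test that asks for it.
 */
final class SharedComponents {
    // Guarded by the class lock taken in get()
    private static final Map<String, InfraComponent> RUNNING = new HashMap<>();

    private SharedComponents() {}

    static synchronized InfraComponent get(String name, Supplier<InfraComponent> factory) {
        InfraComponent existing = RUNNING.get(name);
        if (existing != null) {
            return existing; // already started by an earlier test
        }
        InfraComponent created = factory.get();
        created.start();
        RUNNING.put(name, created);
        // Stop the component once, when the test JVM exits
        Runtime.getRuntime().addShutdownHook(new Thread(created::stop));
        return created;
    }
}
```

The sharing here is per JVM: if the test runner forks several JVMs, each fork still starts its own copy, so a truly shared service (for example one Docker container per build) would need to be started outside the test JVMs.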
> > > >
> > > > *Tests aren't parallel*
> > > >
> > > > Currently we cannot run the build in parallel because the integration
> > > > test infrastructure spins up its own services that bind to the same
> > > > ports. If we correct this, we can run the builds in parallel with
> > > > mvn -T.
> > > >
> > > > - Correct this by decoupling the infrastructure from the tests and
> > > >   refactoring the tests to run in parallel.
> > > > - Make the integration testing infrastructure bind intelligently to
> > > >   whatever port is available.
> > > > - Move the integration tests to their own project. This will let us run
> > > >   the build in parallel, since an individual project's tests are run
> > > >   serially.
> > > >
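One common way to "bind intelligently to whatever port is available" is to ask the operating system for an ephemeral port by binding to port 0. The sketch below is an assumption about how that could look in a Java test harness; `FreePorts` and `findFreePort` are hypothetical names, not existing Metron code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical helper for picking a free local port in integration tests. */
final class FreePorts {
    private FreePorts() {}

    /** Binds to port 0 so the OS assigns a free ephemeral port, then releases it. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not find a free port", e);
        }
    }
}
```

The port is released before the service binds to it, so another process could grab it in between. Where a component accepts port 0 directly and reports the port it actually bound, letting it bind itself avoids that race.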
> > > > *Packaging is Painful*
> > > >
> > > > We have a sensitive environment in terms of dependencies. As such, we
> > > > are careful to shade and relocate dependencies that we want to isolate
> > > > from our transitive dependencies. The consequence is that we spend a lot
> > > > of time in the build shading and relocating Maven module output.
> > > >
> > > > - Do the hard work to walk our transitive dependencies and ensure that
> > > >   we are including only one copy of every library by using exclusions
> > > >   effectively. This will not only bring down build times, it will make
> > > >   sure we know what we're including.
> > > > - Try to devise a strategy where we only shade once at the end. This
> > > >   could look like some combination of:
> > > >   - standardizing on the lowest common denominator of a troublesome
> > > >     library
> > > >     - We shade in dependencies so they can use different versions of
> > > >       libraries (e.g. metron-common with a modern version of guava)
> > > >       than the final jars.
> > > >   - exclusions
> > > >   - externalizing infrastructure so we don't need to spin up Hadoop
> > > >     components in-process for integration tests (e.g. the HBase server
> > > >     conflicts with Storm in a few dependencies)
> > > >
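As a starting point for walking the transitive dependencies, one could scan `mvn dependency:tree` output (verbose mode, where the plugin version supports it, also lists versions that lost conflict resolution) for artifacts that appear at more than one version. This sketch assumes the usual `groupId:artifactId:type[:classifier]:version:scope` line format; `DuplicateVersionFinder` is a hypothetical helper, not part of Maven or Metron:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds artifacts referenced at more than one version. */
final class DuplicateVersionFinder {
    // groupId:artifactId:type[:classifier]:version:scope (lines without a scope, e.g. the root, are skipped)
    private static final Pattern COORDINATE = Pattern.compile(
        "([\\w.\\-]+):([\\w.\\-]+):[\\w\\-]+:(?:[\\w.\\-]+:)?([\\w.\\-]+)"
            + ":(?:compile|provided|runtime|test|system)");

    private DuplicateVersionFinder() {}

    /** Returns groupId:artifactId to versions, keeping only artifacts seen at 2+ versions. */
    static Map<String, Set<String>> findConflicts(List<String> treeLines) {
        Map<String, Set<String>> versions = new TreeMap<>();
        for (String line : treeLines) {
            Matcher m = COORDINATE.matcher(line);
            if (m.find()) {
                versions.computeIfAbsent(m.group(1) + ":" + m.group(2), k -> new TreeSet<>())
                        .add(m.group(3));
            }
        }
        // Drop artifacts that only ever appear at a single version
        versions.values().removeIf(v -> v.size() < 2);
        return versions;
    }
}
```

Each artifact reported this way is a candidate for an explicit exclusion or a pinned version in `dependencyManagement`.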
> > > > *Final Thoughts*
> > > >
> > > > If I had to pick three, I'd pick:
> > > >
> > > > - moving off of the in-memory component infrastructure to Docker images
> > > > - fixing the Maven poms to exclude correctly
> > > > - ensuring the resulting tests are parallelizable
> > > >
> > > > I will point out that fixing the Maven poms to exclude correctly (i.e.
> > > > choosing the version of every jar that we depend on transitively) ticks
> > > > multiple boxes, not just making things faster.
> > > >
> > > > What are your thoughts? What did I miss? We need a plan and we need to
> > > > execute on it soon; otherwise Travis is going to keep smacking us hard.
> > > > It may be worthwhile to construct a tactical plan and then a more
> > > > strategic plan that we can work toward. I was heartened at how much
> > > > some of these suggestions dovetail with the discussion around the
> > > > future of the Docker infrastructure.
> > > >
> > > > Best,
> > > >
> > > > Casey
> > > >
> > > >
> > >
> >
>
