cassandra-commits mailing list archives

From "Johanna Griffin (Jira)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-16785) Improvements to CircleCI CI/CD pipeline
Date Thu, 01 Jul 2021 15:53:00 GMT
Johanna Griffin created CASSANDRA-16785:
-------------------------------------------

             Summary: Improvements to CircleCI CI/CD pipeline
                 Key: CASSANDRA-16785
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16785
             Project: Cassandra
          Issue Type: Improvement
          Components: CI
            Reporter: Johanna Griffin
         Attachments: DataStax Config Review & Combined Dynamic Config Setup Workflows
(2021) (2).pdf

h1. About

DataStax is the company [associated with Apache’s open-source Cassandra|https://stackoverflow.com/questions/24564725/apache-cassandra-vs-datastax-cassandra]
project. 

 

The DataStax team requested a configuration review aimed at a more modern, general revamp
of their pipeline, with the 2021 config built on the Setup Workflows feature (setup).

 

A GitHub demo repo with CircleCI Setup Workflows & Dynamic Configuration scripts is available [here|https://github.com/CircleCI-Public/blog-dynamic-config-examples].

 
h1. Current Configuration & Workflow

 

The current state of their configuration can be found at the [Cassandra open-source GitHub
repository|https://github.com/datastax/cassandra/tree/ds-trunk/.circleci].

 

 

Because Cassandra accepts contributions from users who are not part of the organization,
these users are expected to be able to run tests on CircleCI.

 

There are three test tiers that users may qualify to run: Medium, Large and X-Large.
Right now, creating a valid CircleCI configuration is complex because it involves copying
the right config to the .circleci/config.yml file based on what you’re attempting to run.

 

Currently, config.yml.LOWRES is the default config. MIDRES and HIGHRES are custom configs for
those who have access to premium CircleCI resources.

h1. Overview
h2. Goals and Desired Outcome

The team is looking for ways to modernize their workflow. Ideally, there is one config.yml
that runs various workflows based on the parameters provided and the contexts available to
the team member. Updates to the master config should be made via GitHub, with issues opened
on the JIRA Board.
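
As a rough illustration of that goal, the sketch below shows one config.yml whose workflows are selected by a pipeline parameter. The parameter name `resource_tier`, the context name `premium-resources`, and the `cimg/openjdk:8.0` image are assumptions made for this example and are not taken from the Cassandra repository. A `midres` workflow would follow the same pattern.

```yaml
version: 2.1

# Hypothetical pipeline parameter; name and values are illustrative only.
parameters:
  resource_tier:
    type: enum
    enum: [lowres, midres, highres]
    default: lowres

jobs:
  unit_tests:
    parameters:
      size:
        type: string
        default: medium
    docker:
      - image: cimg/openjdk:8.0   # placeholder image, not the project's real one
    resource_class: << parameters.size >>
    steps:
      - checkout
      - run: echo "Running tests on a << parameters.size >> container"

workflows:
  # Runs when no tier is given, because lowres is the parameter default.
  lowres_tests:
    when:
      equal: [lowres, << pipeline.parameters.resource_tier >>]
    jobs:
      - unit_tests:
          size: medium

  # Runs only when the pipeline is triggered with resource_tier=highres.
  highres_tests:
    when:
      equal: [highres, << pipeline.parameters.resource_tier >>]
    jobs:
      - unit_tests:
          size: xlarge
          context: premium-resources   # hypothetical restricted context
```

With a layout like this, contributors would no longer copy a tier-specific file over config.yml; the tier becomes an input to the pipeline.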

 

Suggestions
 #  Currently, triggering steps within the build is a manual process. We'll investigate the
best ways to provide a more automated experience to cut down on developer time.

Solution: setup workflows, dynamic configuration/conditional workflows and [restricted contexts|https://circleci.com/docs/2.0/contexts/#running-workflows-with-a-restricted-context]  
 #  Easy User Management: The ability for their team to add users through the UI instead of
needing to submit a support ticket. 

Solution: DevOps Customer Engineering worked with the product team in 2020 to add shared organization
management functionality to the UI. On our roadmap for 2021 is to add users via the UI. See 
 #  Tracking Test Results: Tracking test results across runs to get a historical look at build
times, flaky builds, etc.

h2. Q&A Pause

 # Using the LOWRES config as an example (which can then be translated to the other config
tiers), the following jobs have environment variables that could be stored in [Contexts as environment variables
and restricted or unrestricted based on user team/tier|https://circleci.com/docs/2.0/contexts/#restrict-a-context-to-a-security-group-or-groups]
or passed as pipeline parameters:

 
 * Java 8 JVM Upgrade Distributed Tests (j8_jvm_upgrade_dtests)
 *     - ANT_HOME: /usr/share/ant
 *     - JAVA11_HOME: /usr/lib/jvm/java-11-openjdk-amd64
 *     - JAVA8_HOME: /usr/lib/jvm/java-8-openjdk-amd64
 *     - LANG: en_US.UTF-8
 *     - KEEP_TEST_DIR: true
 *     - DEFAULT_DIR: /home/cassandra/cassandra-dtest
 *     - PYTHONIOENCODING: utf-8
 *     - PYTHONUNBUFFERED: true
 *     - CASS_DRIVER_NO_EXTENSIONS: true
 *     - CASS_DRIVER_NO_CYTHON: true
 *     - CASSANDRA_SKIP_SYNC: true
 *     - DTEST_REPO: git://github.com/apache/cassandra-dtest.git
 *     - DTEST_BRANCH: trunk
 *     - CCM_MAX_HEAP_SIZE: 1024M
 *     - CCM_HEAP_NEWSIZE: 256M
 *     - REPEATED_UTEST_TARGET: testsome
 *     - REPEATED_UTEST_CLASS: null
 *     - REPEATED_UTEST_METHODS: null
 *     - REPEATED_UTEST_COUNT: 100
 *     - REPEATED_UTEST_STOP_ON_FAILURE: false
 *     - REPEATED_DTEST_NAME: null
 *     - REPEATED_DTEST_VNODES: false
 *     - REPEATED_DTEST_COUNT: 100
 *     - REPEATED_DTEST_STOP_ON_FAILURE: false
 *     - JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
 *     - JDK_HOME: /usr/lib/jvm/java-8-openjdk-amd64


 * Java 8 cqlsh Distributed Tests python 2 with VNodes [j8_cqlsh-dtests-py2-with-vnodes]
 *     - ANT_HOME: /usr/share/ant
 *     - JAVA11_HOME: /usr/lib/jvm/java-11-openjdk-amd64
 *     - JAVA8_HOME: /usr/lib/jvm/java-8-openjdk-amd64
 *     - LANG: en_US.UTF-8
 *     - KEEP_TEST_DIR: true
 *     - DEFAULT_DIR: /home/cassandra/cassandra-dtest
 *     - PYTHONIOENCODING: utf-8
 *     - PYTHONUNBUFFERED: true
 *     - CASS_DRIVER_NO_EXTENSIONS: true
 *     - CASS_DRIVER_NO_CYTHON: true
 *     - CASSANDRA_SKIP_SYNC: true
 *     - DTEST_REPO: git://github.com/apache/cassandra-dtest.git
 *     - DTEST_BRANCH: trunk
 *     - CCM_MAX_HEAP_SIZE: 1024M
 *     - CCM_HEAP_NEWSIZE: 256M
 *     - REPEATED_UTEST_TARGET: testsome
 *     - REPEATED_UTEST_CLASS: null
 *     - REPEATED_UTEST_METHODS: null
 *     - REPEATED_UTEST_COUNT: 100
 *     - REPEATED_UTEST_STOP_ON_FAILURE: false
 *     - REPEATED_DTEST_NAME: null
 *     - REPEATED_DTEST_VNODES: false
 *     - REPEATED_DTEST_COUNT: 100
 *     - REPEATED_DTEST_STOP_ON_FAILURE: false
 *     - JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
 *     - JDK_HOME: /usr/lib/jvm/java-8-openjdk-amd64


 * Java 11 Unit Tests [j11_unit_tests]
 *     - ANT_HOME: /usr/share/ant
 *     - JAVA11_HOME: /usr/lib/jvm/java-11-openjdk-amd64
 *     - JAVA8_HOME: /usr/lib/jvm/java-8-openjdk-amd64
 *     - LANG: en_US.UTF-8
 *     - KEEP_TEST_DIR: true
 *     - DEFAULT_DIR: /home/cassandra/cassandra-dtest
 *     - PYTHONIOENCODING: utf-8
 *     - PYTHONUNBUFFERED: true
 *     - CASS_DRIVER_NO_EXTENSIONS: true
 *     - CASS_DRIVER_NO_CYTHON: true
 *     - CASSANDRA_SKIP_SYNC: true
 *     - DTEST_REPO: git://github.com/apache/cassandra-dtest.git
 *     - DTEST_BRANCH: trunk
 *     - CCM_MAX_HEAP_SIZE: 1024M
 *     - CCM_HEAP_NEWSIZE: 256M
 *     - REPEATED_UTEST_TARGET: testsome
 *     - REPEATED_UTEST_CLASS: null
 *     - REPEATED_UTEST_METHODS: null
 *     - REPEATED_UTEST_COUNT: 100
 *     - REPEATED_UTEST_STOP_ON_FAILURE: false
 *     - REPEATED_DTEST_NAME: null
 *     - REPEATED_DTEST_VNODES: false
 *     - REPEATED_DTEST_COUNT: 100
 *     - REPEATED_DTEST_STOP_ON_FAILURE: false
 *     - JAVA_HOME: /usr/lib/jvm/java-11-openjdk-amd64
 *     - JDK_HOME: /usr/lib/jvm/java-11-openjdk-amd64
 *     - CASSANDRA_USE_JDK11: true


 * Java 8 cqlsh Distributed Tests python 3.8 NO VNodes [j8_cqlsh-dtests-py38-no-vnodes]
 * [etc]
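
Since the lists above are nearly identical across jobs, one possible way to cut the repetition is sketched here: shared values are defined once with a YAML anchor, and the values most likely to vary are exposed as pipeline parameters. The names `default_env`, `dtest_repo`, and `dtest_branch`, and the image, are assumptions for illustration. Only a few of the variables are shown.

```yaml
version: 2.1

# Hypothetical pipeline parameters for the values most likely to be overridden.
parameters:
  dtest_repo:
    type: string
    default: "git://github.com/apache/cassandra-dtest.git"
  dtest_branch:
    type: string
    default: trunk

# Values common to every job, defined once and reused through a YAML anchor.
default_env: &default_env
  ANT_HOME: /usr/share/ant
  JAVA11_HOME: /usr/lib/jvm/java-11-openjdk-amd64
  JAVA8_HOME: /usr/lib/jvm/java-8-openjdk-amd64
  LANG: en_US.UTF-8
  CCM_MAX_HEAP_SIZE: 1024M
  CCM_HEAP_NEWSIZE: 256M

jobs:
  j8_jvm_upgrade_dtests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    environment:
      <<: *default_env
      # Job-specific values sit next to the shared ones.
      JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64
      JDK_HOME: /usr/lib/jvm/java-8-openjdk-amd64
      DTEST_REPO: << pipeline.parameters.dtest_repo >>
      DTEST_BRANCH: << pipeline.parameters.dtest_branch >>
    steps:
      - checkout
      - run: echo "dtests from $DTEST_REPO ($DTEST_BRANCH)"

workflows:
  dtests:
    jobs:
      - j8_jvm_upgrade_dtests
```

Values that should be limited to certain teams would go into a context instead of the config file.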




h2. Image, Resource Class, and Executor Choice

A number of executors and resource sizes are defined, including default options. Passing these
along as part of a parameter allows for extra flexibility.
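
A minimal sketch of what passing the resource size as a parameter could look like, assuming a hypothetical executor named `java8` and a placeholder image:

```yaml
version: 2.1

executors:
  java8:
    parameters:
      size:
        type: enum
        enum: [medium, large, xlarge]
        default: medium
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    resource_class: << parameters.size >>

jobs:
  build:
    parameters:
      size:
        type: enum
        enum: [medium, large, xlarge]
        default: medium
    # The job forwards its own size parameter to the executor.
    executor:
      name: java8
      size: << parameters.size >>
    steps:
      - checkout
      - run: java -version

workflows:
  build_default:
    jobs:
      - build                 # uses the medium default
      - build:
          name: build_xlarge  # same job on a larger container
          size: xlarge
```

The same job definition then serves every tier, and only the workflow entry decides which size is used.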

_We recommend using our API to get insights. A_ [_blog post here_|https://circleci.com/blog/announcing-new-insights-endpoints-in-circleci-s-api-v2/]
_gives details on how it can be used and may give you some suggestions on where to focus your
efforts._

_Reference for resources available can be found_ _here._
h2. Parallelism

Several jobs already enable parallelism and split tests by timings, in line with
our best practice recommendations. Consider increasing parallelism as needed.
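
For reference, splitting by timings generally follows the pattern sketched below. The glob pattern, the output file, and the image are assumptions for this example; the existing Cassandra jobs may select their tests differently.

```yaml
version: 2.1

jobs:
  unit_tests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    parallelism: 4                # raise this to spread tests over more containers
    steps:
      - checkout
      - run:
          name: Select this container's share of the tests
          command: |
            # Each of the 4 containers receives a different slice of the file list.
            circleci tests glob "test/unit/**/*Test.java" \
              | circleci tests split --split-by=timings \
              > /tmp/tests_to_run.txt
            cat /tmp/tests_to_run.txt

workflows:
  tests:
    jobs:
      - unit_tests
```

Timing-based splitting relies on timing data from earlier runs, which comes from test results uploaded with store_test_results.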
h2. Caching Strategies

 

There are no [save_cache or restore_cache steps|https://github.com/annapamma/spring-petclinic#caching-dependencies]
in your configuration. We normally suggest using a three-layer approach to restoring cache
if your team decides to implement this feature. Here is an example from our documentation:





      - restore_cache:
          keys:
            - gem-cache-v1-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
            - gem-cache-v1-{{ arch }}-{{ .Branch }}
            - gem-cache-v1

 

This means that it attempts to load the cache in this order:
 # A cache that matches the architecture, branch, and specific checksum of the .lock file.
 # A cache that matches the architecture and branch, but not the specific checksum of the
.lock file.
 # A cache that simply matches the name defined, being the most generic option.

 

This allows branches to have their own caching options more efficiently and may improve performance
further. In your case, this would require a bit of reworking. For a simpler but still efficient
solution, I would recommend something like the following:

 

      - restore_cache:
          keys:
            - v3-npm-{{ checksum "yarn.lock" }}
            - v3-npm

 

This allows at least some caching to be present even if the yarn file, as a general example,
has changed.

 

_Adding an option to restore cache from a more “generic” source might improve performance.
Otherwise, everything is handled as per our best practices._
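
Translated to a Java project, a restore/save pair might look like the sketch below. The key prefix `deps-v1`, the use of `build.xml` as the checksum source, and the `~/.m2/repository` path are assumptions; the right file and directory depend on how the build resolves its dependencies.

```yaml
version: 2.1

jobs:
  build:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    steps:
      - checkout
      - restore_cache:
          keys:
            # Exact match first, then the most recent cache with this prefix.
            - deps-v1-{{ checksum "build.xml" }}
            - deps-v1-
      - run:
          name: Resolve dependencies and build
          command: echo "build command goes here"   # placeholder
      - save_cache:
          # Saved under the exact key, so an unchanged build.xml hits it next time.
          key: deps-v1-{{ checksum "build.xml" }}
          paths:
            - ~/.m2/repository

workflows:
  build:
    jobs:
      - build
```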

 
h2. Secrets Management

One way to limit access to running certain workflows is to create contexts that are only available
to certain teams. See https://circleci.com/docs/2.0/contexts/#restricting-a-context

 

If a user does not have access to the context used in the workflow, the workflow will not
run.
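
A minimal sketch of attaching a context to a job in a workflow, assuming a hypothetical context named `premium-resources` that has been restricted to a security group in the organization settings:

```yaml
version: 2.1

jobs:
  highres_unit_tests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    resource_class: xlarge
    steps:
      - checkout
      - run: echo "Running on premium resources"

workflows:
  highres_tests:
    jobs:
      - highres_unit_tests:
          # Only users allowed to use this context can run the job.
          context:
            - premium-resources
```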





































h2. Orbs

The Cassandra project lends itself nicely to custom Orbs, given the amount of scripting
within the config itself.

 

You could aim to have every workflow simply call an appropriate Orb; this is a very
effective way of making use of them. Right now your workflows follow essentially the
same steps, with only a few minor variations that can be resolved with parameters.

 

Suggestions for orb creation:
 * Determine which unit tests to run
 * Log environment information
 * Run unit tests

 

Please see if these work for your config:
 * [Shallow Clone Orb|https://circleci.com/developer/orbs/orb/guitarrapc/git-shallow-clone],
for consideration if needed
 * [Python Orb|https://circleci.com/developer/orbs/orb/circleci/python], or your own orb
if needed, to configure virtualenv and Python dependencies

 

_A list of publicly accessible orbs can be found_ [_here_|https://circleci.com/developer/orbs]_.
An introduction to the differences between private and public orbs, as well as instructions
on how to author an orb, can be found_ [_at this link_|https://circleci.com/docs/2.0/orb-intro/#private-orbs-vs-public-orbs]_._
h2. Reusable Config Elements

I highly suggest looking into Reusable Config Elements and Orbs based on the repetitive nature
of the Cassandra project config as it stands today.

More on using reusable configuration elements can be [found here|https://circleci.com/docs/2.0/reusing-config/?section=configuration#authoring-reusable-commands].
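
As an illustration, two of the orb suggestions above could start life as reusable commands in the config before being moved into an orb. The command names, the `target` parameter, and the assumption that `ant` is available on the image are all hypothetical.

```yaml
version: 2.1

commands:
  log_environment:
    description: Print basic environment information
    steps:
      - run:
          name: Log environment information
          command: |
            echo "JAVA_HOME=$JAVA_HOME"
            java -version
            df -h

  run_unit_tests:
    description: Run one ant test target
    parameters:
      target:
        type: string
        default: test
    steps:
      - run:
          name: Run unit tests (<< parameters.target >>)
          command: ant << parameters.target >>   # assumes ant is installed on the image

jobs:
  j8_unit_tests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    steps:
      - checkout
      - log_environment
      - run_unit_tests:
          target: testsome

workflows:
  tests:
    jobs:
      - j8_unit_tests
```

Each job then lists the shared commands instead of repeating the underlying shell scripts.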




h3. Insights 
h4. Internal DCE Dashboard - Alert prototyping







h2. Tracking Test Results via UI 


Please add the [store_test_results|https://circleci.com/docs/2.0/language-python/#upload-and-store-test-results]
step to your config.yml file.
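
A minimal sketch of the step in context; the `build/test/output` path is an assumption and should point at the directory where the build writes its JUnit-style XML reports:

```yaml
version: 2.1

jobs:
  unit_tests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    steps:
      - checkout
      - run:
          name: Run tests
          command: echo "test command goes here"   # placeholder
      # Uploads the XML reports so test results and timings are tracked per run.
      - store_test_results:
          path: build/test/output

workflows:
  tests:
    jobs:
      - unit_tests
```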


Please reference the Insights homepage on the CircleCI Cloud application once this is completed:

https://app.circleci.com/insights/github/datastax

Useful Links
 * Project Optimisations: [https://circleci.com/docs/2.0/optimizations/#section=projects] 
 * [Monitor and optimize your CI/CD pipeline with Insights from CircleCI|https://circleci.com/blog/monitor-and-optimize-your-ci-cd-pipeline-with-insights-from-circleci/]
 * [2020 State of DevOps Report: Presented by Puppet and CircleCI|https://circleci.com/resources/state-of-devops-report-2020/]

h1. Summary
h3. *2021 Dynamic Configuration for Cassandra*

Our recommendation is to move to setup workflows, enabled with the `setup` key. [Dynamic config|https://circleci.com/blog/building-cicd-pipelines-using-dynamic-config/]
lets you maintain multiple config.yml files in a single code repository and selectively identify
and execute the primary config.yml file for each pipeline. This feature offers a wide range of
capabilities for specifying and executing dynamic pipeline workloads. Please note: when using
dynamic configuration, a “continue” job from the [continuation|https://circleci.com/developer/orbs/orb/circleci/continuation]
[orb|https://circleci.com/docs/2.0/orb-intro/] must be called at the end of the setup workflow.
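
A minimal sketch of such a setup configuration, which would live at .circleci/config.yml. The `resource_tier` parameter and the orb version pin are assumptions for this example. The tier file names follow the existing config.yml.LOWRES naming, and the continued file may also need to declare the same pipeline parameter, so check the dynamic configuration documentation.

```yaml
version: 2.1

# Marks this file as a setup configuration.
setup: true

orbs:
  continuation: circleci/continuation@0.1.2   # version pin is an assumption

# Hypothetical parameter choosing which tier config to continue with.
parameters:
  resource_tier:
    type: enum
    enum: [LOWRES, MIDRES, HIGHRES]
    default: LOWRES

workflows:
  setup:
    jobs:
      # Hands the pipeline over to the selected config file.
      - continuation/continue:
          configuration_path: .circleci/config.yml.<< pipeline.parameters.resource_tier >>
```

Adding a context to the continue job would be one way to limit the higher tiers to authorized users.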
h3. *Automated Triggering of Builds*

See the V2 API.

Builds can also be triggered on a schedule through a [cron job|https://circleci.com/docs/2.0/workflows/].
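
A minimal sketch of a scheduled workflow, assuming a nightly run on a branch named `trunk`; the schedule, the branch filter, and the job are placeholders:

```yaml
version: 2.1

jobs:
  unit_tests:
    docker:
      - image: cimg/openjdk:8.0   # placeholder image
    steps:
      - checkout
      - run: echo "test command goes here"   # placeholder

workflows:
  nightly:
    triggers:
      - schedule:
          cron: "0 0 * * *"   # every day at 00:00 UTC
          filters:
            branches:
              only:
                - trunk
    jobs:
      - unit_tests
```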

 
h3. *Tracking Test Results*

Insights API

DataStax build times: 15-20 minute builds, plus one that takes an hour.

The goal was to move it from an hour to 15-20 minutes.
h2. Resources 

Orb: [CircleCI Developer Hub - circleci/continuation|https://circleci.com/developer/orbs/orb/circleci/continuation] 

Discuss Post: [Dynamic config via setup workflows now available to all users|https://discuss.circleci.com/t/dynamic-config-via-setup-workflows-now-available-to-all-users/39909] 

Documentation: [Pipeline Variables|https://circleci.com/docs/2.0/pipeline-variables/#conditional-workflows] 

 
h2. Q&A


