ignite-dev mailing list archives

From Nikolay Izhikov <nizhi...@apache.org>
Subject Re: [DISCUSSION] Ignite integration testing framework.
Date Tue, 16 Jun 2020 08:58:33 GMT
Hello, Maxim.

Thank you for such a detailed explanation.

Can we put the content of this discussion somewhere on the wiki?
So it doesn’t get lost.

I’ll divide my answer into several parts, from the requirements to the implementation.
So, if we agree on the requirements, we can proceed to discussing the implementation.

1. Requirements:

The main goal I want to achieve is *reproducibility* of the tests.
I’m sick and tired of the zillions of flaky, rarely failing, and almost never failing tests
in the Ignite codebase.
We should start with the simplest scenarios that will be as reliable as steel :)

I want to know for sure:
  - Does this PR make rebalance quicker or not?
  - Does this PR make PME quicker or not?

So, your description of the complex test scenario looks like the next step to me.

Anyway, it’s cool we already have one.

The second goal is to have a strict test lifecycle as we have in JUnit and similar frameworks.


> It covers production-like deployment and running scenarios over a single database instance.

Do you mean «single cluster» or «single host»?

2. Existing tests:

> A Combinator suite allows to run set of operations concurrently over given database instance.
> A Consumption suite allows to run a set production-like actions over given set of Ignite/GridGain
> versions and compare test metrics across versions
> A Yardstick suite
> A Stress suite that simulates hardware environment degradation
> Ultimate, DR and Compatibility suites that perform functional regression testing
> Regression

It’s great news that we already have so many choices for testing!
A mature test base is a big +1 for Tiden.

3. Comparison:

> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YAML config files, command line option for fine-grained test configuration,
> ability to select/modify tests behavior based on Ignite version.

1. Many YAML files can be hard to maintain.
2. In ducktape, you can set parameters via the «--parameters» option. Please take a look at
the doc [1].
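
To illustrate the idea of overriding test parameters from a single JSON string, here is a
minimal, self-contained sketch. It is not ducktape code: `run_with_parameters` and
`rebalance_test` are hypothetical names, and the only thing taken from ducktape is the concept
of passing a JSON object on the command line.

```python
import json


def run_with_parameters(test_func, defaults, parameters_json=None):
    """Call test_func with defaults overridden by a JSON object string.

    Hypothetical helper: it only mimics the idea of injecting parameters
    from the command line and is not part of ducktape.
    """
    params = dict(defaults)
    if parameters_json:
        overrides = json.loads(parameters_json)
        unknown = set(overrides) - set(defaults)
        if unknown:
            # Fail fast on typos instead of silently ignoring them.
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        params.update(overrides)
    return test_func(**params)


def rebalance_test(num_servers, num_clients):
    # Placeholder body: a real test would start nodes and measure rebalance.
    return {"servers": num_servers, "clients": num_clients}


result = run_with_parameters(
    rebalance_test,
    defaults={"num_servers": 2, "num_clients": 1},
    parameters_json='{"num_servers": 4}',
)
print(result)
```

The test code stays the same; only the JSON string changes between runs.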

> Criteria: Cluster control
> Tiden: additionally can address cluster as a whole and execute remote commands in parallel.

It seems we have already implemented this ability in the PoC.

> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion helpers.
> Ducktape: simple asserts.

Could you please be more specific?
What helpers do you have in mind?
Ducktape has asserts that wait for log file messages or for some process to finish.
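
A rough sketch of what such a "wait for a log message" assertion looks like, in plain Python.
As far as I know ducktape ships a `wait_until` utility with a similar shape, but the code
below is a stand-alone illustration; the log line and the `rebalance_finished` check are
made up for the example.

```python
import re
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll condition() until it returns True or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg or "condition not met in time")
        time.sleep(backoff_sec)


def log_contains(lines, pattern):
    """Return True if any log line matches the regular expression."""
    regex = re.compile(pattern)
    return any(regex.search(line) for line in lines)


# Simulated node log: the message shows up on the third poll.
log_lines = []
polls = {"count": 0}


def rebalance_finished():
    polls["count"] += 1
    if polls["count"] == 3:
        log_lines.append("[INFO] Completed rebalancing")  # hypothetical message
    return log_contains(log_lines, r"Completed rebalanc")


wait_until(rebalance_finished, timeout_sec=2, backoff_sec=0.01,
           err_msg="rebalance did not finish")
print("polls needed:", polls["count"])
```

In a real test the condition would read the log file from the remote node instead of a list.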

> Criteria: Test reporting
> Ducktape: limited to its own text/HTML format

Ducktape has:
1. Text reporter
2. Customizable HTML reporter
3. JSON reporter.

We can render the JSON with any template or tool.
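
As a small sketch of "any template": the snippet below renders a JSON report into a text
summary with the standard library only. The report layout (`results`, `test`, `status`,
`seconds`) is an assumption made for this example and is not the actual ducktape JSON schema.

```python
import json
from string import Template

# Hypothetical report; the field names are assumptions, not the ducktape schema.
REPORT_JSON = """
{"results": [
  {"test": "rebalance_test", "status": "PASS", "seconds": 41.5},
  {"test": "pme_test", "status": "FAIL", "seconds": 12.0}
]}
"""

ROW = Template("$status  $test  ${seconds}s")


def render(report_json):
    """Render one line per test plus a summary line."""
    report = json.loads(report_json)
    rows = [ROW.substitute(r) for r in report["results"]]
    failed = sum(1 for r in report["results"] if r["status"] != "PASS")
    rows.append(f"{len(rows)} tests, {failed} failed")
    return "\n".join(rows)


text = render(REPORT_JSON)
print(text)
```

The same JSON could just as well be fed to an xUnit converter or a dashboard.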

> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for test needs. However, that means,
> that test can’t be scaled without test code changes. Does not do any deploy, relies on external
> means, e.g. pre-packaged in docker image, as in PoC.

This is not true.

1. We can set explicit test parameters (e.g. the number of nodes) via parameters.
We can increase the client count or the cluster size without test code changes.

2. We have many choices for the test environment. These choices are tested and used in other
projects:
	* docker
	* vagrant
	* private cloud (ssh access)
	* ec2
Please take a look at the Kafka documentation [2].

> I can continue more on this, but it should be enough for now:

We need to go deeper! :)

[1] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html#options
[2] https://github.com/apache/kafka/tree/trunk/tests#ec2-quickstart

> On 9 June 2020, at 17:25, Max A. Shonichev <mshonich@yandex.ru> wrote:
> 
> Greetings, Nikolay,
> 
> First of all, thank you for your great effort in preparing a PoC of integration testing for the
> Ignite community.
> 
> It’s a shame Ignite does not have at least some such tests yet. However, GridGain, as
> a major contributor to Apache Ignite, has had a profound collection of in-house tools for
> integration and performance testing for years already, and while we have been slowly considering
> sharing our expertise with the community, your initiative makes us drive that process a bit
> faster. Thanks a lot!
> 
> I reviewed your PoC and want to share a little about what we do on our part, why and
> how. I hope it will help the community take the proper course.
> 
> First I’ll give a brief overview of what decisions we made and what we have in our
> private code base, next I’ll describe what we have already donated to the public and what
> we plan to publish next, then I’ll compare both approaches, highlighting deficiencies in order
> to spur public discussion on the matter.
> 
> It might seem strange to use Python to run Bash to run Java applications, because that
> introduces the IT industry’s ‘best of breed’, the Python dependency hell, to the Java application
> code base. The only stranger decision one can make is to use Maven to run Docker to run Bash
> to run Python to run Bash to run Java, but desperate times call for desperate measures, I guess.
> 
> Java-based solutions for integration testing exist, e.g. Testcontainers [1],
> Arquillian [2], etc., and they might go well with Ignite community CI pipelines by themselves.
> But we also wanted to run performance tests and benchmarks, like the dreaded PME benchmark,
> and this is solved by a totally different set of tools in the Java world, e.g. JMeter [3], OpenJMH
> [4], Gatling [5], etc.
> 
> Speaking specifically about benchmarking, the Apache Ignite community already has Yardstick
> [6], and there’s nothing wrong with writing a PME benchmark using Yardstick, but we also wanted
> to be able to run scenarios like this:
> - put an X load on an Ignite database;
> - perform a Y set of operations to check how Ignite copes with operations under load.
> 
> And yes, we also wanted applications under test to be deployed ‘like in production’,
> e.g. distributed over a set of hosts. This raises questions about provisioning and node affinity,
> which I’ll cover in detail later.
> 
> So we decided to put a little effort into building a simple tool to cover different integration
> and performance scenarios, and our QA lab’s first attempt was PoC-Tester [7], currently open
> source except for the reporting web UI. It’s a quite simple to use, 95% Java-based tool
> targeted to be run at the pre-release QA stage.
> 
> It covers production-like deployment and running scenarios over a single database instance.
> PoC-Tester scenarios consist of a sequence of tasks running sequentially or in parallel.
> After all tasks complete, or at any time during the test, the user can run a log collection task;
> logs are checked for exceptions, and a summary of found issues and task ops/latency statistics
> is generated at the end of the scenario. One of the main PoC-Tester features is its fire-and-forget
> approach to task management. That is, you can deploy a grid and leave it running for weeks,
> periodically firing some tasks onto it.
> 
> During the earliest stages of PoC-Tester development it became quite clear that Java application
> development is a tedious process, and the architecture decisions you make during development are
> slow and hard to change.
> For example, scenarios like this:
> - deploy two instances of GridGain with master-slave data replication configured;
> - put a load on the master;
> - perform checks on the slave,
> or like this:
> - preload 1 TB of data into an Apache Ignite of version X, using your favorite tool;
> - run a set of functional tests running Apache Ignite version Y over the preloaded data,
> do not fit well in the PoC-Tester workflow.
> 
> So, this is why we decided to use Python as a generic scripting language of choice.
> 
> Pros:
> - quicker prototyping and development cycles
> - easier to find a DevOps/QA engineer with Python skills than one with Java skills
> - used extensively all over the world for DevOps/CI pipelines and thus has a rich set of
> libraries for all possible integration use cases.
> 
> Cons:
> - Nightmare with dependencies. Better stick to specific language/libraries version.
> 
> Comparing alternatives for a Python-based testing framework, we considered the following
> requirements, somewhat similar to what you’ve mentioned for Confluent [8] previously:
> - should be able to run locally or distributed (bare metal or in the cloud)
> - should have built-in deployment facilities for applications under test
> - should separate test configuration and test code
> -- be able to easily reconfigure tests by simple configuration changes
> -- be able to easily scale test environment by simple configuration changes
> -- be able to perform regression testing by simply switching artifacts under test via
> configuration
> -- be able to run tests with different JDK versions by simple configuration changes
> - should have human-readable reports and/or reporting tools integration
> - should allow simple test progress monitoring; one does not want to run a 6-hour test
> to find out that the application actually crashed during the first hour.
> - should allow parallel execution of test actions
> - should have clean API for test writers
> -- clean API for distributed remote commands execution
> -- clean API for deployed applications start / stop and other operations
> -- clean API for performing checks on results
> - should be open source, or at least the source code should allow easy change or extension
> 
> Back at that time we found no better alternative than to write our own framework, and
> so came Tiden [9] as the GridGain framework of choice for functional integration and performance
> testing.
> 
> Pros:
> - solves all the requirements above
> Cons (for Ignite):
> - (currently) closed GridGain source
> 
> On top of Tiden we’ve built a set of test suites, some of which you might have heard of
> already.
> 
> A Combinator suite allows running a set of operations concurrently over a given database
> instance. Proven to find at least 30+ race conditions and NPE issues.
> 
> A Consumption suite allows running a set of production-like actions over a given set of
> Ignite/GridGain versions and comparing test metrics across versions, like heap/disk/CPU
> consumption and time to perform actions such as client PME, server PME, rebalancing, data
> replication, etc.
> 
> A Yardstick suite is a thin layer of Python glue code to run the Apache Ignite pre-release
> benchmarks set. Yardstick itself has mediocre deployment capabilities; Tiden solves this
> easily.
> 
> A Stress suite that simulates hardware environment degradation during testing.
> 
> Ultimate, DR and Compatibility suites that perform functional regression testing
> of GridGain Ultimate Edition features like snapshots, security, data replication, rolling
> upgrades, etc.
> 
> A Regression and some IEPs testing suites, like IEP-14, IEP-15, etc, etc, etc.
> 
> Most of the suites above use another in-house developed Java tool, PiClient, to
> perform actual loading and miscellaneous operations with Ignite under test. We use the py4j
> Python-Java gateway library to control PiClient instances from the tests.
> 
> When we considered CI, we put TeamCity out of scope, because distributed integration
> and performance tests tend to run for hours and TeamCity agents are a scarce and costly resource.
> So, bundled with Tiden there are jenkins-job-builder [10] based CI pipelines and Jenkins xUnit
> reporting. Also, a rich web UI tool, Ward, aggregates test run reports across versions and has
> built-in visualization support for the Combinator suite.
> 
> All of the above is currently closed source, but we plan to make it public for the community,
> and publishing Tiden core [9] is the first step on that way. You can review some examples
> of using Tiden for tests in my repository [11], for a start.
> 
> Now, let’s compare Ducktape PoC and Tiden.
> 
> Criteria: Language
> Tiden: Python, 3.7
> Ducktape: Python, proposes itself as Python 2.7, 3.6, 3.7 compatible, but actually can’t
> work with Python 3.7 due to broken Zmq dependency.
> Comment: Python 3.7 has a much better support for async-style code which might be crucial
> for distributed application testing.
> Score: Tiden: 1, Ducktape: 0
> 
> Criteria: Test writers API
> Supported integration test framework concepts are basically the same:
> - a test controller (test runner)
> - a cluster
> - a node
> - an application (a service in Ducktape terms)
> - a test
> Score: Tiden: 5, Ducktape: 5
> 
> Criteria: Tests selection and run
> Ducktape: suite-package-class-method level selection, internal scheduler allows to run
> tests in suite in parallel.
> Tiden: also suite-package-class-method level selection, additionally allows selecting
> subset of tests by attribute, parallel runs not built in, but allows merging test reports
> after different runs.
> Score: Tiden: 2, Ducktape: 2
> 
> Criteria: Test configuration
> Ducktape: single JSON string for all tests
> Tiden: any number of YAML config files, command line option for fine-grained test configuration,
> ability to select/modify tests behavior based on Ignite version.
> Score: Tiden: 3, Ducktape: 1
> 
> Criteria: Cluster control
> Ducktape: allows executing remote commands at node granularity
> Tiden: additionally can address cluster as a whole and execute remote commands in parallel.
> Score: Tiden: 2, Ducktape: 1
> 
> Criteria: Logs control
> Both frameworks have similar builtin support for remote logs collection and grepping.
> Tiden has built-in plugin that can zip, collect arbitrary log files from arbitrary locations
> at test/module/suite granularity and unzip if needed, also application API to search / wait
> for messages in logs. Ducktape allows each service declare its log files location (seemingly
> does not support logs rollback), and a single entrypoint to collect service logs.
> Score: Tiden: 1, Ducktape: 1
> 
> Criteria: Test assertions
> Tiden: simple asserts, also few customized assertion helpers.
> Ducktape: simple asserts.
> Score: Tiden: 2, Ducktape: 1
> 
> Criteria: Test reporting
> Ducktape: limited to its own text/html format
> Tiden: provides text report, YAML report for reporting tools integration, XML xUnit report
> for integration with Jenkins/TeamCity.
> Score: Tiden: 3, Ducktape: 1
> 
> Criteria: Provisioning and deployment
> Ducktape: can provision subset of hosts from cluster for test needs. However, that means,
> that test can’t be scaled without test code changes. Does not do any deploy, relies on external
> means, e.g. pre-packaged in docker image, as in PoC.
> Tiden: Given a set of hosts, Tiden uses all of them for the test. Provisioning should
> be done by external means. However, provides a conventional automated deployment routines.
> Score: Tiden: 1, Ducktape: 1
> 
> Criteria: Documentation and Extensibility
> Tiden: current API documentation is limited, should change as we go open source. Tiden
> is easily extensible via hooks and plugins, see example Maven plugin and Gatling application
> at [11].
> Ducktape: basic documentation at readthedocs.io. Codebase is rigid, framework core is
> tightly coupled and hard to change. The only possible extension mechanism is fork-and-rewrite.
> Score: Tiden: 2, Ducktape: 1
> 
> I can continue more on this, but it should be enough for now:
> Overall score: Tiden: 22, Ducktape: 14.
> 
> Time for discussion!
> 
> ---
> [1] - https://www.testcontainers.org/
> [2] - http://arquillian.org/guides/getting_started/
> [3] - https://jmeter.apache.org/index.html
> [4] - https://openjdk.java.net/projects/code-tools/jmh/
> [5] - https://gatling.io/docs/current/
> [6] - https://github.com/gridgain/yardstick
> [7] - https://github.com/gridgain/poc-tester
> [8] - https://cwiki.apache.org/confluence/display/KAFKA/System+Test+Improvements
> [9] - https://github.com/gridgain/tiden
> [10] - https://pypi.org/project/jenkins-job-builder/
> [11] - https://github.com/mshonichev/tiden_examples
> 
> On 25.05.2020 11:09, Nikolay Izhikov wrote:
>> Hello,
>> 
>> A branch with ducktape has been created - https://github.com/apache/ignite/tree/ignite-ducktape
>> 
>> Anyone willing to contribute to the PoC is welcome.
>> 
>> 
>>> On 21 May 2020, at 22:33, Nikolay Izhikov <nizhikov.dev@gmail.com> wrote:
>>> 
>>> Hello, Denis.
>>> 
>>> There is no rush with these improvements.
>>> We can wait for Maxim’s proposal and compare the two solutions :)
>>> 
>>>> On 21 May 2020, at 22:24, Denis Magda <dmagda@apache.org> wrote:
>>>> 
>>>> Hi Nikolay,
>>>> 
>>>> Thanks for kicking off this conversation and sharing your findings with the
>>>> results. That's the right initiative. I do agree that Ignite needs to have
>>>> an integration testing framework with capabilities listed by you.
>>>> 
>>>> As we discussed privately, I would only check if instead of
>>>> Confluent's Ducktape library, we can use an integration testing framework
>>>> developed by GridGain for testing of Ignite/GridGain clusters. That
>>>> framework has been battle-tested and might be more convenient for
>>>> Ignite-specific workloads. Let's wait for @Maksim Shonichev
>>>> <mshonichev@gridgain.com> who promised to join this thread once he finishes
>>>> preparing the usage examples of the framework. To my knowledge, Max has
>>>> already been working on that for several days.
>>>> 
>>>> -
>>>> Denis
>>>> 
>>>> 
>>>> On Thu, May 21, 2020 at 12:27 AM Nikolay Izhikov <nizhikov@apache.org>
>>>> wrote:
>>>> 
>>>>> Hello, Igniters.
>>>>> 
>>>>> I created a PoC [1] for the integration tests of Ignite.
>>>>> 
>>>>> Let me briefly explain the gap I want to cover:
>>>>> 
>>>>> 1. For now, we don’t have a solution for automated testing of Ignite on a
>>>>> «real cluster».
>>>>> By «real cluster» I mean a cluster «like in production»:
>>>>>       * client and server nodes deployed on different hosts.
>>>>>       * thin clients perform queries from some other hosts
>>>>>       * etc.
>>>>> 
>>>>> 2. We don’t have a solution for automated benchmarks of some internal
>>>>> Ignite processes:
>>>>>       * PME
>>>>>       * rebalance.
>>>>> This means we don’t know whether rebalance (or PME) is faster or slower
>>>>> in 2.7.0 than in 2.8.0 for the same cluster.
>>>>> 
>>>>> 3. We don’t have a solution for automated testing of Ignite integration in
>>>>> a real-world environment:
>>>>> Ignite-Spark integration can be taken as an example.
>>>>> I think some ML solutions also should be tested in real-world deployments.
>>>>> 
>>>>> Solution:
>>>>> 
>>>>> I propose to use the Ducktape library from Confluent (Apache 2.0 license).
>>>>> I tested it both on a real cluster (Yandex Cloud) and in a local
>>>>> environment (Docker), and it works just fine.
>>>>> 
>>>>> PoC contains following services:
>>>>> 
>>>>>       * Simple rebalance test:
>>>>>               Start 2 server nodes,
>>>>>               Create some data with Ignite client,
>>>>>               Start one more server node,
>>>>>               Wait for rebalance finish
>>>>>       * Simple Ignite-Spark integration test:
>>>>>               Start 1 Spark master, start 1 Spark worker,
>>>>>               Start 1 Ignite server node
>>>>>               Create some data with Ignite client,
>>>>>               Check data in application that queries it from Spark.
>>>>> 
>>>>> All tests are fully automated.
>>>>> Logs collection works just fine.
>>>>> You can see an example of the tests report - [4].
>>>>> 
>>>>> Pros:
>>>>> 
>>>>> * Ability to test local changes (no need to publish changes to some remote
>>>>> repository or similar).
>>>>> * Ability to parametrize test environment(run the same tests on different
>>>>> JDK, JVM params, config, etc.)
>>>>> * Isolation by default so system tests are as reliable as possible.
>>>>> * Utilities for pulling up and tearing down services easily in clusters in
>>>>> different environments (e.g. local, custom cluster, Vagrant, K8s, Mesos,
>>>>> Docker, cloud providers, etc.)
>>>>> * Easy to write unit tests for distributed systems
>>>>> * Adopted and successfully used by another distributed open source project,
>>>>> Apache Kafka.
>>>>> * Collect results (e.g. logs, console output)
>>>>> * Report results (e.g. expected conditions met, performance results, etc.)
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> [1] https://github.com/nizhikov/ignite/pull/15
>>>>> [2] https://github.com/confluentinc/ducktape
>>>>> [3] https://ducktape-docs.readthedocs.io/en/latest/run_tests.html
>>>>> [4] https://yadi.sk/d/JC8ciJZjrkdndg

