flink-issues mailing list archives

From "Flink Jira Bot (Jira)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-11463) Rework end-to-end tests in Java
Date Thu, 27 May 2021 23:04:03 GMT


Flink Jira Bot updated FLINK-11463:
    Labels: pull-request-available stale-assigned  (was: pull-request-available)

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help the community
manage its development. I see this issue is assigned but has not received an update in 14 days,
so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a comment updating
the community on your progress.  If this issue is waiting on feedback, please consider this
a reminder to the committer/reviewer. Flink is a very active project, and so we appreciate
your patience.
If you are no longer working on the issue, please unassign yourself so someone else may work
on it. If the "stale-assigned" label is not removed in 7 days, the issue will be automatically
unassigned.

> Rework end-to-end tests in Java
> -------------------------------
>                 Key: FLINK-11463
>                 URL: https://issues.apache.org/jira/browse/FLINK-11463
>             Project: Flink
>          Issue Type: New Feature
>          Components: Test Infrastructure
>            Reporter: Chesnay Schepler
>            Assignee: Zheng Hu
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> This is the (long-term) umbrella issue for reworking our end-to-end tests in Java on top
of a new set of utilities.
> Below are some areas where problems have been identified that I want to address with
a prototype soon. This prototype primarily aims to introduce certain patterns to be built
upon in the future.
> h2. Environments
> h4. Problem
> Our current tests work directly against flink-dist and set up local clusters with/without
HA. Similar issues apply to Kafka and Elasticsearch.
> This prevents us from re-using tests for other environments (Yarn, Docker) and distributed
setups.
> We also frequently have issues with cleaning up resources as it is the responsibility
of the test itself.
> h4. Proposal
> Introduce a common interface for a given resource type (i.e. Flink, Kafka) that tests
will work against.
> These resources should be implemented as jUnit external resources to allow reasonable
life-cycle management.
> Tests get access to an instance of this resource through a factory method.
> Each resource implementation has a dedicated factory that is loaded with a {{ServiceLoader}}.
Factories evaluate system-properties to determine whether the implementation should be loaded,
and then optionally configure the resource.
> Example:
> {code}
> public interface FlinkResource {
> 	... common methods ...
> 	/**
> 	 * Returns the configured FlinkResource implementation, or a {@link LocalStandaloneFlinkResource}
if none is configured.
> 	 *
> 	 * @return configured FlinkResource, or {@link LocalStandaloneFlinkResource} if none
is configured
> 	 */
> 	static FlinkResource get() {
> 		// load factories
> 		// evaluate system properties
> 		// return instance
> 	}
> }
> public interface FlinkResourceFactory {
> 	/**
> 	 * Returns a {@link FlinkResource} instance. If the instance could not be instantiated
(for example, because a
> 	 * mandatory parameter was missing), then an empty {@link Optional} should be returned.
> 	 *
> 	 * @return FlinkResource instance, or an empty Optional if the instance could not be
instantiated
> 	 */
> 	Optional<FlinkResource> create();
> }
> {code}
> As an example, running {{mvn verify -De2e.flink.mode=localStandalone}} could load a FlinkResource
that sets up a local standalone cluster, while for {{mvn verify -De2e.flink.mode=distributedStandalone
-De2e.flink.hosts=...}} it would connect to the given hosts and set up a distributed cluster.
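The lookup described above could be sketched as follows; the holder class name, the minimal interface bodies, and the fallback wiring are illustrative assumptions, while the discovery itself is the standard {{java.util.ServiceLoader}}:

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Minimal stand-ins for the interfaces from this issue, just enough
// to make the lookup below self-contained.
interface FlinkResource {}

interface FlinkResourceFactory {
    Optional<FlinkResource> create();
}

class LocalStandaloneFlinkResource implements FlinkResource {}

public final class FlinkResourceLoader {
    public static FlinkResource get() {
        // Factories register themselves via META-INF/services entries.
        for (FlinkResourceFactory factory : ServiceLoader.load(FlinkResourceFactory.class)) {
            // A factory checks system properties (e.g. -De2e.flink.mode) and
            // returns an empty Optional if it is not the configured one.
            Optional<FlinkResource> resource = factory.create();
            if (resource.isPresent()) {
                return resource.get();
            }
        }
        // No factory matched: fall back to a local standalone cluster.
        return new LocalStandaloneFlinkResource();
    }
}
```

With no factory registered, {{get()}} simply returns the {{LocalStandaloneFlinkResource}} default.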
> Tests are not _required_ to work against the common interface, and may be hard-wired
to run against specific implementations. Simply put, the resource implementations should also
be usable directly.
> h4. Future considerations
> The factory method may be extended to allow tests to specify a set of conditions that
must be fulfilled, for example HA to be enabled. If this requirement cannot be fulfilled the
test should be skipped.
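One possible shape for such a requirement-aware lookup; the names ({{Requirement}}, {{supportedRequirements}}) are assumptions for illustration, not part of this proposal:

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: a resource advertises the requirements it can
// fulfill, and the factory method only hands it out if all requested
// requirements are met.
public class RequirementDemo {
    enum Requirement { HIGH_AVAILABILITY, METRICS_ENABLED }

    interface FlinkResource {
        Set<Requirement> supportedRequirements();
    }

    // Returns the resource only if it fulfills every requested requirement;
    // an empty Optional signals that the test should be skipped.
    static Optional<FlinkResource> get(FlinkResource candidate, Set<Requirement> required) {
        return candidate.supportedRequirements().containsAll(required)
                ? Optional.of(candidate)
                : Optional.empty();
    }

    public static void main(String[] args) {
        FlinkResource local = () -> EnumSet.of(Requirement.METRICS_ENABLED);
        System.out.println(get(local, EnumSet.of(Requirement.METRICS_ENABLED)).isPresent());   // true
        System.out.println(get(local, EnumSet.of(Requirement.HIGH_AVAILABILITY)).isPresent()); // false
    }
}
```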
> h2. Split Management
> h4. Problem
> End-to-end tests are run in separate {{cron-<version>-e2e}} branches. To accommodate
the Travis time limits we run a total of 6 jobs each covering a subset of the tests.
> These so-called splits are currently managed in the respective branches, and not on the
master/release branches.
> This is a rather hidden detail that not everyone is aware of, nor is it easily discoverable.
This has resulted several times in newly added tests not actually being run. Furthermore,
if the arguments for tests are modified these changes have to be replicated to each branch.
> h4. Proposal
> Use jUnit Categories to assign each test explicitly to one of the Travis jobs.
> {code}
> @Category(TravisGroup1.class)
> public class MyTestRunningInTheFirstJob {
> 	...
> }
> {code}
> It's a bit on the nose but a rather simple solution.
> A given group of tests could be executed by running {{mvn verify -Dcategories="org.apache.flink.tests.util.TravisGroup1"}}.
> All tests can be executed by running {{mvn verify}} without specifying any category.
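Under the hood, JUnit's Categories runner filters on exactly this kind of annotation data. The self-contained sketch below re-creates the mechanism with a stand-in {{@Category}} annotation and plain reflection, so it runs without JUnit on the classpath:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

// Stand-in for JUnit's @Category plus the marker interfaces from the
// example above; the filtering logic mirrors what the Categories runner does.
public class CategoryFilterDemo {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Category { Class<?>[] value(); }

    interface TravisGroup1 {}
    interface TravisGroup2 {}

    @Category(TravisGroup1.class)
    static class FirstJobTest {}

    @Category(TravisGroup2.class)
    static class SecondJobTest {}

    // True if the test class is annotated with the requested category.
    static boolean inCategory(Class<?> testClass, Class<?> category) {
        Category c = testClass.getAnnotation(Category.class);
        return c != null && Arrays.asList(c.value()).contains(category);
    }

    public static void main(String[] args) {
        System.out.println(inCategory(FirstJobTest.class, TravisGroup1.class));  // true
        System.out.println(inCategory(SecondJobTest.class, TravisGroup1.class)); // false
    }
}
```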
> h4. Future considerations
> Tests may furthermore be categorized based on what they are testing (e.g. "Metrics",
"Checkpointing", "Kafka") to allow running a certain subset of tests quickly.
> h2. Caching of downloaded artifacts
> h4. Problem
> Several tests download archives for setting up systems, like Kafka or Elasticsearch.
We currently do not cache downloads in any way, resulting in less stable tests (as mirrors
aren't always available) and overall increased test duration (since the downloads at times
are quite slow). The duration issue becomes especially apparent when running tests in a loop
for debugging or release-testing purposes.
> Finally, it also puts unnecessary strain on the download mirrors.
> h4. Proposal
> Add a {{DownloadCache}} interface with a single {{Path getOrDownload(String url, Path
targetDir)}} method.
> Access to and loading of implementations are handled like resources (see above).
> The caching behavior is implementation-dependent.
> A reasonable implementation should allow files to be cached in a user-provided directory,
with an optional time-to-live for long-term setups.
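A minimal sketch of such an implementation, assuming the cache entry is named after the last URL path segment; the class name and naming scheme are illustrative, not part of the proposal (no time-to-live handling here):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical minimal implementation of the proposed DownloadCache:
// downloads each URL at most once into cacheDir, then copies the cached
// file into the requested target directory.
public class SimpleDownloadCache {
    private final Path cacheDir;

    public SimpleDownloadCache(Path cacheDir) throws IOException {
        this.cacheDir = Files.createDirectories(cacheDir);
    }

    // Returns the file in targetDir, downloading into the cache only
    // if no cached copy exists yet.
    public Path getOrDownload(String url, Path targetDir) throws IOException {
        Path cached = cacheDir.resolve(fileName(url));
        if (!Files.exists(cached)) {
            try (InputStream in = new URL(url).openStream()) {
                Files.copy(in, cached);
            }
        }
        Files.createDirectories(targetDir);
        return Files.copy(cached, targetDir.resolve(cached.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    // Derive the cache entry name from the last URL path segment.
    private static String fileName(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }
}
```

On a cache hit no network access happens at all, which is what makes looped debugging and release-testing runs cheap.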

This message was sent by Atlassian Jira
