cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14134) Migrate dtests to use pytest and python3
Date Fri, 05 Jan 2018 22:18:03 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314011#comment-16314011
] 

Ariel Weisberg commented on CASSANDRA-14134:
--------------------------------------------

Linking to the github diff seems broken right now. So I will have to specify files and line
numbers.

* One thing I noticed is that it seems like it's a bit easier for the harness to orphan a
running cluster? I know that with the nosetest setup I generally never had to worry about
leaving a C* cluster behind (which is pretty shocking), but when I killed a running test today
it left behind the cluster.
* cdc_test.py line 350 is part of a broken test you added a skip annotation to. It looks like
you added some whitelisting for log output, but that didn't fix the test. Should we remove
the whitelisting since it's not the right fix for the test?
* commitlog_test.py line 444, why do you have to use that idiom to update the list? List appending
is still in python 3? Is this to avoid the not defined check?
* Can you elaborate on why you had to switch to opening files in binary mode? Was it a bug
originally? Looks like those files are indeed binary. I am just wondering if it manifested
differently in python 3.
* dtest.py:176, comment is out of date? Fault handler is always on now?
* dtest.py has a few commented out bits of code that eventually probably need to be deleted
* materialized view tests line 2390 you switch from threads to processes and added a join?
What were you trying to address by switching from processes to threads, and adding the joins
and timeouts? Was it hanging?
* I see cases in asserts where constant integers were changed from say 7L to just 7. Is that
intentional? What is that about? (materialized view test, paging test)
* paxos_tests, you removed the utf-8 encoding statement at the top? Was that doing anything?
This happens in several files.
* read_repair_test is now inlining a method manually? test_read_repair? Do we have to have
that duplication? I know this Beobal's commit so I'll probably have to come back to it.
* incremental_repair_test:176, it's using the Latin-1 encoding?
* base_replace_address_test failed to properly handshake peer, is this just another case of
this error is generated and it's fine? Seems like you add this to tests that use JMX. Maybe
using JMX should automatically cause this to be added to ignored log lines?
* run_dtest.py still does the fork to a separate process. Does that accomplish anything? Can
we remove it?
* snapshot_test assert_directory_not_empty, I looked at this twice, it's just a little too
confusing and needs a comment as to what the parameters are and how it works.
* You regenerated the thrift bindings to get a python 3 supported version? You removed pycassa
but it also looks like we were hardly using it? Just want to make sure I understand what happened.
* tools/assertions.py assert_lists_equal_ignoring_order, this seems to sort results coming
from the DB. Shouldn't the DB already be providing a consistent sort order for results? I
thought that was well defined. I am wondering if we are papering over result ordering issues
in the DB.

I am going to look at read_repair_test now as it was a pretty significant change and I don't
know exactly what went down there.

> Migrate dtests to use pytest and python3
> ----------------------------------------
>
>                 Key: CASSANDRA-14134
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14134
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Testing
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>
> h4. Get the C* dtests running on the pytest framework.
> C* DTests currently run using the python test framework nosetest. This framework has
been largely abandoned with no releases since 2015 and a general strong consensus in the python
community that pytest is the future.
> h4. Why should we do this.
> Currently (and historically) dtests have always been difficult to run, flaky and unpredictable
in CI environments, and almost impossible to debug.
> On November 28th, 2017, I proposed on the dev@ list that we move the dtests from nosetests
to pytests. I got replies from Jon Haddad, Philip Thompson, and kurt greaves with really only
"+1" like replies to the proposal.
> Since then I've been working pretty much non stop to complete the large refactor of dtests
to pytests. As part of this effort (and due to the migration tools that exist require it)
I also ported the code to python3 (from the current python 2.7 based code-base).
> h4. High-level summary of key changes, improvements, and new features.
> * Migrate dtests from executing using the nosetest framework to pytest
> * Port the entire code base from Python 2.7 to Python 3.6
> * Update run_dtests.py to work with pytest
> * Add --dtest-print-tests-only option to run_dtests.py to get easily parsable list of
all available collected tests
> * Update README.md for executing the dtests with pytest
> * Add new debugging tips section to README.md to help with some basics of debugging python3
and pytest
> * Migrate all existing Enviornment Variable usage as a means to control dtest operation
modes to argparse command line options with documented help on each toggles intended usage
> * Migration of old unitTest and nose based test structure to modern pytest fixture approach
> * Automatic detection of physical system resources to automatically determine if @pytest.mark.resource_intensive
annotated tests should be collected and run on the system where they are being executed
> * new pytest fixture replacements for @since and @pytest.mark.upgrade_test annotations
> * Migration to python logging framework
> * Upgrade thrift bindings to latest version with full python3 compatibility
> * Remove deprecated cql and pycassa dependencies and migrate any remaining tests to fully
remove those dependencies
> * Fixed dozens of tests that would hang the pytest framework forever when run in CI enviornments
> * Ran code nearly 300 times in CircleCI during the migration and to find, identify, and
fix any tests capable of hanging CI
> * Upgrade Tests do not yet run in CI and still need additional migration work (although
all upgrade test classes compile successfully)
> I started with the *nose2pytest* [https://github.com/pytest-dev/nose2pytest] migration
tool. As this required python 3 language support I found myself down the 2to3 python migration
path. While painful to do this, the benefits of python3 over python2.7 are numerous and moving
to python3 for the additional debugging tools now available to use when fixing dtests makes
the effort worth it for that reason alone!
> After the automated tools did their thing I began what was a much longer and tedious
manual process than I ever could have expected due to the custom many ways we did things in
dtests (frequently to work around nosetest limitations of missing features that thankfully
are now all included with the pytest framework). I've done nearly 300 test runs of my migration
branch with circleci.
> The latest CircleCI runs can be found at:
> (dtests without vnodes) [https://circleci.com/gh/mkjellman/cassandra/277]
> (dtests with vnodes) [https://circleci.com/gh/mkjellman/cassandra/278]
> With vnodes, there are currently only 6 remaining dtest test failures.
> Without vnodes, there are 12 remaining dtest failures.
> It turns out that after the dtests were moved to ASF Jenkins from cassci, the jobs were
misconfigured and we actually haven't been running the dtests in the non-vnodes configuration.
The current most recent trunk dtest job to complete on ASF Jenkins (with vnodes) was [https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-dtest/387/].
That test run had 36 test failures.
> There are great performance improvements so far with pytests. With a parallism factor
of 90x in CircleCI the dtests-no-vnodes job completed all tests in 16 minutes and 24 seconds
and the dtests-with-vnodes job completed all tests in 14 minutes and 22 seconds.. (Compare
that to the 8+ hours ASF Jenkins currently takes!).
> A dtest on pytest compatible CircleCI configuration is available at https://github.com/mkjellman/cassandra/blob/trunk_circle/.circleci/config.yml
> A very small patch is required to ccm for 2 places that were not yet python3 compatible.
(I didn't do a full audit of ccm, but these appear to be the only two issues I've found even
with a ton of testing when using ccmlib with dtests and running with python 3).
> {code}
> --- /home/cassandra/env/src/ccm/ccmlib/node.py  2017-12-13 03:11:13.000000000 +0000-8").splitlines()))
> +++ node.py     2017-12-19 09:51:49.083186951 +0000
> @@ -57,10 +57,16 @@
>          message = "Subprocess {} exited with non-zero status; exit status: {}".format(command,
exit_status)
>          if stdout:
>              message += "; \nstdout: "
> -            message += stdout
> +            if isinstance(stdout, str):
> +                message += stdout
> +            else:
> +                message += stdout.decode("utf-8")
>          if stderr:
>              message += "; \nstderr: "
> -            message += stderr
> +            if isinstance(stderr, str):
> +                message += stderr
> +            else:
> +                message += stderr.decode("utf-8")
>  
>          Exception.__init__(self, message)
>  
> @@ -2009,7 +2015,10 @@
>  
>          out, _, _ = handle_external_tool_process(p, ["sstableutil", '--type', 'final',
ks, table])
>  
> -        return sorted(filter(lambda s: s.endswith('-Data.db'), out.splitlines()))
> +        if isinstance(out, str):
> +            return sorted(filter(lambda s: s.endswith('-Data.db'), out.splitlines()))
> +        else:
> +            return sorted(filter(lambda s: s.endswith('-Data.db'), out.decode("utf-8").splitlines()))
>  
>  def _get_load_from_info_output(info):
> {code}
> For the actual dtest work itself: I've squashed all my work down into a single commit
available at [https://github.com/mkjellman/cassandra-dtest/tree/dtests_on_pytest_v2]
> And the actual commit available at [https://github.com/mkjellman/cassandra-dtest/commit/11201c149d1198c0d6d1a479e08bfb8695f14c9e]
> I tried to put a lot of effort into making the end user experience better for both bootstrapping
and running dtests for the first time, but also day to day. I hope people like the approach!
In addition to moving to argparse with commented command line arguments to control execution,
I also updated the README.md to reflect the new pytest world and also added a new section
with a few tips I learned while I was debugging hundreds of tests during this migration process
that hopefully are helpful to the rest of the community going forward!
> I'm sure the first commit above isn't 100% perfect (although I really have put a lot
of effort into it) but my one request is to hopefully get consensus with this first revision
of changes committed to reduce any chance of further drift with the codebase as changes are
committed into master.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message