drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: [DISCUSS] Making the drill codebase easier to unit test
Date Wed, 17 Jun 2015 20:53:34 GMT
I don't know how much work this involves (it seems like a lot!), but it would
be really useful. Like you said, with the current model, coming up with good
unit tests can be really tricky, especially when testing edge cases. The
worst part is that any change to how queries are planned or, for example, to
the size of the batches can make some tests useless.

On Tue, Jun 16, 2015 at 12:38 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:

> Hello Drill devs,
> I would like to propose a proactive effort to make the Drill codebase
> easier to unit test.
> Many JIRAs have been created for bugs that should have been prevented by
> better unit testing, and we are still fixing these kinds of bugs today as
> they crop up. I have a few ideas, and I plan on creating JIRAs for specific
> refactoring and test infrastructure improvements. Before I do, I would like
> to collect thoughts from everyone on what can get us the most benefit for
> our work.
> As a short overview of the situation today, most of the tests in Drill take
> the form of running a SQL query on a local drillbit and verifying the
> results. Plenty of times this has been described as more of integration
> testing than unit testing, and it has caused several common testing pains
> and gaps.
> 1. batch boundaries - as we cannot control where batches are cut off during
> the query, complete queries often make it hard to test different scenarios
> processing an incoming stream of data with given properties.
>          - examples of issues: inconsistent behavior between operators;
>            some operators have failed to handle empty batches, or a batch
>            full of nulls, until we wrote a test that happened to have the
>            right input file and plan to produce these scenarios
> 2. Valid planning changes can end up making tests previously designed to
> test execution fail in new ways as the data will now flow differently
> through the operators
> 3. SQL queries as test specifications make it hard to test "everything",
> all types, all possible data properties/structures, all possible switches
> flipped in the planner or configuration for an operator
> I would like to start the discussion with a proposal to fix some of these
> problems. We need a way to run an operator easily in isolation. Possible
> steps to achieve this include a new operator that will produce data in
> explicitly provided batches and that can be configured from a test. This can
> serve as a universal input to unit test operators. We would also need some
> way to consume and verify the output of the operators. This could share
> code with the current query execution, or possibly side step it to avoid
> having to mock or instantiate the whole query context.
> This proposal itself is testing a relatively large part of the system as a
> whole "unit". I would be interested to hear opinions on the utility vs
> extra effort of trying to refactor more classes so that they can be created
> in tests and have their individual methods tested. This is already being
> done for some classes like the value vectors, but it is far from
> exhaustive. I don't expect us to start rigidly enforcing this level of
> testing granularity everywhere, but there are components of the system that
> really need to be resilient and be guaranteed to stay that way as the
> project evolves.
> Please chime in with your thoughts.
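
To make the quoted proposal concrete, here is a minimal sketch of a source
that replays explicitly provided batches and a harness that drains one
operator in isolation. Every name in it (BatchOperator, FixedBatchSource,
DropNulls, OperatorHarness) is hypothetical and made up for illustration; none
of them are Drill classes, and plain lists of integers stand in for value
vectors. It only shows the shape such a test could take, assuming operators
can be wired together without a full query context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an operator interface; not a Drill API.
interface BatchOperator {
    // Returns the next batch, or null once the stream is exhausted.
    List<Integer> next();
}

// "Universal input": replays exactly the batches the test supplies,
// so the test decides where every batch boundary falls.
final class FixedBatchSource implements BatchOperator {
    private final Iterator<List<Integer>> batches;

    FixedBatchSource(List<List<Integer>> batches) {
        this.batches = batches.iterator();
    }

    @Override
    public List<Integer> next() {
        return batches.hasNext() ? batches.next() : null;
    }
}

// Toy operator under test: drops null values and emits one output
// batch per input batch, even when that batch ends up empty.
final class DropNulls implements BatchOperator {
    private final BatchOperator input;

    DropNulls(BatchOperator input) {
        this.input = input;
    }

    @Override
    public List<Integer> next() {
        List<Integer> in = input.next();
        if (in == null) {
            return null;
        }
        List<Integer> out = new ArrayList<>();
        for (Integer value : in) {
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }
}

final class OperatorHarness {
    // Drains an operator and collects every batch, preserving boundaries.
    static List<List<Integer>> drain(BatchOperator op) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> batch = op.next(); batch != null; batch = op.next()) {
            result.add(batch);
        }
        return result;
    }

    // Example run covering the edge cases named in the thread:
    // a mixed batch, an empty batch, and a batch full of nulls.
    static List<List<Integer>> demo() {
        List<List<Integer>> input = new ArrayList<>();
        input.add(Arrays.asList(1, null, 2));
        input.add(new ArrayList<Integer>());
        input.add(Arrays.<Integer>asList(null, null));
        return drain(new DropNulls(new FixedBatchSource(input)));
    }
}
```

Because the test builds the input batches itself, the empty batch and the
all-null batch are produced deterministically instead of depending on an input
file and plan that happen to produce them.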


Abdelhakim Deneche

Software Engineer

