Thesis
If you have a nail you need to hammer
in, you need a hammer. If you have a screw you want to screw in, you need a
screwdriver. A hammer does a poor job of screwing in screws. A screwdriver does
a poor job of hammering in nails. Unit tests are a tool. I believe that unit tests, like every single
tool in existence, are not useful in all situations.
I want to propose a few ideas. 1) Tests
range in usefulness. One unit test may be better than another. A unit test
could even have such low quality as to be useless. 2) Certain code lends itself
to unit testing more than other code. Some code is easily tested and some code
is very difficult to test. 3) There is a relationship between how difficult the
code is to test and the quality of the unit test you generate for this code.
When the code is harder to test the resulting unit tests are of lower quality.
If you accept these ideas then it follows
that to maximize the value of your unit tests you should focus on writing good
tests on code that is easily unit tested. Even if you do not believe the third
idea you still should accept that to maximize the effectiveness of your efforts
you should focus on high quality tests and/or easily constructed tests.
So we need to understand the value of unit
testing and how to create effective tests. We should also understand what code
lends itself to testing. We want to get the most value out of the tests we
write and not waste effort on activities that will not produce value in the
long run.
Tiny Bit of History
Okay, first a tiny bit of history. Unit
testing started coming to the fore in the late 90's as part of the set of Agile
methodologies. Unit testing was a focus of Agile due to the focus on the code
being able to tolerate rapid change. Agile and unit testing are now considered
standard parts of engineering. Related to unit testing is Test Driven
Development, the practice of writing unit tests alongside and slightly before
you write code.
Benefits
Effects on Coding
One of the key benefits of unit testing is
forcing engineers to write code that is unit testable. If you are going to
write unit tests you want to make sure the code you are writing won't cause a
problem when you write the tests. So let me take that and ask, what makes my
code easy or hard to unit test?
Pure Functions
Let's start with the notion of a 'pure
function'. What programmers call a 'pure function' is what mathematicians call
a 'function'. Programmers took the word function and abused the hell out of it
till it meant something different. (We didn't do it on purpose. It just sort of
happened.) So a pure function is a method that always returns the same
result for a given input and has no side effects. Pure functions are much easier
to reason about because they have consistent behavior. They are also more robust,
since they are affected only by their inputs and are not prey to uncertain
external state. Functional programming tends to stress keeping functions pure. And it
turns out pure functions are very easy to unit test. You have a set of inputs
and you know the exact outputs they will generate.
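As a quick sketch (the function names here are my own, invented for illustration), compare an impure function with a pure one in JavaScript:

```javascript
// Impure: depends on hidden external state, so the same call
// can return different results at different times.
let taxRate = 0.25;
function addTaxImpure(price) {
  return price + price * taxRate;
}

// Pure: the result depends only on the arguments, so for a given
// input the output is always the same.
function addTax(price, rate) {
  return price + price * rate;
}

// A test for the pure version needs no setup of external state.
console.log(addTax(100, 0.25)); // 125
```

The test for `addTax` is a single assertion, while a test for `addTaxImpure` has to arrange the surrounding state first.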
Note that the idea of 'pure functions' is
somewhat against the notion of object oriented programming because object
oriented programming focuses on passing a hidden 'this' parameter to every
function. This does not mean that object-oriented languages and pure functions cannot
co-exist peacefully.
We know that pure functions are easy to
reason about, more stable, and very easy to test. So if we want to write easily
unit testable code we should strive to contain as much of our code as possible
in pure functions. Any time you write code you should think about whether you
can isolate a piece of this code into a pure function. Because we are using a
functional (note that JavaScript is functional, but not 'pure functional')
programming language, we have flexibility with functions that will help us do
this. And if we write code as pure functions we have code that is clear
and easy to test. So unit testing helps to create an incentive to write pure
functions which makes our code better.
Cyclomatic Complexity / KISS
Principle / Single Responsibility Principle
When you have a method that does an
enormous amount of stuff it is very hard to test. Imagine a piece of code with
a lot of if statements that does very different things depending on various
conditions. This code becomes hard to test. The tests have to set up precise
conditions to trigger different sections of code. The tests become more and
more intricate and harder to understand. The simple way to solve this problem
is to break the code into smaller pieces. This makes it easier to write
tests for the code in question.
It is hard to deal with complexity, and unit tests make complex code even
more painful to work with, so they provide an incentive to break down complex
methods in order to make life easier. This leads to improved, simpler code.
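A small JavaScript sketch of this (the order/shipping scenario is invented for illustration): the branching logic is pulled out of a larger method so each piece can be tested directly.

```javascript
// Hard to test as one lump: branching and formatting mixed together.
function describeOrder(order) {
  let shipping;
  if (order.total > 100) { shipping = 0; }
  else if (order.express) { shipping = 20; }
  else { shipping = 5; }
  return `Total: ${order.total + shipping}`;
}

// Easier to test: each small piece can be exercised on its own.
function shippingCost(order) {
  if (order.total > 100) return 0;
  if (order.express) return 20;
  return 5;
}
function formatTotal(total) {
  return `Total: ${total}`;
}
function describeOrderSimple(order) {
  return formatTotal(order.total + shippingCost(order));
}
```

Tests for `shippingCost` can hit each branch with a tiny input instead of setting up elaborate conditions for the combined method.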
External Dependencies /
Coupling (Not the British sit-com)
External dependencies make your code harder
to test. To unit test a method with external dependencies you need to generate
mocks or maybe some interception scheme. These schemes can often be complex and
hard to work with making the test more complex than the code it is testing. So
to make unit testing easier you try and minimize or isolate these dependencies.
Note that external dependencies are what is
sometimes referred to as 'coupling'. Coupling is a measurement of how dependent
an object is on other objects. Unit tests give us the incentive to minimize and
isolate external dependencies and reduce coupling.
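A minimal sketch of reducing coupling by passing the dependency in (the names here are hypothetical, not from any particular codebase):

```javascript
// Loosely coupled: the data-fetching dependency is passed in as a
// parameter rather than reached for directly, so a test can hand in
// a trivial fake instead of building a mocking or interception scheme.
function greetUser(fetchUser, id) {
  const user = fetchUser(id);
  return `Hello, ${user.name}!`;
}

// In a test, the "mock" is just a plain function.
const fakeFetch = (id) => ({ id, name: 'Ada' });
console.log(greetUser(fakeFetch, 1)); // "Hello, Ada!"
```

If `greetUser` instead called a concrete database or HTTP client internally, the test would need a framework to intercept that call, often making the test more complex than the code it tests.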
Maximizing this Principle
Unit tests incentivize the writing of pure
functions, the reduction of cyclomatic complexity, and the reduction of
coupling. This means the unit tests incentivize us to write good code. But this
brings up the issue of when we write those tests. If unit tests are to help
drive us to create better code we need to change our code while we write tests.
If we generate the code and then view it as complete before we write tests we
lose most of the benefit of unit testing pushing us to code better. Test
driven development (TDD) says that you should write the tests as you code and
that coding and testing should be a tightly interwoven process.
If we do not change code in response to our
creating unit tests we will get into situations where our unit tests have to
employ special strategies that may be complex or not robust. Using these
strategies to get around the way tests force better coding circumvents the
positive effect of unit testing on code quality. It is a far better strategy to
write the unit tests closer to the code so the code can be adapted to the needs
of the tests.
Contracts
Unit tests also provide a way of defining a contract. You make a
statement about what a piece of code does and you have a guarantee that this is
true. This is very useful in a few scenarios, but not necessarily all
scenarios. Application Programming Interfaces (APIs) are sets of methods to
interact with a component. They essentially act as a contract for a piece of
code. You want them to have very
specific behavior and you want to be sure that they always keep this behavior.
Unit tests are a good way to be certain that an API continues to behave as
promised.
APIs are usually thought of as ways to
interact with third-party components, but it is possible to have a large
software product with boundaries between teams defined by an API. This
makes APIs even more useful because they can act as a means of communication.
One team can produce tests to highlight the behavior they require while the
other team can then be sure to meet those tests. In the past I have crafted
failing unit tests for another team's component to show them how their
component was flawed. This allowed them to fix the issue and then directly run
my test to confirm the problem was fixed. They could then add this test to
their suite to make sure this contract was fulfilled going forward.
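As an illustration of a contract test in JavaScript (everything here is invented: in real use `slugify` would be the other team's component, imported rather than defined inline):

```javascript
// Stand-in for another team's component. The contract: lowercase
// the title, trim it, and turn runs of whitespace into dashes.
function slugify(title) {
  return title.toLowerCase().trim().replace(/\s+/g, '-');
}

// The contract test pins down the promised behavior so the owning
// team can run it directly and keep it in their suite.
function testSlugifyContract() {
  const checks = [
    ['Hello World', 'hello-world'],
    ['  Padded  Title ', 'padded-title'],
  ];
  for (const [input, expected] of checks) {
    if (slugify(input) !== expected) {
      throw new Error(`contract broken for "${input}"`);
    }
  }
  return 'contract holds';
}

console.log(testSlugifyContract()); // "contract holds"
```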
Safe Refactoring
Code can often be difficult to understand.
Reading code to understand it is a different skill than writing code to
accomplish a task. Code also is often modified to produce new behavior and this
modification can make code harder to understand. Refactoring refers to
rewriting code to make it clearer. Refactoring code has a lot of value because
easy to understand code is more maintainable. However, changing code introduces
risk that the behavior changes. This could introduce a defect. Some
organizations take a "don't touch it" approach to code.
Unit tests allow us to refactor code
without concern. Note that this really only applies when the full output of the
method under test is tested. With unit tests providing a safety net you can
refactor easily. One caveat is that while these unit tests help with
refactoring within a method, they do not help when the refactoring crosses
method boundaries and changes which methods exist and how they interoperate. So they
help with small refactorings, but can actually make big refactorings harder.
Tests to help with refactoring also can be
a little different than other tests. Tests for refactoring often want to ensure
output is consistent. I will talk a little more about this later on. It can
often make sense to write unit tests for refactoring after the code is complete.
You can create tests to ensure you do not introduce any changes. Again, the challenge
becomes making sure your unit tests capture the behavior. If your code is hard
to unit test it becomes less likely that your tests will be able to capture
behavior and more likely that refactoring will introduce problems.
Test Driven Development
Test Driven Development is the process of
writing unit tests slightly before writing production code. In this process no
production code is written unless it is to make a test pass. One of the goals
is to have extensive unit testing. Many people consider this the gold standard
for programming. If you have tried Test Driven Development you have probably
shared the experience that quality is much higher and that TDD helps you
find errors in cases you normally wouldn't have considered much.
Despite my high opinion of TDD I actually rarely use
it, because many problems do not lend themselves to TDD. For example, say I want to write a piece of code that produces
a specific bit of markup. My test would be to write the markup first. I am not
convinced this is a useful test. Say that I have code that gathers data from
two sources. I immediately have to write some sort of mock in my test saying
something like 'did I call method X'. Then I would write the call to this
method. Again, in the right circumstances TDD seems very useful, but in other
circumstances it seems to be a hindrance. When you have a problem you can ask
yourself a few questions: Can I write tests before I write the code? What is
easier, writing tests or the code? Do the tests require the code to essentially
be written first because there are dependencies?
What makes a good test?
If we want to maximize the value of the
effort we put in to testing we need to figure out how to write a valuable
test. So we need to understand what a
valuable test is. I will try and offer some rough advice on how to identify a
useful test and I will introduce a categorization as to whether a test checks
correctness, consistency, or identicalness.
When does a test fail?
When creating unit tests I believe a very
important question to ask is, "when will this test fail?" There are a
couple possible answers to this: 1) a unit test will fail when the code under
test no longer behaves correctly (fails correctness), 2) a unit test will fail
when it produces a different result than expected (fails consistency), 3) a
unit test will fail when the code under test changes (fails identicalness), 4)
the unit test will fail based on changes in external dependencies. Another
answer, the TDD answer, is "right before the code satisfying the test is
written."
Let me clarify the difference between 1 and
2, correctness and consistency. A method like "addition" produces a
correct answer. For a method that produces a chunk of HTML it is harder to
quantify the output as correct or not. For example, perhaps you want to add a
new class to the output HTML. This output is still "correct", but is
no longer the chunk that used to be produced. So this output is
"correct" while not being "consistent". The line between
"correct" and "consistent" can be blurry. The line between
"consistency" and "identicalness" can also be fuzzy.
I believe that tests that fail in condition
1, correctness, are very useful. Tests that fail in condition 2, consistency,
are somewhat useful. They are useful for refactoring, but they can fail on
perfectly valid code and will force unit test updates.
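To make the distinction concrete, here is a minimal JavaScript sketch (the functions are invented for illustration) of a test that fails on correctness versus one that fails on consistency:

```javascript
// A function with a clearly "correct" answer...
function add(a, b) {
  return a + b;
}

// ...and one whose output is merely "consistent": adding a CSS class
// later would change the string without making it wrong.
function renderBadge(label) {
  return `<span class="badge">${label}</span>`;
}

// Correctness test: fails only if the behavior is actually wrong.
if (add(2, 3) !== 5) throw new Error('add is incorrect');

// Consistency test: fails on any output change, valid or not.
if (renderBadge('new') !== '<span class="badge">new</span>') {
  throw new Error('renderBadge output changed');
}
```

The first test only breaks on a genuine defect; the second breaks on any change to the markup, including intentional, perfectly valid ones.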
Tests that fail in condition 3,
identicalness, are very suspect in my mind. I am not sure what those tests are
really testing. They detect code change. I think we can largely assume that
unchanged code will continue to perform in the same way as it did before. A
test that just fails when the code is changed seems to detect something you
know is happening so they don't seem very useful. You can make the argument that making it hard
and costly to change code is valuable because it prevents you from introducing
potentially breaking changes. I don't think making it hard to change code has
any value. The value of unit tests should be that they make it easy to change
code. If you want to make it harder to change code place your keyboard on the
ground and type with your toes. It will force you to think about every line you
type.
Tests that fail in condition 4 are bad and
need to be refactored. Tests that fail when nothing in the code under test
changes are false positives. False positives create added maintenance for tests
and unnecessary overhead. Unit tests need to be extremely isolated. We use
dependency injection to allow for unit tests to be completely isolated from
other parts of the system.
Unfortunately, we have tests that are sensitive to configuration changes
and fail when we change unrelated things. I think that when we have a unit test
failure we should either change the code under test or change the unit test.
Changing other parts of the code does not fix the situation; a failing test
should prompt some change to address the failure, whether that is refactoring
the code or refactoring the test.
When does a test fail? Part
two!
I talked about what causes tests to break,
but when do tests really fail? They fail when you go back and modify the code.
The situations where you go back and modify code can be roughly broken down
into a few categories. 1) A defect is detected that was not detected by unit
tests. 2) New features are being added or feature behavior is being changed. 3)
Minor refactoring is being done. By minor I generally mean refactoring within
method boundaries. 4) Major refactoring. By major I mean refactoring that
structurally changes the code and shifts responsibility between methods or
objects, either old ones or newly created ones.
When fixing defects or adding features,
tests focusing on correctness will give you good information and detect whether
a defect fix introduces a new problem. Tests focused on consistency will
probably fail because behavior is changing. This information will probably not
be useful since a behavior change is intended. The change in output may be a
mistake made while coding, but it is more likely that the test is failing
because output intentionally changed. Tests for identicalness will fail because
you have altered code. This is not useful feedback from the test.
For minor refactoring both correctness and
consistency tests will give good information. Tests for identicalness will give
bad information because your intent is for the code to change.
For major refactoring, almost all unit
tests will fail and need to be updated. This is because the unit of test will
change since methods will change and responsibility will be redistributed
between the methods.
From this we get the sense that tests that
focus on 'correctness' are very useful while those that focus on 'consistency'
may or may not be useful. Tests that test identicalness are probably not very useful.
Different Types of Tests
Here are a couple test patterns I have seen
and wanted to mention.
Mirrors
Testing is often broken into black box and
white box. In black box testing you write the tests without knowledge of the
implementation (or pretending you don't have that knowledge). In white box
testing you test based on what is in the code. I think black box tests are the
best tests, but white box tests are often easier to write.
One kind of test is what I call a mirror
test. The test is essentially a mirror of the code. I once worked on code
developed by some contractors. These contractors had sold the product as fully
unit tested <sarcasm> because surely fully unit tested code is going to
be high quality </sarcasm>. Anyway, I modified some SQL code in a stored
procedure and it triggered a unit test failure. This was obviously a
"consistency" based test since the output was correct. I looked at
the unit test. The unit test was a direct copy of the SQL in the stored
procedure. The output of the unit test running the SQL was simply compared to
the output of the stored procedure running the cut and paste SQL. So the test
was built to fail if the functionality of the stored procedure changed. To fix
the test you simply would paste your new SQL code into the test. This test
seemed to not be very useful. You could make the argument that this test is
useful if you wanted to refactor the stored procedure, but I think this was
still a poor way to create such a test.
Another similar pattern I have seen is the
mocking of all calls. Testing frameworks have been built to allow you to
intercept and mock external calls. This is useful because you want to isolate
code, but this tool can be used to essentially copy all the calls and make a
test saying, this method works if it makes all the calls that it happens to
make. This tends to be a test of identicalness. Say for example, you discover a
new method that works better for your purposes. The test then fails because you
didn’t make the exact same calls. This kind of test is like a mirror held up to
all the calls of the method.
Checks for Consistent Output
One of the problems with generating tests
is that a lot of code does not have 'correct' output. It is also fairly data intensive.
For example you may have a piece of code that generates a large section of
markup and does a small manipulation on a large data set producing another
large data set. Because of this unit tests tend to be focused on 'consistency'.
One of the patterns for generating such tests is to write the code and then use
the output of the code as the test. This kind of test can be useful for method
based refactorings, but usually for any kind of defect fix or feature change it
isn't useful.
While it is generally better to make the
unit testing process as close to the coding process as possible, with certain
forms of refactoring it may make sense to generate tests like this right before
refactoring to ensure that no behavioral changes are introduced in the
refactoring process.
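A minimal sketch of this pattern in JavaScript (the names are my own; in practice the snapshot string would be captured by running the code once before refactoring):

```javascript
// The code under test: produces a chunk of markup with no single
// "correct" output.
function renderList(items) {
  return '<ul>' + items.map(i => `<li>${i}</li>`).join('') + '</ul>';
}

// Before refactoring, capture the current output as the expectation.
const snapshot = '<ul><li>a</li><li>b</li></ul>';

// A consistency check: if refactoring changes the output in any way,
// this fails, which is exactly what you want during a pure refactor.
function checkConsistency() {
  return renderList(['a', 'b']) === snapshot;
}

console.log(checkConsistency()); // true
```

Once the refactoring is done, a test like this loses much of its value, since the next intentional behavior change will break it.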
Trivial
At a previous job I saw tests on
simple properties. These properties almost always had the form of simple
getters and setters on a backing field. Since there were a lot of properties
there was a lot of simple code like this. (This was in C# before there were
auto-properties.) These tests had a form like "Foo = 1;
Assert.AreEqual(Foo,1)". This was done in pursuit of code coverage, but
these tests were so trivial as to be meaningless. I don't think generating
large amounts of essentially meaningless code is good. Making your codebase
larger has a maintenance cost because maintainers have to sort through the
code. Tests should be meaningful and not mindless boilerplate.
Another set of trivial tests I have seen is
the testing of code that produced markup. The tests tested whether the code
produced a non-empty string. These tests were most likely written to generate
coverage numbers, but they were largely useless. Of course, the nice thing
was that they weren't likely to fail.
Complex Tests
This really isn't a category of test, but
rather an attribute of tests. We should strive to make simple unit tests. When
the complexity shifts into tests it means that the likelihood the problem is in
the test as opposed to the code under test increases.
What kind of code is good to
test?
Above I talked about how writing tests
created incentives to switch to certain patterns. The core pattern needed by
testing is dependency injection. I did not talk about it much, but that is a
core part of testability and modern programming. Use dependency injection. The
other important pattern is pure functions. If code can be encapsulated in pure
functions it is easy to test. Code with low coupling and complexity is also
nice to test.
Since I talked a lot about that above,
here, I want to talk about a distinction I will refer to as 'logic' vs. 'glue'.
When I refer to logic I mean code written to manipulate data. By glue I mean
code focused on fetching or sending data to another component. I claim that
'logic' code is more complex and requires more testing while 'glue' code is
less complex and is much more difficult to test. Now many methods are not all
glue or all logic, but all methods are open to refactoring and if you can
remove the 'logic' from the 'glue' it becomes much easier to write tests. This
is actually one of the general goals of dependency injection and the Law of
Demeter (minimize communication). These principles state that you should pass
in the minimum you need to a method and everything should be passed in instead
of generated inside the method. What this is actually doing is minimizing
dependencies by removing the glue. Once all the glue is removed you only have
logic. But I don't think DI is the only way to do this and if we want to write
tests we should try and isolate logic from glue by moving logic into separate
methods. I also feel that this approach means you should make your 'glue' as
simple as possible. If the glue part is very simple you have code that is
simple to understand and hard to unit test. Tests for this code then have very
little value, so I argue that instead of testing 'glue' we should isolate it,
make it as simple as possible, and not mix it with 'logic'.
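A sketch of separating logic from glue in JavaScript (the checkout scenario and names are invented for illustration):

```javascript
// Logic isolated into a pure function: trivial to test.
function totalWithDiscount(prices, discount) {
  const total = prices.reduce((sum, p) => sum + p, 0);
  return total * (1 - discount);
}

// Glue kept as thin as possible: it only fetches data and hands it
// to the logic, so there is little here worth testing. The data
// source is injected rather than created inside the method.
function checkout(loadPrices, discount) {
  return totalWithDiscount(loadPrices(), discount);
}

console.log(totalWithDiscount([10, 20, 30], 0.5)); // 30
```

The logic gets real tests; the glue is a one-liner that is simple to understand and not worth the cost of mocking a data source to test.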
Code Metrics
It is very common for teams to measure how
well they are unit testing things by the notion of "code coverage".
People like metrics. I like metrics, especially ones that measure code
complexity. The key thing with metrics is understanding what they actually
measure. So what does code coverage actually measure? You might answer, 'How
well your system is tested.' That is incorrect. Coverage measures the number of
lines run during a test. You can take a system with 100% coverage, remove all
the asserts and you will still have 100% coverage and you will test nothing.
You can also have insufficient testing of a complex method where a couple of
tests manage to cover all the lines, so it appears fully tested even though this
is exactly where you should be doing more testing. You can also spend effort on
trivial tests, as mentioned above, just to increase coverage.
So 'coverage' as a goal in and of itself is
bad. Coverage, as a tool, can help identify what code is run during unit tests
and it can help you find places where it might make sense to add tests, but I
think that when you find such a section you should be asking, 'does it make
sense to test this?' or 'how can I refactor this code to isolate logic to make
testing easier?'
If you use coverage as a metric for driving
unit tests it will potentially direct you the wrong way. So if you believe the
assumption that some code lends itself more towards unit testing than other
code then you have to realize that aiming for code coverage can cause problems.
Instead of writing extra tests for complex logic that might already be covered
you might write meaningless tests for code just because you haven't covered
it.
Now if you are a contractor and you sell
code boasting it has 100% code coverage, it makes sense to write a bunch of
poorly conceived tests just to obtain a high coverage percentage, because that
is part of the product. Since our coverage metric isn't part of our product we
should not be bound by this.
Conclusion
I am not against unit tests. I am against
blindly applying a technique because conventional wisdom has labeled it good. I
believe that unit tests vary in quality. I believe that some code makes a lot of
sense to test while some code makes little sense to test.
I have several recommendations in terms of
testing.
Reduce Dependencies
We should reduce dependencies in our
current unit tests. Some of our tests require Angular modules to be built.
These tests are subject to failures due to reasons outside of the test or the
code under test. In my opinion a test failure should almost always be met with
changing the code or changing the test. In the case that something external
causes a test failure the test should be modified to be less fragile.
Unify Coding and Testing
Unit testing is most effective when
combined with coding. The closer the unit testing process is to the coding the
more valuable it will be. We should not let ourselves get into the position of
unit testing code we do not want to change. This happens when we generate code
and release it to QA. We then do not want to change code that is under test by
QA, so our testing cannot lead to refactoring.
Don't Test Everything -
Isolate Glue
We should be testing complex logic and not
glue. To make this easy we should try and isolate logic code and write tests
for it. The glue code is hard to test and should be simple enough so that tests
wouldn't have much value.
Correctness > Consistency
> Identicalness
The lines between these can get blurry, but
in general our tests should test correctness as much as possible. Consistency
tests are fine, but they are less valuable. Identicalness tests probably should
be avoided.
Test for usefulness not for
the sake of testing
I have seen many unit tests that have not been
very useful but were written because testing was seen as useful. We should
strive to write useful unit tests. When we cannot, we should not write bad
tests instead, because they will increase maintenance costs while offering
no value.