Some principles of unit testing

Much pontification exists when it comes to unit testing. Developers get introduced to the idea of TDD with simple code, such as testing a method that adds two numbers, so that they can learn the mechanics, but such motivating examples obscure the very real tradeoffs that arise from the fact that tests are also code.

No one writes code that tests the tests. When faced with complicated, hard-to-test code and behavior they do not fully understand, developers tend to fall into the self-deception trap of keeping tests passing instead of investing time in refactoring. Sometimes, tests are written but are never exercised for unforeseen reasons, and no one notices because everyone assumes a green check next to a commit means things are all right! Sometimes, tests do not actually test the code, but whether computers can do arithmetic. And, sometimes, the reason your test suite takes a long time to run is that someone thought it was a good idea to run thousands of tests that will either all fail or all pass. Occasionally, your deployment pipeline grinds to a halt because the tests are overly complicated.

No one writes perfect code. Software development is change management: as external conditions change, code needs to continue to work, and that’s why we benefit from having unit tests. First, they help codify our understanding of how the code should behave. Second, they help us feel confident that our changes will not break behavior on which we depend.

So, how do we avoid chasing our tails in a never-ending cycle of testing the code that tests the tests that test the code?

There is no perfect solution, but I will share a few principles that have helped. I have not seen the motivations for TDD explained this way. If TDD is presented as a rigid set of rules instead of as a way of helping developers understand those motivations, you end up with tests that are hard to maintain and flaky.

Understand false positives and false negatives

When tests are run, there are four (not two) possible outcomes:

                  Tests pass             Tests fail
Code is good      True negative (1)      False positive (2)
Code is bad       False negative (3)     True positive (4)

Cases (1) and (4) correspond to true negatives and true positives. If your code and test suite never exhibit any other behavior, congratulations.

The false positive scenario (2) wastes resources: when tests fail, you have to investigate the reasons, and deployment pipelines stall. Developers are people, and people in general do not want to be put in the position of having ignored warnings, so spurious failures teach everyone to distrust the suite. Different teams deal with this in different ways. A rather counterproductive strategy I’ve seen is to say “we expect 10% of tests to fail” or some such. Why include such “tests” in the first place? Bonus points if the subset of tests that fail changes with every deployment.

The false negative scenario (3) is no less insidious. When bad code is deployed because tests are written to pass, problems manifest themselves in production in the form of errors that are hard to pin down, unexpected downtime, or worse.
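As a caricature of how such tests are born, here is a sketch using Test::More; total_price and its deliberate bug are hypothetical:

    use strict;
    use warnings;
    use Test::More;

    # Buggy on purpose: it ignores the unit price entirely.
    sub total_price { my ($qty, $unit) = @_; return $qty * $qty; }

    # Written "to pass": the expected value was obtained by running the
    # buggy code and pasting its output back in. Green, and worthless.
    is( total_price(3, 100), 9, 'total for three items' );

    # Written from the specification: this one catches the bug.
    is( total_price(3, 100), 300, 'three items at 100 each cost 300' );

    done_testing();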

This is where TDD is really helpful. By writing tests before you write the code that implements functionality, you know that the test failure you get is a true positive: the test is failing because there is no code to provide the required behavior. When you then write some code and the test passes, you know the tests are passing because of the code you wrote, and not the other way around.

This doesn’t mean the code is not buggy. However, it does get you started on the path of writing a set of tests that fully specify the expected behavior of a piece of code. In most cases, we start writing code with an imperfect understanding of the world with which the code will be expected to interact. In fact, even if we knew everything perfectly on Monday, by Friday, an external change might invalidate one of our assumptions and we might need to specify a new constraint which must be satisfied by the code. Being able to build the contract upon strong foundations of known-valid statements improves the maintainability of the code and reduces the amount of information developers must keep in their heads when making changes.
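A minimal sketch of that red-then-green cycle with Test::More (slugify is a hypothetical function):

    use strict;
    use warnings;
    use Test::More;

    # Step 1: the test comes first. Until slugify() exists, this fails
    # with "Undefined subroutine"; that failure is a true positive.
    is( slugify('Hello, World!'), 'hello-world',
        'slugify lowercases and hyphenates' );

    done_testing();

    # Step 2: only now write just enough code to make the test green.
    # In real life this lives in the module under test, not the test file.
    sub slugify {
        my ($text) = @_;
        $text = lc $text;
        $text =~ s/[^a-z0-9]+/-/g;   # collapse runs of non-alphanumerics
        $text =~ s/^-+|-+$//g;       # trim leading and trailing hyphens
        return $text;
    }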

Write simple tests

Brian Kernighan said:

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

A similar principle applies to tests. Since tests codify the expected behavior of a piece of code, it is best to be able to state that expected behavior as clearly and directly as possible. Otherwise, confusion ensues. Developers cannot see how changes in the code relate to the outcomes of tests. If you find yourself writing a couple of hundred lines to create the appropriate pre-conditions for a test, realize that that time can be much better spent refactoring the code under test so that testing it does not require so much setup.
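As a sketch of what that refactoring buys you, with hypothetical names: imagine discount logic that once lived deep inside an order-processing class that needed a database handle and a config object just to be constructed. Extracted as a pure function, it needs no setup at all, and the test reads as a specification:

    use strict;
    use warnings;
    use Test::More;

    # Hypothetical pure function extracted from the order processor.
    sub discount_rate {
        my ($order_total) = @_;
        return 0.10 if $order_total >= 1000;
        return 0.05 if $order_total >= 100;
        return 0;
    }

    is( discount_rate(50),   0,    'small orders get no discount' );
    is( discount_rate(100),  0.05, 'orders of 100 or more get 5%' );
    is( discount_rate(1000), 0.10, 'orders of 1000 or more get 10%' );

    done_testing();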

It is far preferable to have clear, concise, easily reviewable tests that offer partial coverage than to have 100% coverage that requires baroque code to achieve. That supporting code will not have tests of its own, and it will be hard to assure yourself that the tests are passing for the right reasons. Of course, do try to increase coverage over time, but sometimes you can’t get there immediately. All team members should be aware of where coverage is lacking and try to increase it, but don’t accept the Faustian bargain of checking in unmaintainable spaghetti mocking code for increased coverage stats.

Unit tests are not uptime indicators for remote services

Unit tests should not reach outside of the environment (machine, container, jail) in which they are running. This includes network access. If you want to ascertain that your app can correctly handle a response from another API, your tests do not need to reach out to that service. Ideally, you separate the thing that makes the request from the thing that deals with the response, so you can invoke the handler with just a response object you have constructed yourself. Dynamic languages such as Perl, Python, Ruby, and JavaScript make this rather easy. In other languages, I prefer relying on interfaces rather than class hierarchies.
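For instance, with a hypothetical parse_weather() handler and a response object constructed via HTTP::Response from libwww-perl, no network is involved at all:

    use strict;
    use warnings;
    use Test::More;
    use HTTP::Response;
    use JSON::PP qw( decode_json );

    # Hypothetical handler: it knows nothing about sockets or URLs,
    # it only turns a response object into a data structure.
    sub parse_weather {
        my ($response) = @_;
        die "upstream error: ", $response->code unless $response->is_success;
        return decode_json( $response->decoded_content );
    }

    # Construct exactly the response we want to handle.
    my $ok = HTTP::Response->new( 200, 'OK',
        [ 'Content-Type' => 'application/json' ],
        '{"temp_c": 21}',
    );
    is( parse_weather($ok)->{temp_c}, 21, 'parses a successful response' );

    # Simulating downtime is just constructing a different object.
    my $down = HTTP::Response->new( 503, 'Service Unavailable' );
    ok( !eval { parse_weather($down); 1 }, 'refuses an unsuccessful response' );

    done_testing();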

Unit tests that try to hit other services over the internet also reduce the security of your testing and deployment infrastructure. On a purely practical level, checking whether a remote service on which your app depends is up is not the job of the unit tests. Downtime is a fact of life, and apps that work with external services have to be able to handle it. Even the code testing this functionality should not be making requests to the real service, however, as you cannot expect to be able to simulate downtime conditions on actual services.

Further, does your organization really want to tell outsiders every time you are trying out some new code?

Always test whether the code compiles, the module loads, and the expected methods exist

This is much more important in the case of dynamic languages, where simple typos may remain undiscovered until the right call chain happens. However, even in C++, it is useful to ensure that each test run starts with a clean slate so as to catch stupid, simple problems early on. In scripting languages, where the chances of catching doodad.frobncate when you had intended to invoke frobnicate on the doodad are slim without actually running the code, this becomes even more important.
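A minimal load test with Test::More might look like this (My::App::Doodad is a hypothetical module name):

    use strict;
    use warnings;
    use Test::More;

    # If the module has a syntax error, or a typo in a name it uses at
    # compile time, this fails immediately instead of deep in a call chain.
    use_ok('My::App::Doodad') or BAIL_OUT('cannot even load the module');

    done_testing();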

Also, even, or especially, when working with languages without a concept of interfaces, codify the expected interfaces in tests. With Ruby, you can use respond_to. With Perl, you can use can_ok. In Python, depending on the situation, a combination of getattr with handling AttributeError should help. When you are faced with a pages-long backtrace in production that ends with the equivalent of “method not found”, it helps to know that the methods that are supposed to exist do exist, and therefore the problem must be near the invocation, not the definition.
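Building on the previous sketch, codifying the interface is one can_ok away (the module and method names are again hypothetical):

    use strict;
    use warnings;
    use Test::More;

    use_ok('My::App::Doodad');

    # Codify the expected interface: if a rename or a botched merge makes
    # one of these methods disappear, this fails long before production
    # hands you a "method not found" backtrace.
    can_ok( 'My::App::Doodad', qw( new frobnicate reticulate ) );

    done_testing();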

This is not a panacea, but given that we all make “stupid” mistakes sometimes, it is useful to rule out whole classes of them at the outset.

Finally

We exist in an imperfect world. It is fun to strive for perfection in our hobby projects, but when solving business problems in a dynamic environment with preexisting codebases and processes, tradeoffs must be made. By sticking with a simple set of principles rather than rigid processes, we ought to be able to make better tradeoffs more often than not. Despite all the glossy advertising for various fashionable “methodologies”, that’s really the best one can hope for.

PS: I am no longer on Reddit, but you can discuss this post on r/perl