Nick Scialli

Software testing observations and recommendations

Humans write buggy software. However, bugs are bad for business, so in an effort to reduce the quantity and severity of bugs users encounter, software and Quality Assurance (QA) engineers test their software. Finding bugs during testing, rather than in production, is generally thought to be beneficial: bugs caught before release never reach users, and they tend to be cheaper to fix the earlier they are discovered.

I agree with these benefits. I have also accrued a number of additional observations and recommendations, which I share in the following sections of this article.

Observations #

Keep in mind that I have largely been a web app engineer while making these observations. While I suspect many of them hold for other domains, I can't say so with absolute certainty.

Testing is an attempt to model the real world at varying levels of abstraction #

We have already discussed that the reason we perform testing is to minimize the number and severity of bugs that occur in our production systems. However, testing is not, in theory, the best way to discover all of these potential bugs.

Theoretically, we could identify every bug in our system by running it in production for an infinite amount of time and observing all of the defects that occur. While this would be superior to testing, it has the one downside of being totally impossible.

So—we test! But we can't include every possible piece of information about the universe in our tests. Instead, we identify the pieces that we believe are most relevant to modeling the universe in the context of the system under test. Additionally, we do this at varying levels of abstraction. We may write unit tests where we assert the behavior of one small "unit" of code and ignore (or stub pieces of) the rest of the universe. On the other hand, we may write end-to-end tests where we try to run test cases through the entire system, making far fewer (but still some) assumptions.
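For example, here is a rough sketch of a unit test that stubs out the rest of the universe. The greetUser function and its user-fetching dependency are invented for illustration:

// Hypothetical function under test: formats a greeting for a fetched user
function greetUser(fetchUser, userId) {
	return fetchUser(userId).then(function (user) {
		return "Hello, " + user.name + "!";
	});
}

describe("greetUser", function () {
	it("greets the fetched user", function () {
		// Stub the user-fetching dependency: no real network request is made
		const fetchUser = function () {
			return Promise.resolve({ name: "Ada" });
		};

		return greetUser(fetchUser, 42).then(function (greeting) {
			expect(greeting).toBe("Hello, Ada!");
		});
	});
});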

Different levels of testing have their trade-offs #

Generally, testing smaller parts of the system (with more of the world abstracted away) is easier. You just have to worry about one small piece of code. You probably don't have to run additional services or execute network requests. There is less chance for randomness or race conditions to manifest and cause test flakiness. However, this comes at a cost: with more of the world abstracted away, you don't truly know how well your test represents the real world.

Testing larger parts of the system (using integration or end-to-end testing) is harder. For these tests, you may have to run services, seed data, and execute network requests. This can be challenging! But there is a payoff: you are now simulating more of the real world and, therefore, have more confidence that the results represent the system in production.

Lower-level tests end up being simpler in what they assert. They depend on fewer other parts of the system and are less brittle. Consider the following test of an adder function:

describe("adder", function () {
	it("adds two numbers", function () {
		expect(adder(1, 2)).toBe(3);
	});
});

This is pretty easy to reason about and probably won't break any time soon.

Testing larger parts of the system often requires setting up a bunch of services prior to executing the tests. The tests themselves read like stories. For example, an end-to-end test for a web application might log a user in, navigate through a few pages, create a record, and verify that the record shows up in the user interface.

This test could break if any step fails! Additionally, if a test alters data in a database, it could potentially affect other end-to-end tests if you're not careful to either run tests in a specific order or clean up the database after each test.
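To make that concrete, here's a rough sketch of what such a test might look like using Cypress. The pages, selectors, and data are invented for illustration:

describe("orders", function () {
	it("lets a signed-in user place an order", function () {
		// Log in through the real UI
		cy.visit("/login");
		cy.get("input[name=email]").type("test@example.com");
		cy.get("input[name=password]").type("a-test-password");
		cy.get("button[type=submit]").click();

		// Navigate around and create a record
		cy.contains("Add to cart").click();
		cy.visit("/cart");
		cy.contains("Place order").click();

		// Verify the result shows up in the user interface
		cy.contains("Thanks for your order");
	});
});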

Systems that are easy for developers to work with are easy to test #

Developer experience and developer productivity have been hot topics lately. There are a lot of benefits to making developers happier and more productive. One benefit that isn't talked about as much is that software with a good developer experience is also easier to test.

Let's consider how easy it is for a developer to get up and running with your codebase. Do they have to install a bunch of software independently, track down passwords for a bunch of different services, and troubleshoot myriad setup issues? If so, that's an unpleasant experience—and it probably means it's pretty hard to orchestrate end-to-end testing!

Now let's imagine developers could clone down your git repository, run docker-compose up to start supporting services, and be ready to develop. That's a great developer experience—and it probably means you could run the very same command on a test server and then run an end-to-end test suite on top.

The buggiest parts of software seem to be at system integration points. These are also the most neglected during testing. #

As a web app engineer, I often see applications fail because the user interface code (running in someone's browser) executes an HTTP request to a server or API somewhere, and either the request or response is malformed or otherwise unexpected. Why wasn't this caught during testing? Well, it's possible that performing end-to-end testing between these services was too challenging to set up. Or perhaps the API is owned by another team and it didn't make sense to include it in our tests. Or perhaps end-to-end testing does exist but only covers a limited number of use cases.

It can be hard to know what to test, but in my experience it's a good investment to have automated tests that make sure different systems interact with each other properly. While I haven't used it myself, Pact contract testing seems pretty interesting for independent services that are difficult to test end-to-end.
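Even a small automated test that exercises the real HTTP boundary can catch a lot of these failures. Here's a rough sketch that asserts the API returns the shape the user interface depends on (the endpoint and fields are invented for illustration, and it assumes the API is running locally and Node 18+ for the global fetch):

describe("GET /api/users/:id", function () {
	it("returns the fields the UI depends on", async function () {
		// Hits a locally running instance of the API rather than a stub
		const response = await fetch("http://localhost:3000/api/users/123");
		expect(response.status).toBe(200);

		const user = await response.json();
		expect(typeof user.id).toBe("string");
		expect(typeof user.name).toBe("string");
		expect(typeof user.email).toBe("string");
	});
});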

Recommendations #

Here are some recommendations based on my experiences. Again, keep in mind that I'm a web application engineer so your mileage may vary if you work in another domain.

Write automated tests #

I spent the beginning of my career working on software that only had manual testing. Relying on manual testing is pretty gnarly. Whenever you make changes to the software you have to make a decision: do we just test the new functionality or do we regression test other functionality as well to make sure nothing broke? Regression testing is safer but time-consuming and, therefore, expensive. Also, humans are error-prone. We're really great at a lot of things, but executing repetitive tasks without making mistakes isn't one of them.

I was in awe when I discovered automated testing. I no longer had to choose between quick and thorough testing! I could write a test that lived in my codebase and could be run at any point with ease. The tooling has advanced to the point where you can re-run your automated tests when you save your code. You can require that full automated regression tests pass prior to merging any code into your main development branch.

Of course, I can't categorically recommend automated testing in every situation. If you're building a toy app or an app that simply isn't very important, you can probably forgo automated testing.

If possible, don't block deployments waiting for manual testing #

I am a big proponent of shipping code to production as frequently as possible. However, manual testing is slow, error-prone, and often blocks deployments.

When I say "blocks" deployments, I mean testing has to be completed prior to shipping the code to production. If bugs are considered critical enough, the code may not be allowed to go to production at all until those critical bugs are fixed.

If you must perform manual testing, consider decoupling that testing from deployment. Write good automated tests, block deployments on any failures in those automated tests, and deploy with confidence. Any additional bugs found during manual testing can be fixed as a "fast follow."

For some, this sounds too risky. "I thought we were trying to prevent any bugs from making it to production!" Well, no. We're trying to reduce the quantity and severity of bugs, but any testing we do comes at a cost. In my experience, the quality gains from manual testing have not been worth the cost of hampering the ability to ship code quickly.

It's important to always view testing (and other quality measures) as an exercise in risk mitigation (not elimination) and trade-offs.

You may have noticed that I caveated this recommendation with "if possible." I mostly work on projects that can be continuously (or continually) deployed. Therefore, it's okay to take some calculated risks with testing because you can always push out fixes to production bugs relatively quickly. However, some folks are not so lucky! Consider embedded systems engineers: it may be impossible (or at least very difficult) to implement changes once a device is produced and shipped. In those cases, it makes sense that the development and testing lifecycles look quite different from those for web applications. Context is always important.

Work on developer experience #

If your codebase is easier to set up for developers, it will be easier to set up for tests. If you use modern tooling and test frameworks that make test-writing easier, your developers will be more willing and able to write tests.

Working on developer experience is a great idea in general and often has hidden benefits for testing your application.

Use test coverage thresholds but don't lose sight of their purpose #

I find software development teams can easily fall out of the habit of writing tests. Implementing test coverage thresholds has proven to be a good way to gently remind developers that tests are needed. The way a test coverage threshold usually works is that a coverage report is generated when automated tests are run in continuous integration. If the required test coverage threshold isn't met (let's say 80% lines covered), then the associated PR can't be merged.
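If you use Jest, for example, a threshold like this can be configured in a few lines (the numbers here are just an illustration):

// jest.config.js
module.exports = {
	collectCoverage: true,
	coverageThreshold: {
		global: {
			// Fail the test run (and the CI check) if line coverage drops below 80%
			lines: 80,
		},
	},
};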

Requiring that a certain percentage of code is covered by tests can be unpopular. "Test coverage does not necessarily mean the code works or the tests are high quality." Very true! But I think coverage is a pretty good proxy for these things and helps reinforce a test-oriented team mindset.

Use test coverage thresholds to systematically fix poor test coverage #

Test coverage thresholds can be helpful for software with poor coverage. Let's say you're at 20% coverage right now. Not great! But you can usually fix this systematically.

First, set a required test coverage threshold of 20%. You can likely configure your source control product to disallow merging code to your main branch if coverage would drop below this threshold. Once this is configured, you won't drop below this percentage. When you have some spare cycles, write some new tests for uncovered parts of your codebase. Make sure they are useful and not just an attempt to drive the coverage percentage up! Let's say this gets you to 23% coverage. Great! Now set your test coverage threshold to 23%. Repeat this process until you max out at a reasonable number (e.g., 80% or 90%).

Write end-to-end tests #

Software I have worked on that performed well had end-to-end tests. Conversely, software that broke a lot in production did not have end-to-end tests. I am only one person with a limited number of experiences; however, I am a huge believer in the quality wins gained by investing in end-to-end testing. Invest time in writing these kinds of tests and it will pay dividends.

Block code merges into your main branch with automated testing when it makes sense #

You can make sure your automated tests are always passing by blocking any code that doesn't pass tests from merging to your main branch. This is generally a really good idea, but I have also seen it go wrong. If, for example, your application is maintained in separate code repositories, you should not block merges on tests that rely on code from both repositories.

That's a bit too theoretical, so let's "come up" with an example (read: use the exact example I have encountered in the wild). Your team maintains an API codebase in one repository and a front-end codebase in another. You have end-to-end tests that clone down both repositories and run some user scenarios. This is great! But if you block merges to either repository based on these tests, you now have a situation where engineers making changes to one repository can block engineers in the other repository. Instead, it might be a better idea in this scenario to run the end-to-end test suite on the main branch periodically (e.g., daily).

Assert behavior, not implementation details #

Nothing is more frustrating when working on a web app than when you change a CSS class and wind up with a bunch of failing tests. You dig into the test suite and find out that tests are asserting that a CSS class is present on a button on the page.

The problem with asserting a CSS class is that users really don't care what a button's CSS class is. It's purely an implementation detail and has nothing to do with the functionality of the system. Instead, we should be asserting that clicking the button executes a function or action as expected. That action is what the user is expecting. If we assert behavior, we're much less likely to break tests by changing implementation details.
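As a rough sketch (using React Testing Library, with a hypothetical SaveButton component), the difference looks something like this:

import React from "react";
import { render, screen, fireEvent } from "@testing-library/react";
import { SaveButton } from "./SaveButton"; // hypothetical component

describe("SaveButton", function () {
	it("calls onSave when clicked", function () {
		const onSave = jest.fn();
		render(<SaveButton onSave={onSave} />);

		// Brittle: asserting an implementation detail
		// expect(screen.getByRole("button").className).toContain("btn-primary");

		// Better: assert the behavior the user cares about
		fireEvent.click(screen.getByRole("button", { name: /save/i }));
		expect(onSave).toHaveBeenCalled();
	});
});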

Concluding thoughts #

I hope my observations and recommendations prove useful to you. I don't claim to have any perfect answers on testing and I don't think perfect answers exist. Often, the best course of action is to find out the level of testing that makes your developers happy and productive, minimizes bugs in production to a reasonable level, and requires an acceptable level of maintenance.

If you enjoyed this article, consider subscribing on Feedly or your favorite RSS consumer. If you'd like to chat, I'm most active on Bluesky.