Measurable and immeasurable code quality

In software engineering, there is one concept that I find very tricky: “code quality”. There are a few possible ways to find evidence for it; I'm going to outline two in particular: “factoring” and “unit test coverage”.

Good code is factored such that unrelated (orthogonal) concerns are adequately isolated from one another. By keeping things as decoupled and independent as possible in code, the combinatorial explosion from the many possible states the code can be in can be somewhat tamed.
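To make this concrete, here is a minimal sketch (the functions and their names are my own, invented for illustration) of keeping orthogonal concerns apart: parsing, computation, and presentation each live in their own small function with no shared state.

```python
# Hypothetical example: a tiny report pipeline with parsing, business
# logic, and formatting kept as independent, composable steps.

def parse_amounts(lines):
    """Parsing concern: turn raw text lines into numbers."""
    return [float(line.strip()) for line in lines if line.strip()]

def total(amounts):
    """Business-logic concern: pure computation, no I/O, no globals."""
    return sum(amounts)

def format_report(value):
    """Presentation concern: formatting only."""
    return f"Total: {value:.2f}"

# Each step can be understood, tested, and changed in isolation.
report = format_report(total(parse_amounts(["1.50", "2.25", ""])))
# -> "Total: 3.75"
```

Because each function depends only on its inputs, the number of states you must consider when changing any one of them stays small.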

The hope is that this makes it simpler -- or even possible -- to reason about and understand the behavior of the program from its source code, or to gauge the extent of the effects a particular change may have.

The opposite of well-factored code is the famous “spaghetti code”.

For poorly factored code, or spaghetti code, any part of the code may jump to another seemingly unrelated part; or, at any point, the code may read and/or alter global state.
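A contrived sketch of that style (the names are invented for illustration): two seemingly unrelated functions are coupled through shared mutable state, so the answer one returns silently depends on whether the other ran first.

```python
# Contrived example of coupling through global state: `price` and
# `load_customer` look unrelated, but both touch the same module-level
# dict, so the result of `price` depends on distant call history.

state = {"discount_pct": 0}

def load_customer(is_vip):
    # Side effect: mutates global state as part of "loading".
    state["discount_pct"] = 10 if is_vip else 0

def price(amount):
    # Hidden dependency: reads global state mutated elsewhere.
    return amount * (100 - state["discount_pct"]) // 100

price(100)           # 100 -- or is it? Depends on what ran before.
load_customer(True)
price(100)           # 90: same call, different answer.
```

To know what `price(100)` returns, you have to know the entire history of the program up to that point -- exactly the reasoning burden the previous paragraph describes.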

In this case, it becomes harder to reason about program behavior, let alone understand the possible “blast radius” of any change you intend to make. To put it simply, you can't reasonably assume or guarantee that you understand all the ways the code relies on its current form to function as it does.

Changing gears, another possible piece of evidence for code quality is “unit test coverage”, measured in percentages.

Coverage means that, if you run the whole test suite, that percentage of the source code's lines gets executed.
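A small sketch of what that measures (in practice a tool such as coverage.py reports the percentage): a test that exercises only one branch of a function passes, yet leaves lines of the function unexecuted.

```python
# Illustration: a function with two branches. A suite that only ever
# calls classify(5) executes the `return "positive"` line but never
# the `return "non-positive"` line, so line coverage is partial even
# though every test passes.

def classify(n):
    if n > 0:
        return "positive"
    return "non-positive"   # never executed by the test below

def test_classify_positive():
    assert classify(5) == "positive"

test_classify_positive()  # passes, yet one line was never run
```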

Whether the test suite makes the right assertions and checks is a different question that “coverage” doesn't -- and can't -- answer.
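A contrived example of that gap: this test achieves 100% line coverage of `area`, yet its one assertion happens to hold even though the code is wrong.

```python
# Contrived example: the test executes every line of `area` (100%
# line coverage) but its input is too weak to catch the bug -- the
# formula should be width * height, not width + height.

def area(width, height):
    return width + height   # bug: should be width * height

def test_area():
    result = area(2, 2)     # 2 + 2 == 2 * 2, so the bug is invisible
    assert result == 4      # passes; coverage is 100%, code is wrong

test_area()
```

Coverage tells you the line ran; only the quality of the assertions tells you it ran correctly.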

However, at least with a unit test suite, for an error to exist in the program, it has to be either (i) a mistake present in both the code and the test suite, or (ii) a mistake in the code that the test suite doesn't cover.

The (ii) category of errors can be remediated or mitigated by requiring higher unit test coverage.

While tedious, I agree with the argument that checking code once (during code reviews) by reading it, then twice, by writing tests for it, is superior to the alternative of having no tests at all.

Especially when coupled with diligently adding unit tests for every reported bug, the ability of the test suite to automatically catch regressions can't be overstated. It means, in concrete terms, that in the future the code won't at least be wrong in the same way it has been before, which is a sensible directive to follow. (One hopes to run out of ways that one's software can be wrong.)
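A sketch of that practice, with an invented bug report: suppose users reported that a `slugify` helper turned runs of whitespace into multiple hyphens. After fixing the code, a test pinning the exact reported input guards against the same regression forever.

```python
# Hypothetical regression test: a bug report said that
# slugify("  Hello  World  ") produced "hello--world" because runs of
# whitespace became multiple hyphens. The fix and its pinning test:

def slugify(text):
    # Fixed implementation: split() with no arguments collapses
    # whitespace runs, so each run becomes exactly one hyphen.
    return "-".join(text.lower().split())

def test_slugify_collapses_whitespace_runs():
    # Regression test for the reported bug: one hyphen, not two.
    assert slugify("  Hello  World  ") == "hello-world"

test_slugify_collapses_whitespace_runs()
```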

The opposite of “good unit test coverage” is not having unit tests at all. It means relying on a single programmer, or a small number of them, to be proficient and knowledgeable about the codebase and to use reason alone to guide any changes. As might be clear from how I'm framing things, I think this is a brittle proposition and that unit tests are indeed incredibly helpful as a tool.

Unfortunately, though, one finds out that the levers that get pulled are, most often, attached to the things that can be measured. It thus follows that someone can enforce, helpfully or not, unit test coverage, but no one can enforce good factoring -- since it can't be measured programmatically, or even objectively.