TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Self-documenting code

I read Uncle Bob's Clean Code book last year, and one of its examples in the final summary chapter on the importance of naming stuck with me. I mean, most people will just agree that naming is important, but sometimes convincing is needed, and the examples given are either trivial (i.e., there's really not a naming problem) or convoluted (you'd never write code that way to begin with, even with good names). Bob's example looked like it was going to be the second kind. Have a look:


public int x() {
    int q = 0;
    int z = 0;
    for (int kk = 0; kk < 10; kk++) {
        if (l[z] == 10) {
            q += 10 + (l[z+1] + l[z+2]);
            z += 1;
        } else if (l[z] + l[z+1] == 10) {
            q += 10 + l[z+2];
            z += 2;
        } else {
            q += l[z] + l[z+1];
            z += 2;
        }
    }
    return q;
}


A loop with a few if-elses in it. A reference to some higher-scoped array l. Some magic numbers. What does it mean? Would anyone ever write this? I can tell you what this code does on the machine, but I can't tell you what it "does". Its purpose. Its meaning.

Put names in, and bam, now you know the purpose.


public int score() {
    int score = 0;
    int frame = 0;
    for (int frameNumber = 0; frameNumber < 10; frameNumber++) {
        if (rolls[frame] == 10) {
            score += 10 + (rolls[frame+1] + rolls[frame+2]);
            frame += 1;
        } else if (rolls[frame] + rolls[frame+1] == 10) {
            score += 10 + rolls[frame+2];
            frame += 2;
        } else {
            score += rolls[frame] + rolls[frame+1];
            frame += 2;
        }
    }
    return score;
}


With just a few name changes, you now know the purpose, at least if you're vaguely familiar with bowling. Could code be written like this in the wild? Absolutely. You can of course improve it even further by making a few one-liner functions:


public int score() {
    int score = 0;
    int frame = 0;
    for (int frameNumber = 0; frameNumber < 10; frameNumber++) {
        if (isStrike(frame)) {
            score += 10 + nextTwoBallsForStrike(frame);
            frame += 1;
        } else if (rolls[frame] + rolls[frame+1] == 10) {
            score += 10 + nextBallForSpare(frame);
            frame += 2;
        } else {
            score += twoBallsInFrame(frame);
            frame += 2;
        }
    }
    return score;
}

private boolean isStrike(int frame) {
    return rolls[frame] == 10;
}

private int nextTwoBallsForStrike(int frame) {
    return rolls[frame+1] + rolls[frame+2];
}

...


And you could improve it even further (frame vs frameNumber is a bit odd) in a lot of ways now that you know what it's supposed to do, and you can make deeper changes like the data structures and so on. Understanding beats tests.

Perhaps you aren't familiar with bowling, though, so an explanation would be helpful. And it'd be helpful to understand why a simple array was chosen for rolls[]. Should all this extra info be in the function though? No, arguably it should be somewhere else. Perhaps in the class doc. But we might also want to know the context in which this function is called, not just what its purpose is, and that would again be somewhere else.
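To make that concrete, here's a rough sketch of where the function might end up with the bowling explanation pushed into the class doc. Everything not shown earlier is my own assumption rather than the book's code: the class name BowlingGame, the roll() method, the constants, the firstRoll rename (my attempt at the frame vs frameNumber oddity), isSpare, and the bodies of nextBallForSpare and twoBallsInFrame. The stated reason for the flat array is also just a guess.

```java
/**
 * Scores a single game of ten-pin bowling.
 *
 * A game has 10 frames. In each frame the bowler gets up to two rolls to
 * knock down 10 pins. A strike (all 10 pins on the first roll) scores 10
 * plus the next two rolls. A spare (all 10 pins across both rolls) scores
 * 10 plus the next roll. Anything else scores the pins knocked down.
 *
 * Rolls are kept in one flat array instead of being grouped per frame.
 * Assumed rationale: bonuses reach into later frames, and with a flat
 * array a bonus is simply "the next one or two entries".
 */
final class BowlingGame {
    private static final int FRAMES_PER_GAME = 10;
    private static final int ALL_PINS = 10;

    // At most 21 rolls: nine frames of two rolls, plus three in the tenth.
    private final int[] rolls = new int[21];
    private int rollCount = 0;

    /** Records the pins knocked down by one roll. No validation in this sketch. */
    void roll(int pins) {
        rolls[rollCount++] = pins;
    }

    /** Returns the total score; rolls not yet made count as zero. */
    int score() {
        int score = 0;
        int firstRoll = 0; // index in rolls of the current frame's first roll
        for (int frame = 0; frame < FRAMES_PER_GAME; frame++) {
            if (isStrike(firstRoll)) {
                score += ALL_PINS + nextTwoBallsForStrike(firstRoll);
                firstRoll += 1; // a strike uses only one roll of the frame
            } else if (isSpare(firstRoll)) {
                score += ALL_PINS + nextBallForSpare(firstRoll);
                firstRoll += 2;
            } else {
                score += twoBallsInFrame(firstRoll);
                firstRoll += 2;
            }
        }
        return score;
    }

    private boolean isStrike(int firstRoll) {
        return rolls[firstRoll] == ALL_PINS;
    }

    private boolean isSpare(int firstRoll) {
        return rolls[firstRoll] + rolls[firstRoll + 1] == ALL_PINS;
    }

    private int nextTwoBallsForStrike(int firstRoll) {
        return rolls[firstRoll + 1] + rolls[firstRoll + 2];
    }

    private int nextBallForSpare(int firstRoll) {
        return rolls[firstRoll + 2];
    }

    private int twoBallsInFrame(int firstRoll) {
        return rolls[firstRoll] + rolls[firstRoll + 1];
    }
}
```

Usage would be something like calling roll(10) twelve times on a new BowlingGame and then score(), which should give the perfect-game 300. Note the class doc explains bowling and the array, but still says nothing about who calls this or when.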

Go too far down this rabbit hole, and you'll end up with literate programming.

I want to write some uninformed thoughts on literate programming...

Tim Daly did a talk on literate programming here.



A TLDW version can be found in this HN comment. Specifically it relates to the question in the talk, "This is all fine and good but some of us work at BigCo, what can be done?"

His suggestion is to take every team and give them an "Editor-in-Chief". That person's responsible for making sure no code is checked in without at least a paragraph somewhere explaining why that code was checked in.

The benefits from this (and literate programming) are that when the team dies, another team can come in, spend a couple weeks, and understand everything well enough.

My thought, though, was that BigCos kind of already do this, but because of scale, there are some differences. It's useful to think about the diff between current practices and what literate programming would bring.

If you've just joined a team responsible for a subsystem, and then the rest of the team dies, you're now on the hook for adding new features and fixing bugs of the existing subsystem. What resources do you usually have available to you, in a modern BigCo software company (and many other types of companies)?

1. The code itself.
2. The version control history of the code.
3. The work tracker.
4. Code review history.

I argue that, combined, these four things make up the lion's share of what literate programming offers. There are improvements literate programming could bring, for sure, but those would come at a cost. What we have now seems to be a local maximum. Let's go through the areas.

1. The code itself



There's a lot of variability here. You might have code like the obfuscated bowling score function I started this blog with, and you'll need to reverse engineer it. You might have nicer "clean code" in the style of Uncle Bob, but even that isn't always straightforward to understand. In reality you probably have a mix: code whose real purpose and intent, the why, is hard to understand, along with some good oases that you can readily understand. If you're not suffering from fear-driven development, you can make some refactors to help your understanding, even if you never check those in.

The code is what ultimately matters to the business; it's what gets executed by the machine to accomplish its tasks. The ability of humans to understand and change the code relates only to the business objective of efficiency. Fortunately, the code itself is usually amenable to understanding. We all work in high level languages almost all the time, and structured programs are a given. You may be more or less fortunate to have some sort of consistent style.

It's possible to take a bunch of code with the average amount of clear names and comments and understand it very well. You don't need a book, you don't need clean code. Do those things help? Absolutely. But if you don't have them, most of the time you can still make progress with minimal understanding. Humans are intelligent creatures.

2. The version control history



Sometimes the "why" of the code actually is available, it's just that the location happens to be in the commit message that introduced or changed the code, instead of in the code itself as comments or in a literate program.

Having the history also can help you see how the system evolved over time. It also opens the door to the strategy of: the current system is too hard to understand, is the system 5 years ago any easier to understand? Let's look at that one to begin with.

With better source control systems, and smaller commits, you can get pretty good line-by-line annotations of at least the time something was changed. And in modern companies, even if these commit messages don't really contain the why, they will explicitly point to (or implicitly point to by virtue of them being associated with the commit id) the next two things.

3. The work tracker



Modern companies require commits to refer to an item in the work tracker. (Caveat: various git workflows will instead require a set of related commits to fulfill that role.) This is overhead, but the benefit is that you have yet another thing linking the code to something else. What's in a typical work tracker?

Work items are typically divided into two classes: bugs and new features. Some systems call the latter "user stories". Sounds similar to the idea of literate programming, right? Well not really, but it accomplishes some of the same goals.

User stories



Feature work is supposed to be done in "stories". Classically a story is always supposed to be titled like "As a [persona], I want [x] so that [business value]". Personas might be your end users in different contexts, other subsystems, or other programmers/teams. In any case, the body of the story would go into further details. Typically the minimum amount of detail will include a high level "acceptance criteria" or "definition of done", a bullet list of things you can check to verify that the goal of the story was implemented or not. There may be much more than the minimum, such as commentary about specifics that only programmers care about, links to design documents, a test plan, related stories (an "epic" might summarize all of them), useful people/teams to talk to, and so on. How much more than the minimum (and who is responsible for authoring it -- the programmers themselves, or the product manager, or a combination, or some new role of 'editor in chief') boils down to team culture and the scope of the story itself. (As an example, my current team doesn't often title our stories the official way, but they do all have done-whens.)

Bugs



Bugs are different in that they don't have to follow a "story", but a good bug report will generally follow a pattern of including at minimum: a description of the problem, a set of instructions to reproduce the problem, what was observed and at which step of the instructions, and what was expected to happen instead. The quality of the bug report will depend on the quality of these minimum components. A repro that doesn't work, or is too vague, isn't helpful. Screenshots or a video or a stacktrace might be very helpful and can sometimes suffice even when the rest is absent, though they also might just be noise. The bugs that aren't fixed in a timely manner then get an additional piece of information on them reflecting the priority of fixing them later, hopefully with some explanation of what blockers there are, if any (perhaps there's uncertainty about the best fix).

Anyway, I hope I've demonstrated that with work items, which are related to checkins, which are related to every chunk of code, we have yet another source of "why" and "what purpose" to consult that can help you even if you're given code in the form of the x() function above.

4. Code reviews



Modern companies require every non-trivial commit to pass a code review with at least some other developer. Maybe Daly has a point that some editor-in-chief / doc writer could also have a say in this process, though a more scalable solution is for teams to police themselves.

Code reviews sometimes contain the best information. Unfortunately it's not always the easiest to find them, but if you do, it can be illuminating. Discussions about design details can sometimes only be found in a code review. It's yet another source to consult when you don't understand the code's purpose.




What is missing from the combination of these 4 things vs doing things the literate programming way? I think there's just one big one. Literate programming gives you a book-like artifact at the end to describe the whole system. The way modern companies work, understanding the whole system is taken to be impossible, and so they optimize for understanding "just-in-time" through a series of locally costly lookups to these 4 sources, whereas with a literate program you'd just go back to the chapter of interest to review.

Would having a book of the system be valuable? Modern companies have decided no. Perhaps wrongly. But global explanations are still sometimes handy, so we find them in non-book forms. You'll typically discover an ad hoc collection of block diagrams and slide shows and wiki pages and readmes and recorded talks and other documentation. The problem of course is that it can quickly go out of date (and you might not know it's out of date) because it's not tied to the code or anything. Still, companies get by. Even companies that suck so much they don't even use version control somehow seem to live for a while. Surely literate programming (and many other things) would be a big improvement for them, and maybe it makes the difference of a company lifespan of 7 years vs 20.

Ultimately this is why I think literate programming isn't going to catch on, at least any time soon. Ignoring the individual level (programmers want to write programs, not books), there are these communal level issues: even though programmers also don't want to write documentation or make a work item for each commit, companies have decided to force various things as a tradeoff. The set of tradeoffs currently made seems to be working well. It won't be until we get either competitive benefits from more literate programs or a collective fashion among programmers that we'll see the practice take off. I suspect that literate programming being driven by developers rather than management is more likely, since open source development styles/fashions are more and more influential over proprietary development and literate programming could become popular for open source.

In the meantime, perhaps it's a secret weapon for a startup, or a way to make sure your open source project is long-lived. Literate programming advocates insist it lets them go faster despite the overhead; more people just need to try it and see. Maybe we need better tooling. I for one am not going to learn Emacs.

Some day though I'll try a more literate style, maybe write my own simple tooling, but probably not in a full fledged way until I get around to reading the PBR book. For now I get by too.


Posted on 2019-02-09 by Jach

Tags: programming, quality

Permalink: https://www.thejach.com/view/id/357

Trackback URL: https://www.thejach.com/view/2019/2/self-documenting_code
