When I explain to folks why I want to create a simulation about software development flow, I find myself repeating a phrase: software development is non-linear. In this week’s newsletter I wanted to explore that non-linearity in more detail.
First let’s step back. What do I mean by “software development is non-linear”? Unlike in physical systems where effect directly follows cause, in software development cause and effect often seem to be mere acquaintances. Or, as Wikipedia notes, a non-linear system is one where “the change of the output is not proportional to the change of the input.”
Consider a late-breaking change that went into a consumer desktop application. The software was written in an object-oriented language (C++ if I remember correctly, but the specific OO language doesn’t really matter for the purposes of this story).
A developer had found a way to fix a deep, gnarly bug by changing the default value of an instance variable. It was a single-character change. We didn’t have automated tests and the programmer was cautious, so he did a series of paired code reviews with a selection of the most senior programmers and testers. No one he consulted (including me) could find anything wrong with his change. He merged it and delivered a new build to QA for testing, and…
Oh, the package still installed and launched, so some things still worked. But almost any user action triggered a crash. Turns out the code change was in a base class. Thanks to inheritance, that one-character fix—the teensiest possible change—had a ripple effect throughout the entire system.
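The original code isn't reproduced here, so what follows is a purely hypothetical Python sketch of the mechanism rather than the actual bug. The class names (`Widget`, `Button`, `Label`, `Menu`) and the `padding` variable are invented for illustration. The point is only that a default set once in a base class is silently relied on by every subclass, so flipping it from `1` to `0` breaks actions that never appear in the diff.

```python
def make_actions(default_padding):
    """Build a tiny, hypothetical class hierarchy around one base-class default."""

    class Widget:
        def __init__(self):
            # The "one-character" knob: every subclass inherits this value.
            self.padding = default_padding

    class Button(Widget):
        def columns(self, width):
            return width // self.padding  # quietly assumes padding != 0

    class Label(Widget):
        def indent(self, text):
            return " " * (4 // self.padding) + text

    class Menu(Widget):
        def rows(self, height):
            return height // self.padding

    # Each entry stands in for a user action that touches one subclass.
    return [
        ("Button", lambda: Button().columns(80)),
        ("Label", lambda: Label().indent("hi")),
        ("Menu", lambda: Menu().rows(24)),
    ]


def crashing_actions(default_padding):
    """Return the names of the actions that crash for a given base default."""
    crashed = []
    for name, action in make_actions(default_padding):
        try:
            action()
        except ZeroDivisionError:
            crashed.append(name)
    return crashed


if __name__ == "__main__":
    print("default 1:", crashing_actions(1))
    print("default 0:", crashing_actions(0))
```

With the default at `1`, no action crashes. With the default at `0`, all three do, even though none of the subclasses changed.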
I still break out in a cold sweat remembering that project. We were so incredibly lucky that the problems were spectacularly obvious. We were at the very end of an intense release crunch. Everyone was running on too little sleep and too much stress. If the side effects of the change had been subtle, we probably would have missed them. Instead we saw the problems immediately. Within an hour of delivering the build to QA, the programmer had reverted the change. If something is going to fail, I want it to fail BIG.
There are any number of other types of changes that seem like they should be innocuous but that can have an outsized impact: updating a dependency (even when theoretically it’s just a minor bug-fix version bump); NOT updating a dependency when there's a security patch available; changing a shared UI component; tweaking a permission model. Seemingly tiny changes or decisions can cause a cascading series of consequences.
Similarly, changes in project context can have a surprising impact on delivery schedules. As Fred Brooks noted back in 1975, adding more people to a late project makes it later. Yet despite Brooks' Law being close to half a century old, organizations still attempt to speed delivery of a release by scaling up the size of the team late in the project.
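One commonly cited reason behind Brooks' observation is that coordination overhead grows faster than headcount: if every pair of people might need to talk, a team of n people has n(n-1)/2 possible pairs. The sketch below only computes that count. The function name is my own, and real teams rarely need every pair to communicate, so treat it as an illustration of the shape of the curve, not a model of any actual project.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


if __name__ == "__main__":
    for size in (5, 10, 20):
        print(f"{size} people -> {communication_paths(size)} possible pairs")
```

Doubling a team from 5 to 10 people takes the number of possible pairs from 10 to 45: twice the people, four and a half times the potential conversations.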
Sometimes the most profound change in context is a change in our understanding. Perhaps it’s our understanding of the full scope of the problem space or of the implications of previous design decisions. Or, as in the case of the story below, perhaps what changes is our understanding of our own technology stack.
I was the product owner of an internal tooling project. I’d asked for a particular feature. “Oh, that’s really hard,” the programmers demurred. “Are you sure you really need it?”
Yes, I confirmed. I really need it. But it can wait until later, I conceded. Week after week as we groomed the backlog, I moved the item further down the list. Every time it bubbled toward the top, the programmers would shake their heads. “Yeah that one,” they’d say. “That one is going to be hard.”
Then on one magical Friday as the clock ticked inexorably toward the weekend, I looked up to find two programmers looking at me sheepishly. “Yes?” I asked.
“Well, we ran out of other work in the backlog. We considered doing some code cleanup but we decided to take a run at that one story that we keep putting off. And we realized something...”
“...and that is?”
“The framework we’re using gives us that capability basically for free. We just had to wire it up.”
They were embarrassed by how hard they’d pushed back on me about the feature. They were also concerned I would think they’d been sandbagging. I wasn’t at all upset with them for overestimating the amount of work needed. Their resistance forced me to be more thoughtful in my prioritization. That in turn made for a more valuable product. I made sure to tell them that, along with expressing my utter delight at finally having the feature. I started my acceptance testing then and there. After weeks of believing the feature was so difficult I might never get it, it was coded and accepted and in production all within the span of an afternoon.
This particular story had a happy ending. Many other stories go the other way: a seemingly trivial change takes days or weeks. I recall one case where fixing problems caused by quotes in strings was estimated at a couple of days of work. A month later, the team was still struggling to find all the various sources of strings and the ways quotes could sneak into them.
Although some might believe the problem is shoddy estimation practices, I have a different take: the non-linearity of software means that effort and outcome are, at best, loosely coupled.
Some organizations attempt to force predictability by demanding schedule commitments and crunch time. Sometimes that works. Sometimes managing to a schedule provides clarity and focus.
More often in my experience, schedule pressure does more harm than good. Unfortunately, the harm is not immediately visible. It’s a classic despite / because situation. Pressure is applied. The project ships. Things seem to be OK. The leaders who pushed for commitments and crunch time get what they asked for. So even if teams complain about deadlines, the leaders come to believe the project did as well as it did because of their insistence on adhering to a schedule. They don't have to face the possibility that the teams delivered despite, not because of, arbitrary deadlines. They also don't see the shortcuts the programmers took that make the next set of features harder to add.
Code versus behavior. Effort versus outcome. Despite versus because. Software development is inherently non-linear and difficult to reason about. Developing an intuition for tradeoffs and predicting outcomes is the work of a lifetime. I wrote my first line of code over four decades ago and I’m still learning. I expect I'll still be learning in another 20 years.
There are some things I've learned that help reduce the non-linearity of software development. Disciplined engineering practices, in particular, are critically important:
- Make one change at a time
- Keep everything versioned together in a proper source control system
- Treat configuration as code
- Automate all regression tests
- Explore to discover unintended consequences
- Deliver incrementally and frequently
Such discipline is hard. It requires both the unwavering support of leaders all the way to the top and buy-in at the grassroots level. Top-down support is necessary because leaders are in a unique position to foster a culture that values good engineering. Grassroots support is necessary to counteract those who chafe at what they perceive as unnecessary overhead.
Ultimately this is why I am building this simulation. It can't possibly model all the complexities of the real world. But perhaps, just perhaps, a carefully designed simulation can help folks develop an intuition around how to reduce non-linearity. Maybe it could even dispel the most common myths (like the myth of the hero – a subject I'll tackle in a future newsletter).
Although I still don't have anything new to show you, progress continues. This week I'm focusing on using the core engine to model a variety of situations. I hope to write more about that next week.