Something to See

Big news this week! The software development simulation I’ve been working on for months finally has a web-based visualization layer. Now that there is a full stack version, this week’s newsletter is all about the simulation: an explanation of what the simulation is; a peek at the user interface; and a call for volunteers willing to see a demo and offer opinions.

Simulating Flow

Let’s take a step back. For the last few months I have been building out a simulation of flow in software development. I’ve written about the challenges of designing the simulation in previous newsletters. I’ve talked about the non-linearity of software development. And I published a video of an early prototype. But I have not really explained what the simulation is.

The heart of the simulation is a stock-and-flow model of software development. Backlogs represent stocks of work. That work flows through individuals and teams. The simulation supports metrics like cycle time and includes software development-specific concepts like bug discovery and deployment.
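To make the stock-and-flow idea concrete, here is a minimal Ruby sketch. The class and method names (Backlog, Worker, tick, and so on) are hypothetical stand-ins, not the engine’s actual API: a backlog holds a stock of stories, and a worker drains it one simulated hour at a time.

```ruby
# Minimal stock-and-flow sketch with hypothetical names (not the real engine).
Story = Struct.new(:name, :hours_remaining)

class Backlog
  def initialize(stories)
    @stories = stories
  end

  def pull
    @stories.shift
  end

  def empty?
    @stories.empty?
  end
end

class Worker
  def initialize(backlog)
    @backlog = backlog
    @current = nil
  end

  # Advance one simulated hour: pull work if idle, otherwise keep working.
  def tick
    @current ||= @backlog.pull
    return unless @current
    @current.hours_remaining -= 1
    @current = nil if @current.hours_remaining <= 0
  end
end

backlog = Backlog.new([Story.new("login", 8), Story.new("search", 16)])
worker  = Worker.new(backlog)
24.times { worker.tick }   # simulate three 8-hour days of flow
```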

My hope is that the simulation will provide insight into factors that influence software delivery. What has a bigger impact on organizational performance: team size? team structure? release size? feedback latency? deployment frequency?

I also hope the simulation will provide a way to experiment with diagnosing problems in a system. How do you identify a bottleneck? What do you do about a single point of failure? How do you fix quality problems while continuing to deliver new capabilities?

The addition of a web-based visualization layer is quite exciting. Previously the only way to run the simulation was programmatically. All the output was raw text. Now it’s possible to see charts with stats and trends updated in real time as the simulation runs.

Visualizing Delays

In “Competing Priorities,” I described a simulation scenario that involved a tradeoff between feature work and support requests. My (incredibly obvious) conclusion from running that scenario was: “You can optimize for turnaround time on support tickets, or features, but not both.”

Now that there is a graphical visualization on top of the simulation, I decided to explore that same tradeoff again. What patterns might I see in the real-time data streaming from the simulation as it ran?

The setup still involves a team of programmers, a product manager, and a support manager. Their goal is still to ship a release while being responsive to support requests. This time, I added bugs to make things just a little more complicated.

Let’s examine the wait time from when the support manager makes a request until a programmer picks it up. Here’s an example chart:

Chart of average wait time between making a support request and someone picking it up.

I experimented with the scenario setup and found that tweaking parameters such as the number of support requests per week, the priority of those requests, the number of programmers, and the story size changes how long support has to wait for a programmer to respond to a request for help.
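As an illustration of the kinds of knobs involved, here is a hypothetical scenario configuration. The parameter names are my own shorthand for the factors described above, not the simulation’s actual interface.

```ruby
# Hypothetical parameter names, for illustration only.
scenario = {
  programmers:               4,
  support_requests_per_week: 5,
  support_request_priority:  :high,   # relative to feature stories
  story_size_hours:          40..80   # ~1-2 work weeks of effort each
}
```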

Consider the chart below. In this simulation run, the stories in the backlog range from 40 to 80 hours of effort (roughly 1-2 work weeks).

Latency graph from a simulation run with larger stories in the backlog.

You can see that once activity picks up in the simulation, it takes an average of more than 90 elapsed hours before anyone picks up a request. (That's elapsed clock time, not just work hours.) Although the average comes down as the simulation continues, the wait time never gets close to the service level agreement (SLA) that our fictional support manager requested.
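For clarity, here is how I think of that metric, sketched with hypothetical field names: the wait is measured in elapsed simulated clock hours from the moment a request is created until a programmer picks it up, averaged over the requests that have been picked up so far.

```ruby
# Hypothetical sketch of the wait-time metric: elapsed simulated clock
# hours from request creation to pickup, averaged over picked-up requests.
SupportRequest = Struct.new(:created_at_hour, :picked_up_at_hour) do
  def wait_hours
    picked_up_at_hour - created_at_hour
  end
end

def average_wait_hours(requests)
  picked_up = requests.reject { |r| r.picked_up_at_hour.nil? }
  return 0.0 if picked_up.empty?
  picked_up.sum(&:wait_hours).to_f / picked_up.size
end
```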

Here’s the wild thing. When I experimented with changing the various tunable parameters, I found that just one parameter had an outsized impact on the latency: story size. You can tweak all the other parameters and see small changes. But when you lower the story size from 40-80 hours to 4-8 hours, the latency between a support request coming in and a programmer picking it up drops drastically. Here’s what the chart looked like with smaller stories.

Latency graph from a simulation run with smaller stories.

You might be thinking that’s all well and good for a simulation, but would this work in the real world? If your organization were struggling to be responsive enough to unplanned work, would reducing story size fix the problem?

The simulation is a model of software development. As George Box famously said:

“All models are wrong, but some are useful.”

That said, I do think this is a useful insight. Granted, finding the right lines along which to split stories is hard. But there are a variety of approaches you can take. And splitting stories is much less disruptive than other changes you might attempt. So if reducing story size could improve responsiveness without resorting to more drastic steps (like a reorganization), it’s worth trying.

Support latency is one of three new graphs. Another shows a cumulative count of bugs found v. fixed, like the example below.

Chart of bugs found v. fixed

That gap between the red and green lines is what happens when fixing bugs is a lower priority than implementing new features.

I confess I had flashbacks to the late 1990s, when I produced many a chart that looked like that on real-world projects. Actually, the chart above tells a happier story than most projects I was on in the 1990s: I rarely saw the red and green lines meet. Fortunately we have come so far as an industry that no organization would ever wait until the end of a release to start fixing bugs. Right? Right??

Technology Stack

The graphs above look simple, but the engine behind them has been a lot of work (and is where I have spent most of my time).

The simulation engine itself is implemented as a Ruby gem. It uses Ruby’s built-in Observable module to synchronize all the actors in the simulation to a central clock. The AASM gem provides state management for workers (idle, pulling work from a backlog, executing on work, delivering work, etc.). The entire engine is about 1100 lines of tight, expressive code. The tests are another 2400 lines of code.
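For readers curious about the mechanics, here is a rough sketch of that wiring. The class names and states are hypothetical and the real engine is considerably more involved, but the shape is the same: the clock mixes in Ruby’s Observable module and notifies every registered worker on each tick, and AASM gives each worker a small state machine.

```ruby
require "observer"
require "aasm"

# Hypothetical sketch of the clock/worker wiring, not the engine's code.
class Clock
  include Observable

  def tick(hour)
    changed
    notify_observers(hour)   # every registered worker receives this tick
  end
end

class Worker
  include AASM

  aasm do
    state :idle, initial: true
    state :working

    event :pull_work do
      transitions from: :idle, to: :working
    end

    event :deliver do
      transitions from: :working, to: :idle
    end
  end

  # Called by the Observable clock on every tick.
  def update(_hour)
    idle? ? pull_work : deliver
  end
end

clock  = Clock.new
worker = Worker.new
clock.add_observer(worker)
(1..8).each { |hour| clock.tick(hour) }   # one simulated workday
```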

The new web interface is a Rails app that uses Chart.js for visualization and Action Cable for real-time graph updates over WebSockets. The web app is still very much a prototype and only runs in my local environment. Next up: making the prototype more robust so I can make it publicly available.
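Here is a hedged sketch of how such real-time updates typically flow through Action Cable in a Rails app. The channel name, stream identifier, payload keys, and the clock and stats objects are all illustrative assumptions, not the actual app’s code; on the browser side, a Chart.js chart appends each incoming data point.

```ruby
# Hypothetical channel and payload names, for illustration only (Rails app).
class SimulationChannel < ApplicationCable::Channel
  def subscribed
    stream_from "simulation_stats"
  end
end

# In the simulation runner, after each tick, push the latest stats over the
# WebSocket; `clock` and `stats` stand in for objects supplied by the engine.
ActionCable.server.broadcast(
  "simulation_stats",
  hour:                   clock.hour,
  avg_support_wait_hours: stats.average_support_wait,
  bugs_found:             stats.bugs_found,
  bugs_fixed:             stats.bugs_fixed
)
```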

Huge gratitude to Davis Frank who has paired with me frequently and had a major influence on the design of the entire system, including making the engine a gem. Gratitude also to Matt Wynne for spending time pairing with me, and in particular for his influence on the simulation API and the tests.

Want a Demo?

Now that there is a full stack prototype, it’s time to figure out the future of the simulation. Is it the core of a learning game of some kind? A tool to help technology leaders explore tradeoffs for their own context? Something else?

It also needs a name. I can’t keep calling it “the simulation” forever.

So I need your help. Would you be willing to see a live demo of the simulation and tell me what you think? (And maybe suggest a name?)

Note that this is an exploratory conversation and most definitely NOT a sales call. There is nothing to buy. Although part of my goal with these demos is to gain insight into what aspects of the simulation could be a valuable product, I have nothing to sell you.

If you’re up for a demo, you can sign up for a time slot on my calendar.

I hope to have news in the coming weeks about making a variation of this scenario publicly available for you to play with, but it will take time to make the fragile prototype into something robust enough to deploy for public access. In the meantime I do hope you will sign up for a demo. I would love to hear your opinions!

Stay Curious,

Elisabeth