Software Engineering Stories

2018

Software Disasters

Tom Van Vleck

Think of a past disaster you've been a part of, a project that failed. Can we learn from it?

We used to call these events "tanker collisions." The idea was that they were slow-motion disasters: everyone could see that something terrible was inevitable, but it was too late to do anything.

Ask yourself: was it the people, were they too dumb? Usually the answer is no, they were fine people, as good as you can hire. Maybe they weren't all geniuses, but they should have been good enough.

How about the tools: did they cause the failure? Lots of people complain about their tools. But we've seen groups with really fancy tools fail to produce, and other projects succeed with very imperfect tools. And "it's a poor workman who blames his tools."

Was it management? Yeah! Ask anybody, and they'll tell you it was management's fault. "Management blew it. The project was in the weeds and management was counting paperclips. They didn't act in time. They flew the plane right into the mountain."

It seems to be very hard to think about management problems. Often, when we decide something is a management problem, that's shorthand for "unsolvable, not gonna go there." As soon as the trail leads into that thicket, we abandon it and look elsewhere for ways to make things better.

When I look back at failed projects I know about, many seem to have had major management problems. But when I look at future plans, we seem to spend our planning time on technical issues. We don't anticipate management problems or do anything to prevent them, no matter how often we've had them in the past.

[We have names for a few kinds of management problems, but we have no taxonomy or principle of enumeration. That is, we don't know how many ways management could go wrong, and if there is a management problem, everybody will have a different name for it.]

Each new project sets out with the basic plan of doing new things, using new tools, and managing things in the same way that didn't work last time. If management is the cause of many of our problems, can we talk about changing how we manage?

What Won't Work

We could start by listing some approaches that won't work, and giving them entertaining names and descriptions.

Cuisinart Management: I love metrics, when I can use them to convince people to do the right thing. At the same time, I worry that metrics may become a goal in themselves, that we may spend time getting good numbers instead of getting good quality. The basic idea in measuring a process is that one can add data about two different events together. But every bug is different, every line of code unique. We don't order software by the cubic yard. And mincing all the programs, or bugs, or tests, or whatever up in a grinder and then counting the semicolons, or basic blocks, or paths, can lose sight of the code, and the way it runs, and the way bugs get into the code.

Dumbo Management: Suppose the Circus Engineering Institute does a study and determines that all the elephants that can fly are holding little feathers. Then it proposes to give all the big elephants feathers too, so they'll be able to fly. This is the problem with process evaluations. A good organization will (often) get a good assessment score. Often it is possible to change a terrible organization to get a better score without really improving the quality of its output. Some organizations with organized processes can produce good products. The inference that the good product is caused by the organized process needs support, in the form of an explanation of how particular good or bad features are caused. (Other organizations have many rules and procedures, and still fail to produce good products.) Remember my story of André Bensoussan, who wrote perfect code in pencil? Don't buy everybody a pencil and expect perfect code.

Cults, Fads, and Gadgets: Sometimes an organization will mandate a new tool, hoping that this will produce better products. Or an executive will read an article about "agility" or "artificial intelligence" and decide that the whole organization should follow a new path. Some caution is advisable. Management may focus on neatness, on "doing everything the same way," rather than on quality. I have worked on projects where the tools for recording development progress were so slow and hard to use that productivity was trashed.

Throw the Management Out: After a disaster, sometimes even part way through one, it's common to replace the management, and permute the organization chart. The troops know that this rarely helps. Why should we expect the new managers, using the same old process, to work any better? Change alone may get people interested in new approaches to the problem for a while, but there are other effects of opposite sign, such as the cost to educate newcomers. It's like throwing out your pencil when you make a spelling error.

What Might Work

- Incremental process improvement
- Encouraged from the top
- Driven by the team

Read "A Rational Design Process: How and Why to Fake It," by David L. Parnas and Paul C. Clements (IEEE Transactions on Software Engineering, February 1986).

Published in The RISKS Digest, Volume 30, Issue 93 (Saturday 1 December 2018)