Software Engineering Stories

1994-11-11

A Few Minutes' Discussion

Tom Van Vleck

(Imagine here the Gary Larson cartoon titled "Well, shoot. I just can't figure it out. I'm movin' over 500 doughnuts a day, but I'm still just barely squeakin' by." showing the doughnut shop owner talking to an enormously fat helper.)

A product I worked on went through a painful period after its first release. We kept finding serious bugs that caused crashes and data corruption. After a long period of fire-fighting and fixing problems one by one, we realized that there were serious underlying design problems: critical tables were not protected by locks; important functions only worked accidentally; and we had timing dependencies and race conditions. The company had to rewrite the software to solve the problems. It probably cost us several million dollars to support a buggy product in the field, to try fixing the product piecemeal, to divert resources from other improvements to rewriting this code, and to overcome the customers' bad opinions.

We could have saved those millions of dollars if the QA person for the product had had a certain conversation with the developer, back in the design stage before any code was written. The QA person would have said, "You know how I think I'll test this stuff? Imagine a big matrix, with the valid states of an object labeling the columns, and all the possible events labeling the rows. I'll systematically put objects in each state and hit them with every possible event. Then I'll fill the cells of the table with the expected resulting actions and states, and see if the software really does what it should."

The designer would have pointed out that not all events are possible in all states. That would have led to a classification effort and several discoveries right away. The need for locking and the possibility of events happening in more than one order would have been much easier to talk about with this framework in place. The most important result would be that the developer would plan to pass all the threatened tests, and in the process would build much more robust code.

All these benefits would have come about from a few minutes' discussion. A conversation that appeared casual.
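As a rough illustration, here is one way the QA person's state/event matrix could be written down and run, including the designer's point that some events are not possible in some states. Everything in it is hypothetical: a made-up `Record` object that must be locked before it is updated or deleted, with states `UNLOCKED`, `LOCKED`, `DELETED` and four events. Each cell of `EXPECTED` holds either the resulting state or `REJECT`, meaning the event must be refused and must leave the object unchanged. The hypothetical `run_matrix` puts a fresh object in each state, fires each event, and reports any cell that is missing or that the code does not honor.

```python
import itertools
from enum import Enum, auto


class State(Enum):
    UNLOCKED = auto()
    LOCKED = auto()
    DELETED = auto()


class Event(Enum):
    LOCK = auto()
    UNLOCK = auto()
    UPDATE = auto()
    DELETE = auto()


REJECT = "reject"  # event not allowed in this state; it must be refused


class InvalidEvent(Exception):
    pass


class Record:
    """Hypothetical object under test: must be locked before it is changed."""

    def __init__(self, state: State = State.UNLOCKED):
        self.state = state

    def handle(self, event: Event) -> State:
        s = self.state
        if s is State.UNLOCKED and event is Event.LOCK:
            self.state = State.LOCKED
        elif s is State.LOCKED and event is Event.UNLOCK:
            self.state = State.UNLOCKED
        elif s is State.LOCKED and event is Event.UPDATE:
            pass  # contents change; state stays LOCKED
        elif s is State.LOCKED and event is Event.DELETE:
            self.state = State.DELETED
        else:
            # Refuse before touching any state.
            raise InvalidEvent(f"{event.name} not allowed in {s.name}")
        return self.state


# The matrix: one cell per (state, event) pair, filled in before coding starts.
EXPECTED = {
    (State.UNLOCKED, Event.LOCK): State.LOCKED,
    (State.UNLOCKED, Event.UNLOCK): REJECT,
    (State.UNLOCKED, Event.UPDATE): REJECT,  # must hold the lock to update
    (State.UNLOCKED, Event.DELETE): REJECT,
    (State.LOCKED, Event.LOCK): REJECT,      # already locked
    (State.LOCKED, Event.UNLOCK): State.UNLOCKED,
    (State.LOCKED, Event.UPDATE): State.LOCKED,
    (State.LOCKED, Event.DELETE): State.DELETED,
    (State.DELETED, Event.LOCK): REJECT,
    (State.DELETED, Event.UNLOCK): REJECT,
    (State.DELETED, Event.UPDATE): REJECT,
    (State.DELETED, Event.DELETE): REJECT,
}


def run_matrix(expected):
    """Put a Record in each state, hit it with each event, compare with the cell."""
    failures = []
    for state, event in itertools.product(State, Event):
        if (state, event) not in expected:
            failures.append((state, event, "cell not filled in"))
            continue
        record = Record(state)
        try:
            actual = record.handle(event)
        except InvalidEvent:
            # A refused event must leave the object where it was.
            actual = REJECT if record.state is state else record.state
        if actual != expected[(state, event)]:
            failures.append((state, event, actual))
    return failures


if __name__ == "__main__":
    print(run_matrix(EXPECTED))  # [] when every cell matches
```

Filling in the `REJECT` cells is where the classification effort happens: each one forces the question of what the code should do if that event arrives anyway.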

Influencing others; changing the development process

Instead of a conversation, would a memo, or an IEEE test plan, have worked? In some situations, maybe. In other situations, avoiding the conversation might be seen as hostile, as contributing to an "us and them" feeling. In the kind of groups I like best, every member respects the feelings of his or her colleagues, and brings out the best in them. If a standard test plan is mandated by the group's development process, the whole team takes responsibility for making the test plan as good as possible; indeed the whole team takes responsibility for making the development process as effective as possible, and changes it as necessary.

State space analysis

The state/event matrix is a good trick. Sometimes enumerating the states is the really hard analytical problem: there may be two or more processes involved, each with its own state, and the state space is then the product of their individual state spaces. The event space may be the set of possible sequences of sub-events. Working out the chart can become a real headache. But if the software is actually behaving in this complex way, not analyzing it will have even worse consequences.
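A minimal sketch of why the chart grows so quickly, assuming two hypothetical processes that each increment a shared counter with an unlocked read step followed by a write step. Each process has three local states, so the combined state space is their product, 3 × 3 = 9 states, and the events for the pair are the possible interleavings of their sub-steps. The helper names `product_space`, `interleavings`, and `run` are made up for illustration. Enumerating all six interleavings shows that four of them lose an update, the kind of race that is easy to miss when each process is analyzed on its own.

```python
from itertools import combinations, product

# Hypothetical local states of one process doing an unlocked read-then-write.
PROC_STATES = ["idle", "has_read", "done"]


def product_space(n_procs):
    """Global state space: one local state per process."""
    return list(product(PROC_STATES, repeat=n_procs))


def interleavings(a, b):
    """All orderings of two step sequences that keep each one's internal order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)  # slots taken by steps of `a`
        result, ia, ib = [], 0, 0
        for i in range(n):
            if i in chosen:
                result.append(a[ia])
                ia += 1
            else:
                result.append(b[ib])
                ib += 1
        yield result


def run(schedule):
    """Execute one schedule of unlocked increments on a shared counter."""
    shared = 0
    local = {}
    for proc, step in schedule:
        if step == "read":
            local[proc] = shared
        else:  # "write": store the value read earlier, plus one
            shared = local[proc] + 1
    return shared


steps_a = [("A", "read"), ("A", "write")]
steps_b = [("B", "read"), ("B", "write")]

if __name__ == "__main__":
    print(len(product_space(2)))  # 9 global states for two 3-state processes
    finals = [run(s) for s in interleavings(steps_a, steps_b)]
    print(finals)  # [2, 1, 1, 1, 1, 2]: four of six orders lose an update
```

Adding a third process, or a third step per process, multiplies both the state count and the number of interleavings, which is why the chart becomes a headache; it is also why leaving it unanalyzed is worse.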

Providing a descriptive framework

The same program behavior can seem baffling or obvious, depending on how we think about the program. If we have adequate descriptive terms, and abstractions of the right levels, we can describe a complex system's behavior as the result of the composition of simple subsystems.

updated 11/11/94