14 Mar 2001

The Magic Finger

Olin Sibert

Early in Multics' tenure at the University of Calgary, the University was very unhappy about system performance, and in addition to demanding extensive support from Honeywell, they also commissioned an "independent review team" to come take a look and make recommendations.

This was a big deal for me, as it was my very first consulting engagement. I believe it was in late 1979, and I'm embarrassed to say that I don't remember the other team members. I vaguely remember Bob Mullen being in the crew, although as a Honeywell employee he would have been an odd choice for an independent team; I do know he did a bunch of work there later. Anyway, I do remember the trip. We spent a lot of time in the Toronto airport, as Air Canada needed three tries before they found an L-1011 that was actually capable of getting us to Calgary (they let us get on each one just to check). The result was that we finally arrived at 3:00 AM, waited an hour for the luggage doors to thaw, then got to our hotel with just an hour or so to nap before the 7:00 AM breakfast meeting with the University administration (president, provost, I don't remember which, but it was clear that this was a Big Deal for them).

They gave each of us marching orders, and as the others went off to conduct interviews about the University's use of Multics and relationships with Honeywell, I was tasked with looking at the technical situation. I spent a few minutes looking at metering output, and the answer was obvious: the system had far too little memory. The configuration was something like 2 CPUs with 512K words, well under the usual "1 CPU, 1 megaword" recommendation. Of course, the University's own staff had also figured this out long before, and had done an excellent job of tuning to make the best of the situation, but they couldn't just waltz off to Sears and buy another million words, hence the outside review.

So, having learned all this, and realizing that there wasn't any magic wand I could wave any more than the computing staff could, I asked what else I could help them with. We talked about various software topics, but then the system crashed (there had been a lot of that lately) and we all rushed to the machine room to see what was happening. Not that we thought there was much we could do, of course—it just seemed to be important to be in on the action.

However, when we got there, the operators hadn't yet formally taken down the system, because it was hung, unresponsive, rather than actually crashed, and they were still thinking about what to do. While the staff was consulting about this, I took a look around and noticed that although CPU A was definitely hung (lights lit, but nothing changing), CPU B was still cheerfully running the idle loop. This seemed odd, because it looked exactly like what would have happened if someone had put CPU A into "instruction step" mode to halt it temporarily. So, on a lark, and because I was sure it couldn't make matters worse, I walked back to CPU A and pressed the "step" button. Presto: the system magically came back to life, immediately spitting out a bunch of error messages about timeouts that had occurred during its 10-minute coma, and resumed normal operation.

I was astonished, and the staff was thrilled: nobody liked having 2-hour non-ESD crashes in the middle of the day, and that's what had been happening. I explained that it was just a fluke, and couldn't be expected to work again, but then the same problem recurred in the afternoon, and the magic finger fixed it that time, too. There was a CPU hardware problem, and it did get fixed a few days later, but I'm told that in the interim, they stationed a student intern at the CPU who was told to watch it like a hawk, and press that button if the lights ever stopped blinking for more than a few seconds.

In the evening, the administration got us all back together for a sumptuous dinner high above the city at the Calgary Petroleum Club, of which I remember almost nothing, because I still hadn't gotten any sleep. We made various recommendations. One was "more memory"; another resulted in Bob Mullen's invention of "page pinning" to dramatically improve performance of the login process; and the most important was to keep the Multics system, rather than ripping it out and replacing it with IBM gear. I think Honeywell did make a deal for more memory soon after that, and reaped handsome dividends: the University of Calgary was a prolific contributor to the Multics software culture and environment, and it created the ACTC spinoff that provided Multics maintenance through the end of the product's life.