I joined Project MAC as a graduate student in 1964. I completed my masters in 1965 with Jack Dennis. I completed my PhD in 1968, also with Jack as advisor and Bob Fano and Jerry Saltzer as the other two members of my committee and Fernando Corbato was an informal advisor. The comments below reflect memories from my perspective as a student.
I did a lot of FAP systems programming on the IBM 7094 computer. Al Scherr was very helpful to me. The 7094 must have been added to the computation center by the time I got there (1964). CTSS was running on it. They had quite a team of system programmers.
The time sharing concept was well accepted by the time I got to MIT. It was explained to me as a method of multiplexing the CPU so that many users could share the CPU and experience progress in parallel. The sharing also reduced the charges assessed of each user; computing was not free in those days. Users of CTSS had a monthly allocation of CPU time and disk; they could be locked out if they overran their CPU allocation. Batch mode operating systems were found to be more expensive per completed job (including all the debugging) compared to CTSS. We users were very sensitive about the CPU allocations and often wrote our own versions of systems programs to make them faster. A personal example of this was my exasperation with the FORTRAN commercial I/O package, which had been imported into CTSS. This package was several times larger than most user programs -- which was quite annoying because the CTSS scheduler gave priority to small jobs. I wrote a substitute I/O package called MADIO, for users of the MAD language that I used all the time. MADIO was only a few hundred words and when used instead of FORTRAN IO significantly improved response time for its users and reduced charges against precious CPU time. MADIO became very popular in the CTSS community.
Another very popular pair of homegrown programs was TYPESET/RUNOFF; I had the impression that Jerry Saltzer put them together. He used them to format and print his wonderful 1966 thesis about traffic control in Multics. They were a progenitor of word processor programs. For encouraging creativity and innovation, there was nothing like the sharing going on in the CTSS time-sharing community.
The interactivity of time sharing was itself the subject of research. We found that it was much easier to debug programs when we could interactively design and test because we could keep our minds on the task at hand. Some researchers measured the effects and even found that there was an "optimal" time slice that minimized programming errors!
One of the CTSS engineers' most amazing accomplishments was that the entire OS kernel fit into 32K words (128K bytes) of memory. Since that time we learned that with level structure we could build even more compact microkernels. Today's commercial operating systems run upwards of 100 million lines of code. What a contrast! I think that projects with hundreds or thousands of programmers produce more code and function than small project teams. Many of my students today find it hard to believe that what they would consider an OS kernel could be so compact as it was then.
Jack Dennis gave me a copy of the 1961 long range computing report. I read it from cover to cover and was amazed by the ambitions of its authors. The one ambition that stuck in my mind was to acquire a very large core memory -- I think it was to be 1 million words. It was extremely expensive, 1 million 1960 dollars, and was never funded. The long range group envisioned a system with 4 times as much memory as on the CTSS 7094, but the computation center had to wait for the commercial technology to catch up (which was not long).
Jack Dennis also sent me to his PDP-1 lab to learn how to use and program the time-sharing system he had been developing. It was quite good and fast. Someone showed me a 2-D text editor he had designed, and another showed me a space-wars video game he had designed. I was quite amazed at these accomplishments. Jack wanted me to look at the memory management subsystem because swapping was the big bottleneck. That must have been the beginning of my fascination with how computations use memory. (See below.) Jack asked me to investigate Gier Algol, a system built by Peter Naur, and advise him on whether any of its techniques could be used in his time-sharing system. Although I do not remember what I advised, I do remember that Gier Algol had some very clever memory management methods.
Soon after arriving at MIT in 1964 I heard Bob Fano talk about Project MAC. He said that MAC means either "multiple access computer" or "man and machine". The former interpretation appeals to the engineers like Corbato who must build it, and the latter to the visionaries like Licklider who tell us where it is going. Fano used the term "computer utility" to describe the vision that computing power would be dispensed from computation centers via links into homes and offices, in which you could plug a cheap terminal and have access. The term "utility" he said referred to the analogy with the electric utility, where you plug appliances in and draw just enough power to get the job done. The utility would allow computing power to reach many more people at a price they can afford. It would also allow the development of communities whose members design and share software, creating a new wave of productivity and invention, and further bringing down prices as software became more efficient and used less computing power for a given job. Fano had an idea this would work because it already worked in the CTSS community. I thought this was a powerful vision and was moved by it. I joined many other graduate students who found that this vision gave meaning to our graduate work.
Today when I hear all the talk about the "cloud" I look back on the Fano-Licklider vision of the utility. It is really the same thing.
ARPA (now called DARPA) realized early on that computing technology would be a constant source of innovations and technology surprises. In its mission to avoid technology surprise for the DoD, it aggressively sponsored research in computing. Time sharing was of early interest. In the early and mid 1960s DARPA was already exploring networking because of its interest in secure, disruption resistant communications. ARPA was working with Licklider, Roberts, Taylor and others to develop a plan for a data network experiment. One of the objectives of the data network was resource sharing -- for example, a military planner in Washington could access a data service in California. The first plans for an ARPANET were presented publicly by Roberts at the 1967 Symposium on Operating System Principles, which I attended at a graduate student finishing up my PhD. ARPA was also sponsored AI research (McCarthy- Minsky) and massive-parallel high performance computing research (Dennis).
We should remember the military involvement in Project MAC in a wide sense. It included time sharing, computer utility, data networking, and high performance computing.
While I was working on my masters thesis about scheduling of requests for secondary memory in time-sharing systems, my advisor (Jack Dennis) was working with one of his students, Earl Van Horn, to develop a semantic model of operating systems. They were trying to find a simplifying framework for the many complexities that were appearing in the design of Multics. They had invented a generalized virtual address called a capability and showed how to organize an operating system so that users would get services by presenting capabilities for them. Capabilities could be mapped to objects by the same methods as used in the segmented virtual memory. They capability system provided an exquisitely powerful means to assure information protection while allowing easy sharing, and to confine untrusted and service programs to small "spheres of protection".
I remember visiting in Earl's office one day as he was working on the finishing touches for their paper. He was trying to figure out the right word for the protected pointers to objects. On his blackboard he had listed about two dozen possible terms and was asking everyone's opinion which would be the best word. My three favorites where handle, pointer, and capability. Earl finally decided on "capability", which turned out to be a good choice because the term is still used today. Their capability-addressing model became the blueprint for object oriented programming. Their paper became one of the classics in computer systems, influencing many things in machine design, object oriented programming, protection systems, and today's distributed file systems. All that from a quest to tame the complexities of the emerging computer utility.
I mentioned that I developed a fascination with memory management when working on Jack Dennis's PDP-1 time sharing system. Jack had already done a lot in this area. He had taken the original virtual memory idea (from U Manchester 1960) and extended it to segmented memory rather than paged memory. He wanted the memory environment to accommodate to the objects that programmers use; every object can be placed in its own segment, protected by the memory manager, and shared with other users. His proposals for segmentation were adopted into Multics. When he introduced me to this subject, he said his interest was in "programming generality" and "name management". He told me that "memory management" is also crucial, for without it virtual memory systems would be useless if they ran too slow. Early experimental work with virtual memories seemed to suggest that getting it to perform well would be a major problem. Unless virtual memory was widely accepted, few of his methods for programming generality would work.
Jack commissioned me to master the performance side of memory. The first step was to evaluate the effects of scheduling on file memory operations. Drum and disk file stores were the big bottlenecks in time-sharing systems. Can the swap time of a disk or drum be reduced by scheduling waiting requests in good orders, such as shortest seek time first (SSTF)? Through simulation studies I showed SSTF was close to optimal and designed a new policy SCAN for minimizing seek time on disks. My first published paper was at Spring Joint Computer Conference 1967 on this topic.
I moved on to the thornier problem of dynamic memory management of running processes in time-sharing systems. There were many controversies about which page replacement policies gave best performance and which could be used in Multics. Almost nothing was known about replacement policies. Jack Dennis had an intuition that the objective should be to find, load, and protect a program's "working set", which was the smallest subset of a program's pages (or segments) that should be loaded into main memory to sustain good CPU efficiency. However, I soon found that while many of those involved in the controversies agreed with the intuition they disagreed violently on how to detect it or even whether it could be detected. They wanted all memory management to work well through automated detection of working pages rather than through programmers advising the OS. In my research I formulated a precise definition of working set and worked out the mathematics to express the mean working set size and paging rate in terms of the distributions of times between successive references to the same page. I worked with Les Belady at IBM on our mutual discovery of program locality, which showed that measured working sets would accurately detect the favored subsets of an executing program and would be very reliable predictors of main memory demand in the immediate future. The locality principle has since become recognized as one of the fundamental principles of computing and is exploited everywhere to improve performance of systems and networks. The working set and locality theory gave a simple explanation of the then inexplicable phenomenon of thrashing in multiprogrammed virtual memory systems and showed how to build a control system that prevents thrashing. I hypothesized that a working set memory policy could be tuned (by adjusting its one parameter) to achieve an optimal level of system throughput; but it took another ten years of data collection and modeling to firmly prove that.
I presented my first findings on working sets and locality at the 1967 Symposium on Operating Systems Principles, which is now known as SOSP-1. A year later I published my theory of thrashing prevention at the 1968 FJCC. Locality theory has since been well developed and applied in numerous aspects of computing such as cache design, virtual memory management, data communication networks, proxy servers, and (recently) in the very definition of algorithm. All that from a quest to sort out the apparent contradictions in early virtual memory systems and to understand why those systems work well.
09/01/14