MULTICS TECHNICAL BULLETIN                                MTB 466

To:       MTB Distribution
From:     Roger Lackey
Date:     November 5, 1980
Subject:  History and Evolution of MRDS

The purpose of this MTB is to give the author's perception of the way that Multics Data Base Management evolved. The author has worked as an implementer on the project since its inception.

Marketing had requested that a data base management system be developed for Multics. First investigations indicated that an Integrated Data Store (IDS) or Network type of data base should be developed, primarily because GCOS had an IDS data base manager. Influenced by the literature available at that time and the fact that there were no compatibility constraints, a Relational approach was chosen.

To my knowledge no technical design documentation was ever prepared for the product. The general design began with the description of the primary user interfaces and some major include files. A functional specification was written by the developers long after the major implementation was in place.

Since little direction was given by marketing and only one of the four implementers working on the project had any prior experience on Multics, many of the design goals were never specified, and those that were specified were arrived at with little consideration or investigation. In fact, the numbers for the maximum number of attributes per relation and the maximum number of relations were chosen by the implementers with no input from real world data base users.

The result after the first year of development and implementation was a limited working Relational Data Base Management system.

The enhancements of MRDS, from then through MR6.5, were strictly to add relational functionality to the system, with the exception of using inverted keys in vfile_ to provide secondary indexing for efficiency. This growth evolved with no documented long range plan to guide its future.

________________________________________________________
Multics Project internal working documentation.
Not to be reproduced outside the Multics Project.

Along with the lack of planning, no acceptance test or performance test specification was ever developed. A set of regression tests evolved in a haphazard fashion by collecting and adding exec_coms that were examples of reported bugs. This by no means tested the functionality or any exception handling of the system.

Until the 6.0 time frame, little concern was given to concurrent access of data bases, journalization, or use with a transaction processing system.

This version of MRDS had a fairly straightforward design concept that may not have been implemented in the best manner but did work fairly well. The design called for one relation per vfile_ file, with the data files residing in a special relation directory under the data base directory.

The tuple structure was uncomplicated, with attributes of the various data types concatenated together into a bit string. This was then prefixed with a string of bits (one for each attribute, to indicate existence) to form the tuple. The tuple was overlaid with a character string to form the vfile_ record for that tuple.

The primary key for each tuple (record) was constructed by concatenating the primary key attributes as bit strings and overlaying them with a character varying string of the appropriate length.

The search method employed by that version of MRDS involved requesting a record from vfile_ either sequentially or by key, with the evaluation done by MRDS and no optimization applied to the selection expression.

One of the major problems with this version of MRDS was the inability to use vfile_ key values for range searches. This was discovered late in the implementation and was a result of overlaying the numeric attributes with character strings to generate the key (negative numeric data types caused the uppermost bit to be on, making negative numbers appear after positive numbers in the collating sequence).
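The collating problem described above can be reproduced in a few lines. The following Python sketch is illustrative only and is not taken from MRDS: it assumes a 36-bit two's-complement fixed binary attribute whose bits are compared as an unsigned string, and the names raw_key and ordered_key are hypothetical. The sign-bit inversion shown as a remedy is a commonly used order-preserving encoding; this bulletin does not say which encoding, if any, MRDS adopted.

```python
# Sketch: why raw two's-complement bits collate incorrectly when used as keys.
# Assumption: 36-bit signed integers; all names here are hypothetical.

WIDTH = 36
MASK = (1 << WIDTH) - 1

def raw_key(value: int) -> str:
    """Two's-complement bit pattern of value, as a string of '0' and '1'."""
    return format(value & MASK, f"0{WIDTH}b")

def ordered_key(value: int) -> str:
    """Same pattern with the sign bit inverted, so that string order
    matches numeric order."""
    return format((value + (1 << (WIDTH - 1))) & MASK, f"0{WIDTH}b")

values = [5, -3, 0, -100, 42]

# Sorting by the raw bit pattern places every negative value after the
# non-negative ones, because the sign bit is the uppermost bit.
by_raw = sorted(values, key=raw_key)

# Sorting by the sign-inverted pattern yields true numeric order.
by_ordered = sorted(values, key=ordered_key)

print(by_raw)
print(by_ordered)
```

With the raw encoding, the keys for a numeric range such as -10 through 10 do not form one contiguous run in the index, which is why a keyed range search could not be used.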
Other problems included:

   The evolution of the search program, which grew by kluging in new function rather than by designed growth. It ultimately grew to 3322 lines of source code for the one program.

   The parsing of the cmdb source and the creation of new data bases did not provide the users with much in the way of diagnostics.

   The parsing of the selection expressions was distributed throughout many modules and provided almost no diagnostics or exception handling.

   Concurrent access was added after the major design was completed. It did not provide pleasant recovery from data bases being left open by dead processes.

   The method used to get data into and out of the data base lacked modularity and had a great deal of overhead.

   Lack of journalization or recovery software.

Conclusions about the 6.5 version of MRDS

It proved that a relational data base could be implemented on Multics. Even though the system lacked any long range planning or design, the concept was very good; the implementation, however, was very poor and did not lend itself to being extended.

Multics Integrated Data Store (MIDS)

Because of the marketing desire to have an IDS on Multics, a veneer was implemented over MRDS to make it appear as an IDS type system; it was called Multics Integrated Data Store (MIDS). It was incomplete from an IDS point of view, and all calls to MIDS were transformed into MRDS calls. At one time the general functionality of MIDS was tested and did work. However, MIDS never received any extensive exposure or use.

New MRDS

Because of marketing pressure to provide a complete IDS type interface, and to overcome the inherent inefficiencies and deficiencies of the existing system, a new MRDS design and implementation was conceived. It was to have a data manipulation utility level interface upon which both a MRDS and an IDS system could be implemented. It was to interface with the transaction processing system in the area of checkpoint and rollback.
Other factors contributing to the new implementation included the lack of efficiency, of attribute level security, and of restructuring capability.

It had already been proven that a relational type data base could be implemented on Multics. However, the full extent and impact of a data base manager on Multics was not appreciated or taken very seriously by those close to the operating system.

Two MTBs were written describing the proposed changes to the structure of the new architecture and the method of providing access control. These MTBs elicited little or no response, and what response there was did not change the new design.

Other than the two MTBs, no design or implementation documentation was generated. The concepts and details were only in the minds of two of the developers, and implementation proceeded with no design review.

Much of the structure was changed to take advantage of features that were then being planned for vfile_ and transaction processing. These included stationary records, select and exclude control orders, record level concurrency control, checkpoint and rollback.

The physical structure of the data base was completely changed, as was the structure of the data model and tuples. The data model was changed from a keyed sequential vfile_ to a series of segments, one for general information and one for each relation in the data base. The data files were moved from their special directory to directly under the data base directory. The directory that in the old system contained the inverted key information was eliminated by encoding the index keys and placing them in with the data file keys.

The tuple structure, that is, the way the tuple information was physically stored, was changed drastically and now allows array and linkage information to be stored in the tuples. It also uses a concept of zero length arrays that is not really supported by PL/I.
The design introduced the notion of files as well as relations being visible to the user. There were to be two kinds of file: one, similar to the old system and called the one relation per file type, which indeed contained one relation per file; and a new type of file, called blocked, that could contain one or more relations per file. The storage space for blocked files had to be allocated at data base creation time and could not be changed dynamically, a la IDS.

With the notion of more than one relation per file, the concept of tuple identifiers was introduced. The tuple identifier was a modified vfile_ descriptor for unblocked files and a special 36 bit identifier for blocked files. Because of the limited size of the tuple id, a considerably smaller maximum limit was placed upon the number of relations and tuples a data base could have.

With the introduction of files being visible to the user, a whole new set of access control definitions was conceived. They were compounded by the newly devised attribute level security design. This required considerable modification to the user interface, primarily in the areas of data base creation, concurrent access manipulation, and all display and administrative control commands. This new user interface was published, without the supporting code in place, as the MR7.0 MRDS and LINUS Manuals.

Eight months after the new design was conceived by the developers, five people were working on the project. One was working on the utility level coding; one was converting the modules from the old system to include file information and use the new structures; one was working on the security design and implementation; and two others were designing and developing a restructuring mechanism for the system.

Executive Office of the President (EOP)

About this time the person working on the conversion of old modules to include file information decided to leave the project. He hastily completed the conversions already started and most of the modules involved with the submodels.
None of the converted modules were tested, and his leaving took some of the undocumented design with him.

Preliminary testing showed that new type data bases could not be created, and the major effort on restructuring was diverted to redoing the data base creation implementation. Efforts in the other areas continued with the idea that the new vfile_ changes and transaction processing pieces would soon be in place. No design review or MCR had yet been considered.

About this time the primary designer and project leader also decided to leave the project. It soon became very apparent that little about the design and implementation concepts of the new system was known by others on the project. With the impending loss of the primary designer and the lack of any design documentation, a series of sessions was held with the designer to gain as much knowledge of the state and design of the system as possible.

Shortly after the departure of the original designer, the person working on the security design and coding also left the project. Two new people were added, leaving the project with a new project leader (who had been working on restructuring), the other person who had been working on restructuring and data base creation, and the two new people, one of whom had no Multics experience.

It was also discovered that some of the changes to vfile_, on which much of the new design depended, would not be done. This required a considerable change in the design, about which no one knew much anyway, and a change in the direction in which the project was proceeding.

Based on the information acquired from the departing project members and the known state of vfile_, it was decided that the project should continue and try to get all the new and modified code collected into a working system.

Functional testing was begun on those parts of the system that did work (creation of data bases). All other resources were applied to debugging and completing those modules of the system not yet integrated.
The fixing of many of the modules involved major changes to structures used system wide and many times required a complete rewrite of modules (mostly those that had been converted from the old system).

The functional testing began to expose many implementation oversights. Only simple testing could be done because of the unstable and incomplete state of the system. The fixing of one problem often exposed several new ones, including incomplete module implementation. The number of modules had grown to well over two hundred, and tracking of problems and module impact was not trivial.

About this time the new project leader was injured and had to leave the project temporarily.

The project was still attempting to get a limited working system in place when the marketing requirement for the sale to the Executive Office of the President (EOP) placed a whole new set of requirements and deadlines on the project.

After evaluating the state of the system at that time and considering the unimplemented parts of vfile_ and the deadline imposed by EOP, the whole direction of the project was changed again.

It was decided that the system to be delivered to EOP would be achieved by taking the existing incomplete implementation of MRDS and making it appear to the user like the old 6.5 version. This required changing all user interfaces to remove any indication that files exist.

The pending EOP requirement to have attribute level security caused the unreviewed approach to attribute level security to come under scrutiny. After consideration by an ad hoc committee, the suggested plan was scrapped and an approach using a submodel to provide the attribute level security was concocted.

To do all this revision and new implementation, several new people were added to the project, drawing resources from CISL, FSO and Marketing. At one time there were as many as ten people working on the project.
With the wide geographical dispersion of the project members and only a general design plan of what was to be done, the communication problem was enormous.

Near this time the project leader returned, adding some additional manpower to the project.

With the help of marketing, the attribute level security requirement was moved from the first EOP release to early 1981. With this removed, and with a lot of extra effort, the EOP release deadline could be met. The primary functions of the system were well tested and the revised documentation completed.

The binding of the new system took considerably more effort than anticipated. Unlike the old system, the new one had two bound components: the MRDS modules and the utility level modules. During the binding it was discovered that much of the logical independence of the utility level had been violated early in the implementation, and that no consideration had been given to how the new MRDS would work with the old software and old data bases. These obstacles were overcome with diligent effort and ingenuity, and somehow the new system did work for most cases.

The new system works after a fashion, but only for unblocked files. It carries along considerable chunks of untested or unexercised code associated with blocked files.

Since the design has had two major changes in direction since the first part was implemented, and since the original design was not documented, much of the original intent, goals and continuity have been lost. The organization of the implementation has made it difficult to add performance enhancements and bug fixes. Amazingly, the performance is better in some areas but worse in others.