Multics Technical Bulletin                                MTB-688
Multics C Impl. Spec.

To:       Distribution

From:     Douglas Howe

Date:     26 April 1985

Subject:  Multics C Implementation Specification

1. Abstract

This document contains the specifications required to bring up a
System V Release 2.0 compatible C on Multics. The C compiler on
Multics will be as Multics compatible as possible without becoming
incompatible with System V Release 2.0 C.(1)

Changes will be marked with change bars.                           |

Comments should be sent to the authors:

via Multics mail to:
   DGHowe.Multics

via posted mail to:
   Douglas G. Howe
   Advanced Computing Technology Centre
   Foothills Professional Building
   1620 29th St., N.W.
   Calgary Alberta Canada T2N-4L7

via telephone to:
   (403)-270-5400
   (403)-270-5437 (Howe)

via forum on System-M to:
   >udd>m>DGHowe>mtgs_dir>c>c_imp (c)

_________________________________________________________________

Multics project internal documentation; not to be reproduced or
distributed outside the Multics project.

(1) Unix and System V Release 2.0 are registered trademarks of
    AT & T

TABLE OF CONTENTS

Section   Subject
=======   =======

1         Abstract
2         Preface
3         Introduction
3.1       . . Goal
3.2       . . References For This Document
4         Execution Environment
4.1       . . Stack Disciplines
4.2       . . Argument List Creation
5         Object Segment Format
5.1       . . Symbol Section
5.2       . . Statement Map
6         Entrypoints
6.1       . . Main Entrypoints
7         Calling Conventions
7.1       . . Calling C to C
7.2       . . Calling a Main Program
7.3       . . Calls from C to Non-C Procedures
7.4       . . Calls from Non-C Procedures to C Functions
8         Storage Allocation
9         Code Conventions
9.1       . . Forbidden Instructions
9.2       . . Use of Pointer Registers
9.3       . . Identifiers
10        General Information
10.1      . . Data Type Sizes
10.1.1    . . . . Conversion of Data Types

2. Preface

This document defines the format of a C object segment on Multics
and describes how C programs should use pl1_operators_. This MTB,
MTB-647 and the other related MTBs are intended to supply most of
the documentation needed to implement the C compiler for Multics.

We wish to thank those people who have made this possible, either
by creating tools for analysis or through input of subject matter.
These people are Ron Barstad, Greg Baryza, Rick Gray, Steve
Herbst, Dave Mason, Audrey Neal, Tom Oke, Doug Robinson, Melanie
Weaver and Brian Westcott.

3. Introduction

3.1. Goal

The goal of this project is to create a Multics Native C compiler.
This compiler will allow the porting of existing software and the
use of basic Multics tools. The compiler is to be compatible with
System V Release 2.0 while losing as little as possible of
Multics. It will be accompanied by the C runtime library, with
some routines redesigned to understand the Multics environment.

To accomplish this goal the compiler will be divided into two
versions. These versions can be defined as follows:

I.  Demo Compiler
    This version of the compiler will be used to bring up C. This
    will be done using an alm(1) intermediate; C programs will be
    translated to alm source code, and then compiled on Multics.
    In the initial transfer of the compiler a Unix system will be
    used to generate the alm source. It is intended that this
    version will be usable in some form to allow third party
    software to be brought to Multics.

II. Production Compiler
    This will be the first general release of the compiler and
    will be an extension of the demo compiler. It has not yet been
    decided whether the second version will still generate alm
    source or will generate object code directly. The second
    version should include some improvements in efficiency and
    will be able to use probe to some extent. The full definition
    of this version will be given at a later date.

_________________________________________________________________

(1) Assembler Language Multics

3.2. References For This Document

1) MTB-647 created by Greg Baryza.

2) The C Programming Language
   Kernighan, Brian W. & Ritchie, Dennis M.
   Prentice-Hall (1978)
   Englewood Cliffs, New Jersey

4) Multics Programmers Reference Manual (10.2 AG91-03A)
   (hereafter referred to as MPRM)

5) MTB 689 titled The C Runtime System on Multics by Doug Howe.

6) MTB 691 titled The C External Execution Environment by Doug
   Howe.

8) MTB entitled the Multics Link Editor by Dean Elhard and Doug
   Howe.

9) MTB 707 entitled C Required Changes To ALM Specification.

4. Execution Environment

The execution environment to be used by the production compiler
on Multics will allow the use of Multics user tools such as probe,
trace and profile. It will be compatible with the current PL1
environment. The Multics standard execution environment is
documented in the MPRM Appendix H.

4.1. Stack Disciplines

Like most other languages, C will use the same stack as the
Multics command environment for its local storage. All activities
that affect the size of the stack, such as pushing, popping and
extending stack frames, will be done via `pl1_operators_'.

4.2. Argument List Creation

In Multics, all calls that pass arguments should create a
structure defining where the arguments can be found and where a
set of descriptors defining their data types can be located. A
complete description can be found on page H-20 of the MPRM.

C has no runtime requirement for argument descriptors. Therefore
argument descriptors will not be included in the demo version of
the compiler. They will be added to the production version of the
compiler as required for the support of various Multics tools.
If required a new descriptor structure and a new method for the    |
calling sequence will be designed.                                 |

5. Object Segment Format

The object segments generated by the C compiler will be in
Multics standard format by default, due to the use of alm as an
intermediate language. This format is defined in the MPRM
Appendix G. Declarations for all structured items are included.
There are two exceptions to the format: the Symbol Section and
the Statement Map.

5.1. Symbol Section

Due to the use of alm as the intermediate language of the
compiler, C will be lacking complete Symbol Section information
in the demo version. Complete Symbol Section information will be
added as a
function of the C compiler or as a series of pseudo                |
ops added to ALM (see MTB 707).                                    |

5.2. Statement Map

A statement map will be produced in the production version of the
compiler, either through pseudo ops in alm or through direct
object creation. The Statement Map will refer to the original
source segment. Macros will be seen in their non-expanded form.

6. Entrypoints

All entrypoints (except for static and main_ -- see below) will
be defined as external entrypoints referring to the pl1_ops entry
ext_entry to perform the stack set up.

All entries of functions that push their own stack frames must be
preceded by the structured information described for the entry
sequence on page G-3 of the MPRM. This will be generated by an     |
ALM pseudo op as defined in MTB 707.                               |

6.1. Main Entrypoints

Because of the Multics standard entry procedure, the C `main'
program would not be found by the standard searching method. For
this reason, and to provide a place for initial set up, C
programs containing a `main' program will have an added
entrypoint called `main_', as is currently done with Fortran. The
definition of the entrypoint `main' will be that of an external
entrypoint.

The entrypoint main_ will have to perform a series of precise
functions.
These functions will be fully defined in another MTB entitled The
C External Execution Environment (MTB 691). Initially `main_'
will be a separate program generated and link edited with the
main program.

7. Calling Conventions

Copying of arguments to be passed by value will be done by the
caller. As usual in Multics, if the name of the routine to be
called contains a "$" it will be assumed to be of the form
segment_name$entry_name.

There are four different situations that involve calls. These
are: calls from one C function to another, calls to main
programs, calls from C to non-C procedures and calls from non-C
procedures to C functions.

7.1. Calling C to C

A call from C to C will be done directly with the use of
`pl1_operators_'. The types of the arguments will be as described
in MTB 647.

7.2. Calling a Main Program

C programs will be callable in two ways: through `main_', which
expects its arguments in the standard Multics command processor
format, and through `main', which expects its arguments in the
standard C argc, argv format. The normal entry sequence for a C
program will be via the command processor linking to `main_'.
Within an execution unit, calls to main will be resolved to the
standard C entry `main'. Although both entrypoints are accessible
to the user, it will remain the user's responsibility to ensure
that the correct values are passed as parameters.

7.3. Calls from C to Non-C Procedures

C will be able to call non-C functions if the non-C function
being called understands the data types being passed to it. For
this reason only pointers and some basic arithmetic data types
will be compatible with non-C languages.

7.4. Calls from Non-C Procedures to C Functions

Non-C functions will be able to call C if the C functions
understand the data types being passed to them. For this reason
only pointers and some basic arithmetic data types will be
compatible with C.

8. Storage Allocation

C will follow the Multics standard for the allocation of its
variables. The only exception to this standard is due to the
definition of C external variables. C external variables are
defined by the normal C environment to be on a
per-execution-process basis, while Multics external variables are
on a per-login-process basis. For this reason C external and
static variables will be allocated as normal external variables,
but the execution unit will be expected to be linked as defined
in MTB 691.

9. Code Conventions

Multics has a few conventions that must be followed. The major
conventions are listed in the following paragraphs.

9.1. Forbidden Instructions

C will not use any of the alm instruction set which may become
obsolete in future releases of Multics.

9.2. Use of Pointer Registers

Pointer registers are widely used in Multics because of the
segmented address space. Everything outside of the current
segment must be addressed via a segment number. Some pointer
registers have defined uses:

- PR6 should always point to the current stack frame.

- PR0 is set by the operators for programs using
  `pl1_operators_'. It points to the `pl1_operators_' transfer
  vector except during a call, when it points to the argument
  list. The following instruction can be used to reset PR0 to the
  transfer vector:

       epp0    pr7|28,*

  where PR7 points to the base of the stack.

- PR4 is usually used when a pointer to the linkage/static
  section is needed. The entry operators store it in the stack
  frame, so it can be reloaded by the following instruction:

       epp4    pr6|36,*

- PR7 points to the base of the stack segment when a program is
  entered. It may be reused by the program and reloaded by the
  instruction

       epbp7   pr6|0,*

`pl1_operators_' does not save the values of other pointer
registers across calls.

9.3. Identifiers

In the demo version of the compiler variable names will likely
have a maximum length of 32 characters. A name must begin with a
letter or the underscore character, which may be followed by any
sequence of letters, digits, underscores and the $ character. If
the identifier name contains a single $ it will be taken to
represent a Multics external identifier. The following bnf style
grammar defines the variable name.

<character>     ::= a|b|c|d|.....|y|z|A|B|C|D|......|Y|Z|_
<character str> ::= <character> | <character> <character str>
<number>        ::= 0|1|2|3|.....|8|9 | <number> <number>
<identifier>    ::= <character>[ <character str>| <number>| "$"]*

External Identifiers on Multics have the form
"segname$entry_name" where:

<segname>       ::= <character>[<character str>| <number>]*
<entry_name>    ::= <identifier>

where []* means zero or more.

10. General Information

10.1. Data Type Sizes

At the time of this writing, the following sizes are proposed for
the basic data types:

short int        36 or 18 bits   word or half word aligned
int              36 bits         word aligned
long int         72 bits         double word aligned
unsigned int     36 bits         word aligned(1)
unsigned long    72 bits         double word aligned(2)
char             9 bits / char   word aligned
float            36 bits         word aligned
                 (8 bit exponent, 28 bit mantissa)
double           72 bits         double word aligned
                 (8 bit exponent, 64 bit mantissa)
pointer ITS      72 bits         double word aligned
pointer packed   36 bits         word aligned

In the demo version of the compiler short int types will be 36
bits long and will be word aligned. It is hoped that short ints
will be 18 bits long and half word aligned in the production
version of the compiler.

_________________________________________________________________

(1) This is a change from MTB 647
(2) This is a change from MTB 647

10.1.1. Conversion of Data Types

Conversion of C pointers will be handled as follows.

1. The size of a pointer in C will be 72 bits.

2. Conversion of an int with a value of zero will yield a null
   pointer, that is, a pointer value of -1|1.

3. Conversion of int to pointer or pointer to int will be done
   via the pack and unpack pointer instructions.

4. No conversion will take place on the passing or receiving of
   pointers as parameters.

5. Conversion of pointers to long ints or long ints to pointers
   will be done directly on a bit to bit relationship.

6. Conversion of a null pointer will lead to an integer value of   |
   zero.                                                           |
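
The int/pointer rules above (2, 3 and 6) can be illustrated with a
small host-side model. This is only a sketch under stated
assumptions, not the compiler's actual code: the type `mptr', the
function names, and the packed field layout (a 12 bit segment
number above an 18 bit word offset, bit offset ignored) are
hypothetical and chosen for illustration. A 36 bit int is held in
a 64 bit host integer. Rule 5 (the bit to bit long int conversion)
is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of a Multics pointer: a segment
 * number and a word offset. The real ITS pointer has more fields. */
typedef struct {
    int32_t  segno;   /* segment number; -1 marks the null pointer */
    uint32_t offset;  /* word offset within the segment */
} mptr;

#define MASK36      ((UINT64_C(1) << 36) - 1) /* 36 bit int in a host word */
#define SEGNO_BITS  12                         /* assumed packed field width */
#define OFFSET_BITS 18                         /* assumed packed field width */

/* The null pointer -1|1 named in rule 2. */
mptr mptr_null(void)
{
    mptr p = { -1, 1 };
    return p;
}

bool mptr_is_null(mptr p)
{
    return p.segno == -1 && p.offset == 1;
}

/* Rules 2 and 3: int to pointer. Zero becomes the null pointer;
 * any other value is unpacked using the assumed field layout. */
mptr int_to_mptr(uint64_t v)
{
    mptr p;

    assert(v <= MASK36);
    if (v == 0)
        return mptr_null();
    p.segno  = (int32_t)((v >> OFFSET_BITS) & ((1u << SEGNO_BITS) - 1));
    p.offset = (uint32_t)(v & ((1u << OFFSET_BITS) - 1));
    return p;
}

/* Rules 3 and 6: pointer to int. The null pointer becomes zero;
 * any other pointer is packed using the assumed field layout. */
uint64_t mptr_to_int(mptr p)
{
    if (mptr_is_null(p))
        return 0;
    return ((uint64_t)((uint32_t)p.segno & ((1u << SEGNO_BITS) - 1))
            << OFFSET_BITS)
         | (p.offset & ((1u << OFFSET_BITS) - 1));
}
```

In this model `int_to_mptr(0)' gives -1|1 and `mptr_to_int' of
that pointer gives 0 again. One consequence of rule 2 in this
sketch is that a pointer to 0|0 also packs to zero, so it cannot
be told apart from the null pointer after a round trip through
int.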