Multics Technical Bulletin MTB-647 C Compiler Spec.

To:       Distribution

From:     Gregory A. Baryza

Date:     23 January 1984

Subject:  Multics C Compiler Specification

1. Abstract

This MTB discusses the implementation issues surrounding the installation of an externally developed compiler for the "C" programming language on Multics. The intent is to have a compiler which accepts a version of the language identical to that already present on GCOS-III and the DPS6. The compiler will run natively in the Multics environment and produce standard Multics object segments. Comments on the nature and content of the supporting run-time library for C are also included.

Comments on this MTB should be sent to the author:

     via Multics mail to:
          Baryza.Multics

     via posted mail to:
          Gregory A. Baryza
          Honeywell Information Systems, Inc.
          Four Cambridge Center
          Cambridge, Massachusetts, U.S.A. 02142

     via telephone to:
          (HVN)-261-9315, (617)-492-9315

     via forum on System-M to:
          >user_dir_dir>Multics>Baryza>mtgs>C_Compiler_Spec (cc_spec)

________________________________________

Multics project internal documentation; not to be reproduced or distributed outside the Multics project.

TABLE OF CONTENTS

Section      Subject
=======      =======

1            Abstract
2            Preface
3            Introduction
3.1            Overall Goal
3.2            Motivation
3.3            Division of Labor
3.4            Reference Document for C
4            Identifiers
4.1            Characters Allowed in Identifiers
4.2            Length of Identifiers
4.3            Reserved Identifiers
5            Data Types
5.1            Basic Types
5.2            Derived Types
5.2.1            Pointers
5.2.1.1            Pointers to Functions
5.2.2            Aggregates
5.2.2.1            Arrays
5.2.2.1.1            Strings
5.2.2.2            Structures
5.2.2.2.1            Fields
5.2.2.3            Unions
5.2.3            Enumerations
5.3            Type Definitions
5.4            Storage Classes
6            Constants
6.1            Integers
6.1.1            Decimal
6.1.2            Octal
6.1.3            Hexadecimal
6.1.4            Representation of LONG Values
6.2            Floating Point
6.3            ASCII Characters
6.4            Strings
7            Expressions
8            Keywords
9            Data Type Conversion
9.1            Character to Integer
9.2            Integer to Character
9.3            Floating Point to Double
9.4            Double to Floating Point
9.5            Floating Point to Integer
9.6            Integer to Floating Point
9.7            Integer to Unsigned
9.8            Pointer to Integer
9.9            Integer to Pointer
9.10           The Standard Conversion Rules
10           Statements
11           Compiler Directives
11.1           #define
11.2           #undef
11.3           #if
11.4           #ifdef
11.5           #ifndef
11.6           #else
11.7           #elseif
11.8           #endif
11.9           #line
11.10          #include
11.11          #equate
12           C Programs on Multics
12.1           The C Program Model
12.2           Symbol Table Requirements
12.2.1           Descriptor Types
12.2.2           Other Symbol Table Issues
12.3           Probe Changes
12.4           Memory Allocation
12.5           Use of an Operators Segment
12.6           Argument Lists
12.7           References to Library Routines
12.8           Function Name Resolution
13           Run-Time Library Definition
13.1           Input & Output
13.1.1           fopen
13.1.2           fclose
13.1.3           getc
13.1.4           putc
13.1.5           fgets
13.1.6           fputs
13.1.7           printf
13.1.8           fprintf
13.1.9           sprintf
13.1.10          scanf
13.1.11          fscanf
13.1.12          sscanf
13.1.13          rewind
13.1.14          open_file
13.1.15          open_switch
13.1.16          attach_switch
13.1.17          detach_switch
13.1.18          fflush
13.2           String Manipulation
13.2.1           strcat
13.2.2           strncat
13.2.3           strcmp
13.2.4           strncmp
13.2.5           strcpy
13.2.6           strncpy
13.2.7           strlen
13.2.8           strchr
13.2.9           strrchr
13.3           Memory Allocation
13.3.1           malloc
13.3.2           free
13.3.3           calloc
13.3.4           realloc
13.4           Mathematical Functions
                   abs
                   acos
                   asin
                   atan
                   ceil
                   cos
                   cosd
                   cosh
                   exp
                   floor
                   log
                   log10
                   log2
                   sin
                   sind
                   sinh
                   sqrt
                   tan
                   tand
                   tanh
13.5           Miscellaneous
13.5.1           clock
13.5.2           vclock
13.5.3           date
13.5.4           time
13.5.5           exit
14           Open Issues
14.1           Use of Standard Operators
14.2           Mismatch in System Calling Conventions
14.3           Unbound Programs and Name Resolution
14.4           Support for the Entry Keyword
14.5           Linker Support for the MAIN Entrypoint
14.6           Content of the Library
14.7           UNIX Environment Features
14.7.1           Enclosing the Main Routine
14.7.2           Device Nomenclature
14.7.3           Support for ARGC & ARGV

2. Preface

Developing the specification for anything is a difficult task. The trade-offs are not always easy, and seldom does everyone go away feeling satisfied. The writing of this specification for C has run true to form.

The C language was invented in 1972 by Dennis Ritchie of Bell Laboratories. Since then, it has become widely accepted as a major programming language. And, like all major languages, it exists in a number of dialects (sometimes several subtly incompatible versions for a given machine). However, the evolution of C has been strongly affected by the features provided by its UNIX(1) host.
For better or worse, these are also features which are found in a number of other commercially available operating systems. A large body of code, from the UNIX "shell" to some sophisticated applications programs, has come to expect their presence. Many of these programs and systems are viewed as useful adjuncts to the facilities Multics already provides.

The crux of the matter is that some of the expected "features" are missing on Multics, and the ways of doing things are different. Providing those features and paths is often difficult or undesirable. This specification attempts to balance the expectations of programs written elsewhere against those of Multics programmers developing code for use only on Multics. Thus, there are some "un-Multicious" thoughts herein; but I hope they add to the overall environment rather than subtract from it.

I want to thank those people who have contributed to this document, either in conversation or as reviewers of early drafts: Peter Fraser, Steve Herbst, Barry Margolin, Kevin Martin, Dave Mason, Tom Oke, Ed Ranzenbach, Olin Sibert, Melanie Weaver, and Brian Westcott.

                                             Gregory A. Baryza

________________________________________

(1) UNIX is a registered trademark of Bell Laboratories. It is commercially available under license from Western Electric.

3. Introduction

3.1. Overall Goal

The intent of this project is to provide a compiler and run-time library for the C language on Multics. The compiler is to run as a standard Multics command and produce Multics object segments, listings, error messages, and so on, in a style consistent with other Multics compilers.

The run-time library will provide functions common to most C implementations. It will act as the interface between C programs and the services provided by Multics. It should be noted that those library routines which provide access to operating system functions peculiar to specific systems (e.g.
tasking on UNIX) will not necessarily be provided for Multics.

The mechanism by which this will be achieved is the installation on Multics of an existing compiler, written in C and developed at the University of Waterloo. No changes in the syntax or semantics of the language definition for Multics are implied, nor should any be assumed.

3.2. Motivation

The increasing popularity of the C programming language is impossible to deny. Many mainframe manufacturers and most mini- and microcomputers now sport C compilers (and run-time libraries). As a consequence, much systems and applications software is now being written in C and, ipso facto, support for the language is a "requirement" for commercial, general-purpose systems. Within the present Multics community, Standard Telephone and Cable (U.K.) has been the strongest proponent of C on the system.

To satisfy this need, Honeywell has contracted with the University of Waterloo to produce compilers for GCOS-III and GCOS-8, and for the DPS-6 product line. In order to take advantage of this opportunity, MDC (the Multics Development Center) would like to utilize the work already in progress toward providing a Multics C compiler. The compiler is expected to be source-language compatible with the other Honeywell offerings.

3.3. Division of Labor

Three parties will be involved in the development of the product for Multics: the Universities of Waterloo and Calgary, and the Multics Development Center.

The University of Waterloo will be responsible for the pre-processor, parser, and code generator for the C language. Mainly, this effort involves the changes to their present compiler necessary to support the interpretation of C on Multics.

The University of Calgary will be responsible for packaging the output of the code generator into Multics standard object segments, complete with symbol table and debug information.
They will also provide the run-time library assumed by C programs and its connection to Multics facilities.

The Multics Development Center will do the overall project coordination. It will also make the changes in the system software (probe, symbol table utilities, binder, etc.) necessary to support the product.

3.4. Reference Document for C

Unless otherwise noted, references to features of the language, section and page numbers, and examples will be presumed to come from

     The C Programming Language(1)
     Kernighan, Brian W. & Ritchie, Dennis M.
     Prentice-Hall (1978)
     Englewood Cliffs, New Jersey

In particular, Appendix A of that book claims to be a reference manual for the language. Unfortunately, it suffers from some ambiguities and omissions. These will be cited when they are discussed.

One further point deserves mention. Appendix A frequently makes reference to the H6000 version of the compiler. This is not the Waterloo product. It is an internally developed compiler for C which runs (primarily) on the Bell Laboratories' GCOS systems.

________________________________________

(1) This document is also commonly referred to as K&R, or the "White Book".

4. Identifiers

Identifiers in C are used to name variables, symbolic constants, functions, structures, type definitions, etc.

4.1. Characters Allowed in Identifiers

Identifiers are sequences of characters constructed from the following sets of items:

     - upper- and lower-case letters
     - digits
     - the underscore character, "_"

The Multics implementation also adheres to the conventions(1) of treating upper- and lower-case letters as distinct, and of requiring that the initial character of an identifier be either a letter or an underscore character.

4.2. Length of Identifiers

Identifiers may be as long as storage requirements in the compiler permit. However, the compiler will use only the first 256 characters given to distinguish identifiers from one another.
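The two rules above are mechanical enough to express in a few lines of C. This sketch is only an illustration and is not part of the compiler or its library. The names is_valid_identifier, same_identifier, and SIGNIFICANT_CHARS are hypothetical, and the character tests assume the ASCII letters and digits of the standard "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical constant: number of leading characters used to tell
   identifiers apart (Section 4.2). */
#define SIGNIFICANT_CHARS 256

/* True if NAME obeys Section 4.1: non-empty, first character a letter
   or underscore, remaining characters letters, digits, or underscores. */
bool is_valid_identifier(const char *name)
{
    assert(name != NULL);
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return false;                  /* also rejects the empty string */
    for (const char *p = name + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))
            return false;
    return true;
}

/* True if A and B would name the same identifier: only the first
   SIGNIFICANT_CHARS characters count, and case is significant. */
bool same_identifier(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, SIGNIFICANT_CHARS) == 0;
}
```

For example, is_valid_identifier("_tmp2") is true and is_valid_identifier("2fast") is false. same_identifier("Count", "count") is false because upper- and lower-case letters are distinct.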
NOTE: Using 256 significant characters is a deviation from the K&R practice of using only the first 8 characters to distinguish between identifiers.

4.3. Reserved Identifiers

Certain identifiers in C are reserved as having special meaning. Most of them are keywords and are listed in the section of that title. In addition, the function named "main" is designated as the entrypoint at which the system is to begin execution of the C program.

________________________________________

(1) K&R, Chapter 2, Types, Operators and Expressions; Section 2.1, Variable Names; pg. 33

5. Data Types

The C language has a small number of fundamental data types: integers, floating point numbers, and characters. In addition, the declaration rules allow for the construction of a potentially infinite set of derived types. We will discuss the representation of each of these classes separately.

5.1. Basic Types

C allows the three fundamental types described above. In addition, declarations of each of these types also allow "adjectives" which modify the size of the basic type or its arithmetic behavior. These adjectives are: "short", "long", and "unsigned".

The following table gives the various base types and their equivalent representation in machine terms. Equivalent forms are listed together.

                              Width      Sign Bit   Boundary
     C Declaration            In Bits    Present    Alignment

     int                      36         Yes        Word
     short int

     long int                 72         Yes        Double Word

     unsigned int             36         No         Word
     unsigned short int
     short unsigned int
     unsigned short
     short unsigned

     unsigned long int        72         No         Double Word
     long unsigned int
     unsigned long
     long unsigned

     float                    8 & 28     Yes        Word

     double                   8 & 64     Yes        Double Word
     long float

     char                     9          No         Character
     unsigned char

The data types "float" and "double" also include a signed, 8-bit, power-of-two exponent. In the table, "8 & 28" and "8 & 64" give the widths of the exponent and the mantissa, respectively.
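The widths in the table determine the range of each integer type. The C sketch below computes those ranges with 64-bit host arithmetic. The function names are hypothetical. The signed results assume two's-complement representation, which is what the Multics processor uses. The 72-bit "long" types are out of scope because they do not fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned integer BITS wide (36 for "unsigned int",
   9 for "char").  Limited to 63 bits so the result fits a host uint64_t. */
uint64_t max_unsigned(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return (UINT64_C(1) << bits) - 1;
}

/* Largest value of a two's-complement signed integer BITS wide
   (36 for "int").  Also limited to 63 bits. */
int64_t max_signed(int bits)
{
    assert(bits >= 2 && bits <= 63);
    return (INT64_C(1) << (bits - 1)) - 1;
}

/* Smallest value of a two's-complement signed integer BITS wide. */
int64_t min_signed(int bits)
{
    assert(bits >= 2 && bits <= 63);
    return -(INT64_C(1) << (bits - 1));
}
```

For instance, max_signed(36) is 34359738367 and min_signed(36) is -34359738368, the limits of a Multics "int". max_unsigned(36) is 68719476735, the largest "unsigned int", and max_unsigned(9) is 511, the largest "char" value.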
NOTE: While K&R(1) permit only one adjective to precede the base type in a declaration, the compiler for Multics will allow both a size specifier and an "unsigned" specifier. They may appear in either order (as shown above).

5.2. Derived Types

The derived types of C are pointers to typed objects (a la Pascal and ALGOL-68) and various kinds of aggregates. Each of these will be discussed separately.

5.2.1. Pointers

Pointers in C are always pointers to objects of a specific type. A C pointer is represented as a Multics ITS pointer. Thus, a C pointer is equivalent to a PL/I "pointer aligned".

5.2.1.1. Pointers to Functions

Since the C language does not allow(2) the definition of functions within other functions, pointers to functions do not need an environment pointer as part of their reference information. Hence, they may also be represented as ITS pointers. This makes them equivalent to the result of the Multics PL/I builtin function codeptr.

5.2.2. Aggregates

The Multics C compiler supports the construction of data aggregates. Several types are possible: arrays, structures, and unions. Of course, each of these aggregates may be formed from data elements which are themselves either basic types or aggregates.

________________________________________

(1) K&R, Appendix A, Section 8.2, pg. 193
(2) K&R, Chapter 4, Functions and Program Structure; Section 4.8, Block Structure; pg. 81

5.2.2.1. Arrays

C arrays correspond directly to PL/I arrays. However, the initial subscript for C arrays is zero, so that the C declaration:

     int A[8];

is equivalent to the PL/I:

     dcl A real fixed binary (35, 0) aligned dimension (0 : 7);

5.2.2.1.1. Strings

Like Pascal and Ada, strings in C are derived types. They are represented as arrays of characters. The last element in the string is the ASCII "NUL" character (location 0/0) and forms its delimiter.
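A C string carries no length of its own. Any code that needs the length, for instance to describe the string to PL/I as a based character string, must first scan for the terminating NUL. The C sketch below shows that scan. struct char_view and view_of_c_string are hypothetical names used only for illustration. The (address, length) pair models the information a PL/I overlay would need; it is not a Multics data structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (address, length) pair describing a C string the way a
   PL/I based character string overlay would need to see it. */
struct char_view {
    const char *addr;   /* address of the first character */
    size_t      len;    /* number of characters before the NUL */
};

/* Scan forward from S to the terminating NUL and return a view that
   covers only the characters in front of it. */
struct char_view view_of_c_string(const char *s)
{
    struct char_view v;
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    v.addr = s;
    v.len = n;
    return v;
}
```

Suppose this is applied to an array declared as char greeting[] = "hello";. view_of_c_string returns a length of 5, although the array itself occupies 6 characters because the NUL is stored as its last element.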
Since the address of a C string is the address of its first element, C strings can be overlaid with PL/I based character strings, provided the length has first been found by scanning for the NUL character at the end.

5.2.2.2. Structures

Structures in C are also directly analogous to those in PL/I. In particular, the same alignment rules apply, and a C structure will always be aligned to the strictest boundary required by any of its components.

5.2.2.2.1. Fields

The Multics C compiler also implements the definition of fields(1) within a machine word. Fields on Multics are assigned left-to-right. For example, a C "float" variable is described as:

     struct float_bin_real {
          unsigned exponent : 8;
          unsigned mantissa : 28;
     };

For purposes of computation, fields in Multics are treated as unsigned integers.

________________________________________

(1) K&R, Appendix A, Section 8.5, pgs. 196-197

NOTE: Multics deviates from the K&R specifications in allowing a pointer to point to a bit field. In this case, it points to the leftmost bit of the field. It is the programmer's responsibility to ensure that later use of this pointer as a locator does not violate language boundary assumptions (i.e. if you later use it to point to a character, it had better be aligned to a character boundary).

5.2.2.3. Unions

Unions have no single counterpart in PL/I. An object which is a union may be thought of as a named piece of storage which is large enough to contain any of the objects defined to be part of the union. The syntax for defining unions is almost identical to that of structures, and component access is accomplished in the same way. For example, the C declaration:

     union tag_name {
          int a;
          float b;
          char *c;
     } var_name;

would have to be represented by a series of PL/I declarations which alias a properly aligned piece of storage large enough to contain the largest value.
In this example, that means 72 bits aligned on an even word boundary, because "var_name.c" is a pointer. Note that it is not possible in PL/I to create a sequence of declarations which preserves both the storage overlaying and the naming structure of C unions.

5.2.3. Enumerations

NOTE: This data type is an extension to the language defined by K&R.

Enumerated data types are user-defined data types which are represented as though they were of type "int". The declaration consists of the keyword "enum" followed by the tag-name of the type, a list of values, and an optional list of variable identifiers. For example, the declaration:

     enum day {sun, mon, tue, wed, thu, fri, sat} start;

defines the variable "start" to be of type "day". The values which can be assigned to "start" are "sun", "mon", "tue", etc. However, unlike some languages(1), which do not specify the mapping between enumerated values and their representations, this version of C does. The first value in the list is assigned the value 0, the second 1, and so on.

This version of C also allows arithmetic on the enumerations. This means that:

     start = sun + 2;

and

     start = tue;

are equivalent. In fact, they are also equivalent to the sequence:

     #define sun 0
     #define mon 1
          .
          .
          .
     int start;
     start = sun + 2;

In addition, the values specified in the enumeration list may optionally be assigned explicit values. This allows an enumeration of the form:

     enum day {sun = -1, mon = 1, tue, wed, thu, fri, sat = -1} start;

where "sun" and "sat" have the assigned value -1, "mon" is assigned the value 1, and "tue" through "fri" are assigned successive integers starting with the value of "mon + 1" (that is, 2 through 5). As shown, overlaps in enumerated values are allowed.

________________________________________

(1) Ada, for instance

5.3. Type Definitions

Type definitions, which allow the user to define synonyms for existing types to increase portability and readability, are allowed as in K&R.

NOTE: The tag-name of an enumerated definition may also be used to define additional data elements of that "type" in the same manner as with structures and unions. Thus, using the enumerated type "day", we may define a finish date as another instance of that type, and even initialize it, by:

     enum day finish = fri;

5.4. Storage Classes

K&R defines four storage classes: auto, static, extern, and register. The correspondence with PL/I storage classes is as follows:

     C Storage Class     Equivalent PL/I

     auto                internal automatic
     static              internal static
     extern              external static
     register            internal automatic

The Multics C compiler does not allow variables to be assigned to registers. For this implementation, "register" and "auto" are taken as equivalent. This does not mean, however, that the "&" operator can be used to take the address of a register variable. Such use is non-portable and is prohibited by the compiler for the sake of consistency with other implementations. In addition, future versions of the compiler may make use of the register declaration, not as a way of dedicating a "fast" machine register, but as a way of denoting that no pointer de-reference can legitimately change the value of the variable.

6. Constants

The C language allows several different types of constants for numeric values and provides representations for ASCII characters and strings.

6.1. Integers

With the exception of "long" constants (see Section 6.1.4), all integer constants are represented in storage as "int".

6.1.1. Decimal

Decimal integers are written as whole numbers (no decimal point) and have no leading zeroes; examples are:

     7     34359738637     100     13

6.1.2. Octal

Octal numbers are also written as whole numbers.
They are formed from the digits zero through seven and are always preceded by at least one leading zero, as in

     06     0377777777777     0100     013

6.1.3. Hexadecimal

Numbers in hexadecimal are formed from the set of hexadecimal digits (0 through 9 and a through f). In determining the value of the constant, case is ignored. However, to distinguish the use of hexadecimal, the first two characters of all such constants are required to be either "0x" or "0X". The following are examples of hexadecimal constants in C:

     0x6     0X7FfF     0xDead     0x0FF

6.1.4. Representation of LONG Values

A "long" integer constant may be represented in one of two ways. Either the number may be too large to fit in an "int", in which case the compiler will automatically type it as "long"; or the constant may be suffixed with an upper- or lower-case "L". In the latter instance, the compiler will convert the value to a "long" representation (including double-word alignment, for example) regardless of the number of digits.

6.2. Floating Point

Numbers written with a decimal point in them are considered to be floating point values. They may also (optionally) have leading zeroes, a decimal fraction, and an exponent part in the usual form. Floating point constants are always represented as being of type "double".

6.3. ASCII Characters

An ASCII character constant is a sequence of from one to four ASCII characters enclosed in single quotes. Each character occupies one byte (9 bits) of storage. Character constants which contain fewer than four characters are stored right-justified in a word, and the high-order bit of the first character is propagated to the left end of the word. For the Multics representation of ASCII, this amounts to zero-fill.

A number of escape sequences are permitted in character constants to ease representation of certain characters. The backslash (\) character introduces the sequence.
Allowed sequences are:

     Escape Sequence     Interpretation

     \t                  Horizontal tab
     \\                  Backslash
     \'                  Single quote
     \"                  Double quote
     \n                  Newline
     \c                  Carriage return
     \f                  Form feed
     \b                  Backspace
     \d{d{d}}            A one- to three-digit octal constant whose
                         value is the value of the character

NOTE: ASCII character constants accepted by this compiler are a superset of those described by K&R.

6.4. Strings

A string constant is a sequence of zero or more characters enclosed by double quote marks. Escape sequences are permitted in character strings also. As noted elsewhere, a string constant is treated as an array of single characters terminated by an ASCII NUL byte (0).

It is worth mentioning at this point that long strings can be continued over several lines of source code. The sequence "\<nl>", where <nl> is a real newline character, will do this. When the "\<nl>" is encountered while scanning a string during compilation, the lexical analyzer discards these two characters and all unescaped leading whitespace from the succeeding line. This allows long strings to be squeezed into the available space and also permits indentation at the proper place. Escaped whitespace characters are included "as is". The following examples illustrate this.

     Input Sequence              Interpretation

     "This is \                  "This is continued"
          continued"

     "This string\               "This string    has four blanks"
          \ \ \ \ has four blanks"

7. Expressions

This section deals with expressions in C. The various operations in an expression are evaluated according to a defined precedence. Many operations share the same precedence; in that case, each precedence class will group either left-to-right or right-to-left, depending on the class.

The standard(1) rules for operator precedence and associativity do not completely determine the order of evaluation of expressions, however. In the case of "A+B", C says nothing about whether A or B should be evaluated first.
In such situations, the Multics compiler may evaluate the sub-expressions in any order, even when they have side-effects that make the order significant. Parentheses, such as "(A)+B", cannot be used to force a particular order in such cases. If a particular order is necessary, the expression will have to be broken down into two statements, with the result of the first stored into a temporary variable and used in the second.

Since functions in Multics C may be declared to return no result of consequence ("void"), such functions may only be used in restricted cases. In general, if the function result would have had to be used in further evaluating an expression, the "void" function invocation is prohibited.

With the preceding caveats, the expressions of the Multics C compiler are those of K&R.

________________________________________

(1) K&R, Appendix A, pgs. 185-192

8. Keywords

There are a number of keywords reserved by the C compiler. They cannot be used anywhere as identifiers. In keeping with standard Multics practice, these keywords will only be recognized in lower-case. Identifiers which are identical to keywords except for case should be avoided for the sake of portability. The table of such keywords is:

     auto          else          int           switch
     break         entry         long          typedef
     case          <> enum       register      union
     char          extern        return        unsigned
     continue      float         short         <> void
     default       for           sizeof        while
     do            goto          static
     double        if            struct

NOTE: Entries marked with "<>" are additions to the list of keywords defined in K&R.

A synopsis of their usage is:

     Keyword     Explanation

     enum        This keyword is a declarator which introduces the
                 definition of a user-defined data item. The values
                 which may "properly" be assigned to this data type
                 appear in the list which follows the tag-name
                 associated with this type.

     void        The keyword "void" is used in place of a data-type
                 specifier in functions. It indicates that the value
                 returned by the function is not used and is therefore
                 of no importance. Hence, functions declared "void" may
                 not have their return values assigned to anything,
                 nor may they appear in expressions involving
                 additional computation. A "void" function is
                 effectively a subroutine.

9. Data Type Conversion

9.1. Character to Integer

The character object will be converted to an integer value which represents the character object's value in memory. If the character object is not declared "unsigned", sign extension of the left-most character in the character object will take place. An unsigned character object will be copied "as is". For example,

     int x;
     x = '\777a';

will result in x having the value -415. If x were additionally declared as "unsigned", then the assignment would result in x being set to 261729. Both values follow from the layout of the constant. The characters "\777" (all nine bits set) and "a" (octal 141, decimal 97) occupy the low-order 18 bits of the word, giving 511 * 512 + 97 = 261729. Sign-extending from the leftmost of those 18 bits gives 261729 - 2**18 = -415.

9.2. Integer to Character

When an integer is converted to a character, the result is the low-order nine bits of the integer value. All other bits are ignored.

9.3. Floating Point to Double

The mantissa of the "float" value is extended on the right to carry out all floating point arithmetic.

9.4. Double to Floating Point

The "double" value is rounded when the target precision is "float".

9.5. Floating Point to Integer

Floating point numbers have their fractional parts truncated toward zero. If the truncated value is too large to fit in the target integer, continued execution will yield undefined results.

9.6. Integer to Floating Point

The conversion occurs in the expected way. However, some loss of accuracy may result if the floating point target cannot hold all the significant digits of the source exactly.

9.7. Integer to Unsigned

The result of this conversion is the smallest unsigned integer congruent to the "int" source mod 2**N (where N is the number of bits in the unsigned number).
The effect of this is that the actual bit pattern for the number remains unchanged (since the Multics processor is a two's-complement binary machine).

9.8. Pointer to Integer

Pointers will be stored in "int" data items as Multics packed pointers. They will be stored in "long" items as Multics ITS pairs. However, to conform to standard C usage, Multics null pointers will be stored as the value zero.

9.9. Integer to Pointer

An "int" being converted to a pointer will be assumed to be in packed pointer format; a "long", in ITS format. An "int" or "long" whose value is zero will be converted into a Multics null pointer (in the ring of execution).(1)

9.10. The Standard Conversion Rules

Many binary operators cause conversion of their operands to other types by default. The conversions follow the rules given below.

NOTE: Because "unsigned long int" is allowed in this implementation, these rules differ slightly from those given in the reference(2) document.

________________________________________

(1) Thus, a Multics null pointer (segno = -1, wordno = 1, bitno = 0) and an integer value of zero are considered equal. One will be converted into the other for purposes of assignment or during "casts". They will be converted into a common form for comparison.
(2) K&R, Appendix A, Section 6.6, pg. 184

     1. Any operands of type "char" are converted to "int".

     2. Any operands of type "float" are converted to "double".

     3. If either operand is of type "double", the other is converted to "double", and the result of the operation will be "double".

     4. If either operand has an attribute of "long", the other will be converted to "long"; and the result will have the attribute "long".

     5. If either operand has an attribute of "unsigned", the other will be converted to "unsigned"; and the result will have the attribute "unsigned".

     6. Otherwise, the two operands are converted to "int", and the result is "int".
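Two of the conversions in this section can be modeled on a host machine. The C sketch below does both, and all of its names are hypothetical.

- The first part reproduces the '\777a' example of Section 9.1. It packs 9-bit characters right-justified into a 36-bit word and then sign-extends from the leftmost character.
- The second part encodes the standard conversion rules as a function from operand types to result type. It reads rules 4 and 5 as applying cumulatively, so that "long" combined with "unsigned" yields "unsigned long". That reading is an assumption suggested by the NOTE on "unsigned long int"; the rules do not state it outright.

```c
#include <assert.h>
#include <stdint.h>

/* Sections 6.3 and 9.1: pack N (1 to 4) 9-bit character codes, leftmost
   first, right-justified in a 36-bit word, without sign extension. */
int64_t char_constant_unsigned(const unsigned codes[], int n)
{
    int64_t word = 0;

    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; i++) {
        assert(codes[i] <= 0777);           /* each character is 9 bits */
        word = (word << 9) | codes[i];
    }
    return word;
}

/* The same constant with the high-order bit of the leftmost character
   propagated to the left end of the word (the signed case). */
int64_t char_constant_signed(const unsigned codes[], int n)
{
    int64_t word = char_constant_unsigned(codes, n);
    int used = 9 * n;                       /* bits occupied by the characters */

    if (word & ((int64_t)1 << (used - 1)))  /* leftmost bit set? */
        word -= (int64_t)1 << used;         /* two's-complement sign extension */
    return word;
}

/* Section 9.10: hypothetical classification of arithmetic operand types. */
enum arith_type {
    A_CHAR, A_INT, A_UNSIGNED, A_LONG, A_UNSIGNED_LONG, A_FLOAT, A_DOUBLE
};

/* Result type of a binary arithmetic operation on operands of types A and B. */
enum arith_type result_type(enum arith_type a, enum arith_type b)
{
    int is_long, is_unsigned;

    /* Rules 1 and 2: char becomes int, float becomes double. */
    if (a == A_CHAR)  a = A_INT;
    if (b == A_CHAR)  b = A_INT;
    if (a == A_FLOAT) a = A_DOUBLE;
    if (b == A_FLOAT) b = A_DOUBLE;

    /* Rule 3: double dominates. */
    if (a == A_DOUBLE || b == A_DOUBLE)
        return A_DOUBLE;

    /* Rules 4 and 5, assumed here to apply cumulatively. */
    is_long = a == A_LONG || a == A_UNSIGNED_LONG ||
              b == A_LONG || b == A_UNSIGNED_LONG;
    is_unsigned = a == A_UNSIGNED || a == A_UNSIGNED_LONG ||
                  b == A_UNSIGNED || b == A_UNSIGNED_LONG;

    if (is_long && is_unsigned)
        return A_UNSIGNED_LONG;
    if (is_long)
        return A_LONG;
    if (is_unsigned)
        return A_UNSIGNED;

    /* Rule 6: otherwise, int. */
    return A_INT;
}
```

Try it with the codes for "\777" (octal 777) and "a" (octal 141). char_constant_unsigned yields 261729 and char_constant_signed yields -415, the two values given in Section 9.1.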
10. Statements

The statements accepted by the Multics C compiler are those defined by K&R. In addition, the following general remarks are in order:

     1) Simple statements are terminated by a semicolon. Thus, C programs are free-form.

     2) Whitespace may be inserted as desired to improve program readability.

     3) Whitespace is required to separate identifiers, keywords, and constants which would otherwise be contiguous.

     4) While comments (delimited by "/*" and "*/") are not strictly statements, it is noteworthy that in Multics C, comments do not nest.

11. Compiler Directives

This section defines the directives acceptable to the pre-processor facility. The pre-processor is capable of text and macro substitution, conditional compilation, and inclusion of other source files into the compilation unit.

Lines beginning with the character "#" are considered directives for this facility. They are not subject to scoping rules; their effects last from their first use to the end of the compiled unit.

11.1. #define

This has the same format as in K&R. However, the contents of strings which are given as part of the #define are examined for the presence of formal parameters to be substituted. For example, given

     #define derogation(SLUR) "You SLUR, you!"

the invocation derogation(fool) expands to "You fool, you!".

11.2. #undef

The actions taken by this directive are identical to those in K&R.

11.3. #if

The actions taken by this directive are identical to those in K&R.

11.4. #ifdef

The actions taken by this directive are identical to those in K&R.

11.5. #ifndef

The actions taken by this directive are identical to those in K&R.

11.6. #else

The actions taken by this directive are identical to those in K&R.

11.7. #elseif

NOTE: This compiler directive is an addition to those listed in K&R.

The construction:

     #elseif constant_expression

may be used in place of the sequence:

     #else
     #if constant_expression

in nested #if constructions.
The advantage of #elseif is that only one #endif is required to close the selection.

11.8. #endif

The actions taken by this directive are identical to those in K&R.

11.9. #line

The actions taken by this directive are identical to those in K&R.

11.10. #include

NOTE: The actions taken by this directive are different from those described by K&R.

The directives:

     #include "filename"

and

     #include <filename>

both use the Multics standard translator search paths to locate the referenced files. Neither form bypasses the working directory, because whether it is searched is determined by the search paths, which are under the control of the programmer.

In addition, the assumed suffix for C include files is ".incl.c". Therefore, "common" files like "stdio.h" will be mapped to Multics segment names such as "stdio.h.incl.c".

11.11. #equate

NOTE: This compiler directive is an addition to those listed in K&R.

The directive has the form:

     #equate identifier text

where "identifier" is a valid C identifier and "text" is any sequence of characters. The directive specifies that references to "identifier" in the list of extern items for the program should be replaced by a reference to "text". For example, a C program containing the lines:

     #equate BIGGEST_SPACE sys_info$max_seg_size
     extern int BIGGEST_SPACE;

can reference directly the word containing the system-defined maximum segment size (in words).

12. C Programs on Multics

The "usual" C programming language presumes a static environment where the entire code segment to be run is linked together into a single unit before execution begins. In addition, its treatment of external variables differs from the standard Multics paradigm. Finally, the "standard" run-time library has name conflicts with existing Multics commands and subroutines.

12.1.
The C Program Model

Several points about the paradigm assumed in the execution-time model of C programs need to be made explicit so that their differences from, or demands on, the Multics model can be discerned. This is not a statement about the way that C programs on Multics must run, only about the way they usually run on other implementations.

A) All of the code involved in an application will be combined into an executable module prior to placing it in execution. This is at odds with the dynamic linking features of Multics.

B) Once the executable module has been prepared, only the entrypoint to the main program (i.e. the "main" function) is known to the system which puts the module into execution. All other external definitions and references made by the various components are "inaccessible" when execution begins.

C) All C functions are accessible by name when the modules are linked, regardless of which object program contains them. This is in contrast to the Multics segname$entryname convention.

D) No relationships exist between successive executions of the same or different executable images. This is not the normal Multics process view, although it has been implemented via the run_ facilities.

E) There are no procedures in C. All subroutines are functions.

F) All arguments are passed by value. Side-effects are produced by passing a pointer to the object which is to be modified. Multics argument lists allow parameters to be passed by reference as well.

The ways in which this model and Multics can be adapted to each other are discussed below.

12.2. Symbol Table Requirements

The present Multics symbol table cannot describe a C program with sufficient clarity. Therefore, additions and modifications to the information stored in the symbol section of the object segment will be required to support C programs.

12.2.1.
Descriptor Types

The following table gives the C data items which will have to be represented in the symbol table for debugging. Some C data types are already represented in other languages which Multics supports. Those that are not are identified by the marker "--".

     C Object               Code  Standard Type or Explanation
     =====================  ====  ========================================
     short int              01    real fixed-point binary short
     long int               02    real fixed-point binary long
     unsigned short int     34    real fixed-point binary long unsigned
     unsigned long int      --    The definition of unsigned types in the
                                  standard descriptor type table does not
                                  allow the short unsigned integer to have
                                  a precision greater than 35 bits. Since
                                  C "unsigned short int" variables use all
                                  the bits in a machine word, they must be
                                  assigned to type 34. There is no
                                  descriptor type for a datum having a
                                  precision of 72 bits or greater.
     float                  03    real floating-point binary short
     double                 04    real floating-point binary long
     character              21    character string
     string                 --    There is presently no Multics datum
                                  defined which matches C strings in being
                                  delineated by a zero byte.
     pointer                13    pointer
     structure              17    structure
     union                  --    Although Algol-68 unions (type 62) are
                                  available, they are not applicable
                                  because their data structure always
                                  specifies the current contents of the
                                  union. In C, this is left to the
                                  programmer to keep track of.
     enum constant element  --    Pascal enumerated list values (type 71)
                                  are restricted to non-negative integers.
                                  This is not true in C.
     enum variable          --    The reasoning here is the same as for
                                  enumerated list constant elements. The
                                  corresponding Pascal data type (72) is
                                  inapplicable for C.

Very few of these new descriptor types will appear in argument lists, however, due to the conversion rules.

12.2.2.
Other Symbol Table Issues

The following list contains other symbol representation issues which will have to be resolved before support for C programs can be considered complete:

A) Symbol nodes for pointers may have to include the pointer's "base type" (e.g. pointer-to-character) in order to support correct pointer arithmetic in probe.
B) A symbol node for a union should probably be represented as the root of a symbol sub-tree of all the possible constituents of the union.
C) The symbol table will have to include a way to represent C typedefs, and the types named by the "tags" on structures and unions (for example), as separate objects.

12.3. Probe Changes

Probe will have to be extended to handle C expressions. Some of the needed extensions are:

A) The C comparison operators "==" and "!=" will have to be allowed in designating conditional breakpoints.
B) The C modulus operator, "%", will be allowed in expressions.
C) Constants in octal and hexadecimal must be allowed in expressions.
D) The C form of subscripts, "A[i][j]", must be allowed in requesting the values of variables and in assigning values.
E) The address reference, "&A", will be allowed for obtaining the address of an item.
F) Explicit dereferencing of a pointer via "*A" will be allowed.
G) Probe should support arithmetic on C pointers.
H) It should be possible to display the contents of a union in a programmer-chosen format.
I) The probe builtin functions length, maxlength, and substr must be changed to work on C strings (and arrays).
J) The C "sizeof" operator should probably be supported.
K) Boolean tests, "if var" and "if !var", should work as expected as long as "var" can be cast into an int.

12.4. Memory Allocation

Most C implementations place all data (auto, static, extern, and programmer-allocated) in a single contiguous address space. On Multics, this is possible, but not desirable.
Therefore, the "standard" place will be used for each type of object: auto variables will be allocated in the stack; static in the linkage section; extern in the user_free_area (via *system variables); and programmer-allocated data in the user_free_area.

The assignment of external and programmer-allocated storage to the user_free_area will make it possible for programmers to manage their allocated storage via set_fortran_common.

It should be noted that this separation of storage may cause difficulties when importing programs which compare pointer values. Some applications take advantage of the implicit collection of all data into one unit, even though K&R explicitly warns against comparing pointers(1) except where "the pointers point to objects in the same array." Since Multics allocations always result in storage blocks contained wholly in a single segment, programs which observe this portability constraint will continue to work.

12.5. Use of an Operators Segment

The C compiler will produce object segments which use the standard pl1_operators_ segment for call/save/return, data value conversions, intrinsic functions, etc.

12.6. Argument Lists

C programs will use standard Multics calls. That is, they will produce a list of pointers to the argument values. Because of the call-by-value(2) requirement, temporary copies will be made of all non-expression arguments and the addresses of these will be placed in the argument list.

Whenever possible, descriptor information will be included in the argument list. However, the utility of this information is in question, because the number of different types which can actually be passed as arguments(3) is rather small. Thus, while it seems desirable to pass the address of the first character of a string and to construct a descriptor for it when the copying process determines its length, this cannot be done. The language rules require that the address of a (temporary) pointer to the first character of the string be placed in the argument list. The Multics descriptor for it cannot describe it as anything other than an unpacked pointer (at least not without adding many more descriptors).

________________________________________
(1) K&R, Appendix A, Section 7.6, pg. 189
(2) K&R, Appendix A, Section 7.1, pgs. 185-186
(3) loc. cit.

The following information attempts to illustrate the correspondence between a C data item and the value actually passed as the argument in a function invocation. To assist in this, the actual PL/I attribute list corresponding to the C argument value is given when possible. Otherwise, the value passed is described. When necessary, the reason for the set of attributes is also listed.

C Argument: int
PL/I Attributes: real fixed binary precision(35, 0) aligned
Explanation: none

C Argument: long int
PL/I Attributes: real fixed binary precision(71, 0) aligned
Explanation: none

C Argument: unsigned int
PL/I Attributes: bit(36) aligned
Explanation: This could also be described as "real fixed binary precision(36, 0) unsigned aligned" in PL/I terms. However, this raises the spectre of known bugs with the representation of "unsigned" items in the present compiler. This particular representation at least gives the proper computational result when filtered through the "bin" builtin function into a signed variable of precision larger than 36 bits.

C Argument: long unsigned int
PL/I Attributes: bit(72) aligned
Explanation: PL/I does not allow the precision of a binary number to exceed 71 bits.

C Argument: float
PL/I Attributes: real float binary precision(63) aligned
Explanation: C conversion rules for arguments require that all items of type float be converted to double.
C Argument: double
PL/I Attributes: real float binary precision(63) aligned
Explanation: none

C Argument: char
PL/I Attributes: real fixed binary precision(35, 0) aligned
Explanation: C conversion rules for arguments require that all items of type char be converted to int.

C Argument: an array name
PL/I Attributes: pointer aligned
Explanation: An array name is treated as a pointer expression in C. The value of the pointer is the address of the first element of the array.

C Argument: string
PL/I Attributes: pointer aligned
Explanation: Strings are arrays in C. The value of the pointer is the address of the leftmost character of the string.

C Argument: pointer
PL/I Attributes: pointer aligned
Explanation: none

C Argument: a structure name
PL/I Attributes: The structure is passed as the value of the argument. However, care should be taken in trying to describe actual arguments which contain unions.
Explanation: A temporary copy will be made of the entire structure, and the address of this copy will appear in the corresponding position of the actual argument list. To the receiver, this argument pointer will, of course, be invisible.

C Argument: a field within a structure
PL/I Attributes: bit(36) aligned
Explanation: Bit fields are coerced into unsigned integers. Alternatively, the unsigned fixed binary description mentioned above for unsigned int could have been used. However, for bit fields this representation seems more descriptive. The extracted bit field values are the rightmost bits of the string.

C Argument: a union name
PL/I Attributes: bit(n) unaligned
Explanation: Unions are treated like structures. However, PL/I has no way of describing a union, and C provides no way to indicate the current format of the data residing in a union.

C Argument: enum
PL/I Attributes: real fixed binary precision(35, 0) aligned
Explanation: Instances of variables which are defined to contain enumerated values are treated as variables of type int.
C Argument: enumerated constant
PL/I Attributes: real fixed binary precision(35, 0) aligned
Explanation: Constants appearing in an enumeration list are treated as being of type int.

Some C implementations use this call-by-value mechanism in a different way. Copies of the arguments to be passed are catenated together into a structure-like format, and the address of this structure is then passed as the argument pointer. The called program can then declare only a single input argument, as in

     char *my_arg;

which it manipulates to access the various portions of the argument list values. Given the argument list conventions defined above, Multics C does not support this programming style.

12.7. References to Library Routines

As mentioned above, C assumes that the library routines have been physically incorporated into the executing program before execution begins. This is contrary to the normal Multics policy of having one copy of the library which is dynamically referenced by all users.

The proposed solution to this problem is to make some modifications to the Multics binder. The nature of the change(1) is to provide, as part of the binder's input, a list of external symbol name-pairs of the form

     segname_1$entryname_1 segname_2$entryname_2

The idea is that, after all the inputs have been examined for external symbol definitions, any unresolved references to segname_1$entryname_1 are to be replaced with references to segname_2$entryname_2. Thus, a name-pair entry like:

     fopen standard_C_library_$fopen

would allow us to provide the C library as a unique object in Multics without forcing larger bound segments than necessary. Since the substitution applies only to unresolved symbols, C programmers will still be able to replace library routines in the manner they do now: by writing a function with that name into their program.
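The proposed binder change amounts to a lookup that is consulted only for references still unresolved after every input has been scanned. The sketch below is a minimal illustration of that rule. The struct name_pair layout and the functions defined_locally and resolve_reference are hypothetical, because the mechanism itself has not yet been defined. The name-pair entry for "getc" is likewise invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* One binder name-pair (hypothetical layout): a reference to `from`
   that is still unresolved after every input has been scanned is
   redirected to `to`. */
struct name_pair {
    const char *from;   /* e.g. "fopen" */
    const char *to;     /* e.g. "standard_C_library_$fopen" */
};

/* Nonzero if one of the bound inputs defines `name`. */
static int defined_locally(const char *name,
                           const char *const defs[], size_t ndefs)
{
    size_t i;
    for (i = 0; i < ndefs; i++)
        if (strcmp(name, defs[i]) == 0)
            return 1;
    return 0;
}

/* Resolve one external reference. A local definition always wins, so a
   programmer can still replace a library routine by supplying a function
   of the same name. Only unresolved references consult the pair list, and
   anything left over is returned unchanged for the dynamic linker. */
const char *resolve_reference(const char *ref,
                              const char *const defs[], size_t ndefs,
                              const struct name_pair pairs[], size_t npairs)
{
    size_t i;

    if (defined_locally(ref, defs, ndefs))
        return ref;
    for (i = 0; i < npairs; i++)
        if (strcmp(ref, pairs[i].from) == 0)
            return pairs[i].to;
    return ref;
}
```

Suppose the pair list maps both "fopen" and "getc" to standard_C_library_ and the program defines its own "getc". Then "fopen" is redirected to the library, while "getc" continues to resolve to the program's own definition.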
________________________________________
(1) The exact mechanism has not yet been defined.

As an additional comment, while this addition is being proposed to accommodate the C model, I believe it will prove worthwhile in dealing with imported application systems whose organization makes similar assumptions about the run-time environment. It would also help resolve name conflicts when these applications originate on other systems. For example, it is common to find routines with names like "date" and "time" being called by imported programs. It would be very convenient in the management of this importation to be able to say

     date MVS_library_$julian_date

thereby assuring that the application would not unexpectedly transfer to the Multics system's "date" command.

12.8. Function Name Resolution

The binding process on Multics is another area where there are subtle differences from "conventional" usage vis-a-vis "linkage editing". The C environment (and many other systems, regardless of language) disregards the name of the object file being used as input, and concentrates instead on the external entry(1) names defined and referenced. In Multics terms, this means that a reference to the entrypoint "bar"(2) should be satisfied by the entrypoint which Multics knows as "foo$bar", at least as far as the binder is concerned. This is not presently possible, but it is another area where a binder change would not only make C programmers more comfortable, but would probably have benefits when applications software is imported to Multics from more conventional systems.

________________________________________
(1) in the Multics sense of being visible from outside the segment (e.g. operands of an ALM "segdef" or "entry" pseudo-op)
(2) implicitly transformed by Multics into "bar$bar"

13.
Run-Time Library Definition

This section defines the minimum set of library routines to be made available with the compiler. It also attempts to define the nature of the structures used by programs desiring to communicate with or manipulate the Multics run-time environment (e.g. files).

13.1. Input & Output

All input and output to C programs (except that done by direct reference to Multics virtual memory using the #equate directive) is done using library functions. It should be stressed that these functions are among the most machine dependent and thus are the most likely to differ among implementations of C on various machines. They depend on various constants, macros, and typedefs specified in the include file "stdio.h". A brief summary of some of the more important ones is given in the following table.

     Item     Description
     ======   ==========================================================
     FILE     A typedef for a structure which contains information about
              the file from the run-time library point of view. It is not
              a Multics IOCB pointer, but does contain a reference to the
              IOCB which defines this file for Multics.
     BUFSIZ   The maximum size of an i/o buffer in characters.
     STRSZ    The maximum length of a string.
     NULL     The defined constant for a null pointer value.
     stdin    The standard file identifiers for the "default system"
     stdout   input, output, and error files respectively. They are
     stderr   assigned the natural correspondence on Multics to
              user_input, user_output, and error_output.
     EOF      An int value which cannot result from casting ANY character
              into an int. Since characters read are treated as unsigned,
              the customary value chosen by most implementations is -1.
              Thus, getchar() will return 0777 if the Multics character
              777 (octal) is read, and 0777777777777 when end-of-file
              occurs.

The following list of input and output functions presumes the data definitions given below in discussing the actions performed by each function.
     FILE *fp;         /* A pointer to the structure defining the file */
     char c1, c2;      /* Characters to be sent or received */
     int c;            /* A character received, or EOF */
     int N;            /* An integer send or receive length */
     int status;       /* A Multics system standard error code */
     char *s1, *s2;    /* Pointers to strings of characters to be sent
                          or received */

13.1.1. fopen

     Declaration:  FILE *fopen();
     Invocation:   fp = fopen("filename", "mode");

As shown, the function returns a pointer to a structure describing the relevant data about the file. It takes two arguments, both strings. The first is an absolute or relative pathname of the file to be opened. The opening will be attempted via the vfile_ io module of Multics, using a "stream" mode.

The second argument is a one-character string designating the intended use for the file. Allowed values are "r" (read), "w" (write), and "a" (append). Using "r" will cause an attempt to open the file for "stream_input"; otherwise, the attempt will be made to open it for "stream_output". If the file does not exist and it is being opened for writing, it will be created. If the file cannot be opened as requested, the value NULL will be returned.

13.1.2. fclose

     Declaration:  void fclose();
     Invocation:   fclose(fp);

The argument to "fclose" is always a file pointer. Files remain open until explicitly closed by the program, or until forcibly closed by a "close_files" command or the termination of a run-unit. Closing a file which is not open is not an error.

13.1.3. getc

     Declaration:  int getc();
     Invocation:   c = getc(fp);

This function gets a single character from the file whose pointer is given as its argument. The character is returned as an int so that EOF can be distinguished from every character value. The file must be opened for stream input. When the input file is exhausted, this function returns EOF from each invocation.

13.1.4.
putc

     Declaration:  char putc();
     Invocation:   c1 = putc(c2, fp);

The "putc" function writes the character given as its first argument to the file whose pointer is specified as its second argument. The file must be opened for stream output at the time of the invocation. The "putc" function returns as its value the character it sends to the file.

13.1.5. fgets

     Declaration:  char *fgets();
     Invocation:   s2 = fgets(s1, N, fp);

This function reads characters from the file whose pointer is given as the third argument. The second argument, N, limits how many characters are read: characters are read until a newline (\n) is encountered(1) or N-1 characters have been read. The string terminator (\0) is stored as the last character of the string given as the first argument. The result of the function is the value of the first argument.

________________________________________
(1) If the newline character stops the input, it is still stored as part of the characters read into the string.

13.1.6. fputs

     Declaration:  void fputs();
     Invocation:   fputs(s1, fp);

The first argument must be a pointer to a string of characters, and the second a pointer to a file structure. Characters are written from the string up to, but not including, the null character marking the end of the string.

13.1.7. printf

     Declaration:  void printf();
     Invocation:   printf("fmt string", ... );

This function is used to convert a number of arguments (possibly none) from their internal representation to ASCII under control of a format string (given as the first argument). The converted values are written to the standard output file. The format controls are those defined by the reference document, pgs. 145-147.

13.1.8. fprintf

     Declaration:  void fprintf();
     Invocation:   fprintf(fp, "fmt string", ... );

This function works like printf except that the resultant string is written to the file given as the first argument.
The format control string is given as the second argument, and the data to be converted (if any) as the third and succeeding arguments.

13.1.9. sprintf

     Declaration:  int sprintf();
     Invocation:   sprintf(s1, "fmt string", ... );

This function performs the conversion to ASCII in the manner of fprintf. However, the first argument designates a string where the result is to be placed, rather than a file to be written to. No check is made to ensure that the target string, given as the first argument, is long enough to hold the result.

13.1.10. scanf

     Declaration:  int scanf();
     Invocation:   scanf("fmt string", &arg1, ... );

This function is the input analog of printf. The first argument is a control string indicating how to interpret characters received from the standard input file. The remaining arguments are pointers to data values which will hold the converted values. The valid scanning control sequences are given in K&R, pgs. 148-149. The result of the function is the number of items which were successfully converted and assigned to items in the argument list.

13.1.11. fscanf

     Declaration:  int fscanf();
     Invocation:   fscanf(fp, "fmt string", &arg1, ... );

This function works like scanf except that the first argument designates the file which is to be used as the input file.

13.1.12. sscanf

     Declaration:  int sscanf();
     Invocation:   sscanf(s1, "fmt string", &arg1, ... );

This function works like fscanf except that the first argument designates a string which is to be used as the source of input characters, rather than a file.

13.1.13. rewind

     Declaration:  void rewind();
     Invocation:   rewind(fp);

This function resets the file position for the file whose pointer is given as its argument to the beginning of the file.

13.1.14.
open_file

     Declaration:  FILE *open_file();
     Invocation:   fp = open_file("Multics attach description", "Opening Mode");

As shown, the function returns a pointer to a structure describing the relevant data about the file. It takes two arguments, both strings. The first argument is a standard Multics attach description. The second is a standard Multics opening mode for the target switch. If the file cannot be opened as requested, a "FILE" structure will still be allocated and a pointer to it returned. The structure will contain the reason for the inability to open the file.

13.1.15. open_switch

     Declaration:  FILE *open_switch();
     Invocation:   fp = open_switch("Multics io switchname", "Opening Mode");

This function works like open_file except that the first argument is the name of an attached but unopened io switch, rather than an attach description.

13.1.16. attach_switch

     Declaration:  int attach_switch();
     Invocation:   status = attach_switch("Multics io switchname", "Attach description");

This function attaches a Multics io switch with the given name and attach description. It returns zero if it successfully made the attachment and a standard error code otherwise.

13.1.17. detach_switch

     Declaration:  int detach_switch();
     Invocation:   status = detach_switch("Multics io switchname");

This function detaches the Multics io switch with the given name. The switch must be closed for the detach to succeed. It returns zero if it successfully detached the switch and a standard error code otherwise.

13.1.18. fflush

     Declaration:  void fflush();
     Invocation:   fflush(fp);

Any output which is in the C file buffer but has not been sent to the associated Multics io switch is forced out. The file must be opened in an output mode.

13.2. String Manipulation

This section describes the library functions available for string manipulation.
In the discussion of the individual functions, the following definitions are assumed:

     char *s1, *s2, *s3;   /* Strings */
     char c;               /* A single character */
     int M, N;             /* Various character counts */

13.2.1. strcat

     Declaration:  char *strcat();
     Invocation:   s3 = strcat(s1, s2);

This function appends a copy of the string s2 to the end of the string s1. No check is made on the allocated length of s1; this is the responsibility of the programmer. The value returned by the function is the value of s1.

13.2.2. strncat

     Declaration:  char *strncat();
     Invocation:   s3 = strncat(s1, s2, N);

This function appends at most N characters from s2 to s1. If s2 is N or fewer characters in length, it behaves like "strcat".

13.2.3. strcmp

     Declaration:  int strcmp();
     Invocation:   N = strcmp(s1, s2);

The two strings are compared lexicographically. If s1 is greater than s2, the value returned is positive; if less, negative; and if equal, zero.

13.2.4. strncmp

     Declaration:  int strncmp();
     Invocation:   M = strncmp(s1, s2, N);

This works like "strcmp" except that no more than N characters from the front of s1 and s2 are compared.

13.2.5. strcpy

     Declaration:  char *strcpy();
     Invocation:   s3 = strcpy(s1, s2);

In this function, s2 is copied into s1. The copy ends when the string terminator of s2 has been moved. No check is made on the allocated length of s1. The function return value is the value of the first argument.

13.2.6. strncpy

     Declaration:  char *strncpy();
     Invocation:   s3 = strncpy(s1, s2, N);

This function stores exactly N characters into s1, copying from s2. If s2 is N or more characters long, only its first N characters are copied and no string terminator is stored in s1. If s2 is shorter than N characters, s1 is padded with trailing null characters until it is N characters long. The return value of the function is the value of the first argument.

13.2.7.
strlen

     Declaration:  int strlen();
     Invocation:   N = strlen(s1);

The value of the function is the length of s1, not including the string terminator.

13.2.8. strchr

     Declaration:  char *strchr();
     Invocation:   s2 = strchr(s1, c);

The return value of the function is a pointer to the first occurrence of c in s1. If c does not occur in s1, the return value is a null pointer.

13.2.9. strrchr

     Declaration:  char *strrchr();
     Invocation:   s2 = strrchr(s1, c);

The return value of the function is a pointer to the last occurrence of c in s1. If c does not occur in s1, the return value is a null pointer.

13.3. Memory Allocation

This section describes the library functions available for allocating and freeing blocks of memory. In the discussion of the individual functions, the following definitions are assumed:

     unsigned N, M;        /* Sizes and amounts to be allocated */
     char *loc, *oldloc;   /* Addresses of allocated space */

13.3.1. malloc

     Declaration:  char *malloc();
     Invocation:   loc = malloc(N);

The argument to malloc is the number of bytes which are to be allocated. It returns a pointer to a block of at least N bytes. The returned address is also suitable for use with any data type.

13.3.2. free

     Declaration:  void free();
     Invocation:   free(loc);

This function returns the space previously allocated by malloc to the free storage pool. No guarantee is made about the value of the bits in the block after it has been freed.

13.3.3. calloc

     Declaration:  char *calloc();
     Invocation:   loc = calloc(N, M);

This function works like malloc except that it returns a pointer to a block of space sufficient to hold N objects of size M. In addition, all bytes in the allocated block are guaranteed to be zero.

13.3.4.
realloc

     Declaration:  char *realloc();
     Invocation:   loc = realloc(oldloc, M);

This function "resizes" the block of storage pointed to by its first argument to the size given by its second argument. If the space is to be shrunk, bytes will be trimmed from the right end of the block. If the requested size is larger, the new block will hold the old block's value left-justified, padded with 0 bytes to fill out the new size. In no case, even when the block size does not need to be changed, should the program expect that loc == oldloc.

13.4. Mathematical Functions

The following mathematical functions will be available in the run-time library. All of these routines take arguments of type "double" and return "double" values as their result.

     Function    Description
     ========    =====================================================
     abs(X)      absolute value of X
     acos(X)     arccosine of X in radians, 0 <= acos(X) <= pi
     asin(X)     arcsine of X in radians, -(pi/2) <= asin(X) <= (pi/2)
     atan(X)     arctangent of X in radians, -(pi/2) < atan(X) < (pi/2)
     ceil(X)     smallest integer value greater than or equal to X
     cos(X)      cosine of X in radians
     cosd(X)     cosine of X in degrees
     cosh(X)     hyperbolic cosine of X
     exp(X)      e ** X
     floor(X)    largest integer value less than or equal to X
     log(X)      natural logarithm of X
     log10(X)    logarithm (base 10) of X
     log2(X)     logarithm (base 2) of X
     sin(X)      sine of X in radians
     sind(X)     sine of X in degrees
     sinh(X)     hyperbolic sine of X
     sqrt(X)     square root of X, 0 <= X
     tan(X)      tangent of X in radians
     tand(X)     tangent of X in degrees
     tanh(X)     hyperbolic tangent of X

13.5. Miscellaneous

The following functions do not fit easily within the preceding classifications. Many of the functions listed here implicitly make programs dependent on the Multics environment and should be avoided in situations where portability is important. The following definitions are assumed in the discussion of these functions.
     long int tics;     /* A counter for clock "tics" */
     long int when;     /* A date or time value */
     int code;          /* A Multics system standard error code */
     char flag;         /* A choice indicator */
     char *msg;         /* A pointer to a message string */

13.5.1. clock

Declaration: long int clock();
Invocation: tics = clock();

The result is the number of microseconds since 0000 hours, 1 January 1901, GMT.

13.5.2. vclock

Declaration: long int vclock();
Invocation: tics = vclock();

The result is the number of microseconds of virtual CPU time used by the process.

13.5.3. date

Declaration: long int date();
Invocation: when = date();

The result is an integer value representing the current date in the form YYYYMMDD, where YYYY is the year, MM is the month within the year, and DD is the day of the month.

13.5.4. time

Declaration: long int time();
Invocation: when = time();

The result is an integer value giving the current time in the form HHMMSSFFFFFF, where HH is the hour of the day (00-23), MM is the minute within the hour, SS is the second within the minute, and FFFFFF is the microsecond within the second.

13.5.5. exit

Declaration: void exit();
Invocation: exit(code, msg, flag);

This function forces a return to the caller of its "main" program. All arguments are optional. If code is zero, or the function is invoked without arguments, control passes to the caller of the "main" program. If code has the value -1, the Multics condition "command_abort" is signalled. If code is neither zero nor -1, it is interpreted as a standard Multics error code, and the system routine sub_err_ is called with msg. In this case, flag may take only one of the values acceptable to sub_err_.

14. Open Issues

This section contains important unresolved issues related to the suitability, performance, or "look" of the Multics implementation of the C compiler and language. Many of them have come from reviewers of prior drafts of this document. They are listed here in no particular order. Your comments and concrete suggestions on these topics are welcome.

14.1. Use of Standard Operators

It has been suggested that C programs not use the pl1_operators_ segment, but instead have a special operators segment of their own. The reasons in support of this are:

A) The pl1_operators_ segment is too hard to maintain and modify.

B) The rules for PL/I arithmetic do not match those of C closely enough to make its use profitable in object segments. An operators segment tailored to C rules would allow more compact object segments.

C) Given the tendency of C programs to contain many small functions and to make heavy use of function calls during execution, the pl1_operators_ call/push/return sequence will be too slow. A more effective sequence could be written that takes advantage of C programming style.

D) Additional efficiency may be gained by having the compiler recognize intrinsic(1) functions that are implemented efficiently in the operators segment. The function "strcpy" is a good example of this.

________________________________________
(1) An "intrinsic" function in this context is one which is part of the standard library supplied with the compiler.

14.2. Mismatch in System Calling Conventions

There is no mechanism to define a function or subroutine, external to the calling program, which obeys "native" Multics calling conventions: argument passing by reference, use of descriptors, call-by-value through the use of expressions, etc. Multics FORTRAN provides this via the declaration:

     external foo descriptors

The Maclisp compiler also provides a "defpl1" facility that serves a similar purpose and, in addition, performs data type conversion as part of the call.
Several reviewers have asked for such an extension in the Multics implementation of C.

14.3. Unbound Programs and Name Resolution

The design proposes that C programs have their inter-function name resolution done by the binder. While this mimics the approach of other systems, which require compiled programs to be link-edited into executable objects, it leaves stand-alone (unbound) C programs on Multics in the lurch. It has been suggested that the binder name-resolution mechanism be implemented, and that the same facility also be added to the compiler (perhaps through the inclusion of a standard preamble containing #equate directives). In this case, additional provisions must allow such names to be redefined by explicitly including the function in a source program.

14.4. Support for the Entry Keyword

It has been proposed that Multics provide an extension to the language which allows the creation of multiple-entry functions via the "entry" keyword. This keyword is presently reserved(1) for future use in the reference language.

________________________________________
(1) K&R, Appendix A; Section 2.3, Keywords; p. 180.

14.5. Linker Support for the MAIN Entrypoint

When an external reference is made to routine "foo" on Multics, the linker maps it into a reference to a segment named "foo". Having found the segment, it then looks for an entrypoint in that segment called "foo". If there is one, execution begins at that entrypoint. In deference to languages like FORTRAN, which have "main programs", if the linker cannot find an entrypoint named "foo", it looks for one called "main_". The FORTRAN compiler creates such an entrypoint for main programs to indicate the point at which to begin execution.
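The lookup order just described can be modeled in a few lines of C. This is a minimal sketch, not the Multics linker: the struct segment type and the resolve_entry function are hypothetical, and an object segment is reduced to a plain list of entrypoint names. The try_main flag models the linker change described as possibility C in the list that follows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an object segment: only its name and the
   names of its entrypoints.  The real linker works on the segment's
   definitions; this is a simplification for illustration. */
struct segment {
    const char *name;              /* segment name, e.g. "foo" */
    const char *const *entries;    /* entrypoint names in the segment */
    size_t n_entries;
};

/* Return 1 if the segment defines an entrypoint with this name. */
static int has_entry(const struct segment *seg, const char *entry)
{
    for (size_t i = 0; i < seg->n_entries; i++)
        if (strcmp(seg->entries[i], entry) == 0)
            return 1;
    return 0;
}

/* Return the entrypoint at which execution would begin for a
   reference to seg->name, or NULL if the link cannot be resolved.
   A nonzero try_main adds "main" as a last fallback (possibility C). */
const char *resolve_entry(const struct segment *seg, int try_main)
{
    if (has_entry(seg, seg->name))
        return seg->name;          /* entrypoint named like the segment */
    if (has_entry(seg, "main_"))
        return "main_";            /* fallback for FORTRAN main programs */
    if (try_main && has_entry(seg, "main"))
        return "main";             /* hypothetical added fallback */
    return NULL;                   /* report a linkage failure */
}
```

Under possibilities A and B the lookup keeps its current behavior (try_main of 0) and the compiler guarantees that a "main_" entrypoint exists; under possibility C, a segment that defines only "main" would also resolve.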
The issue in this case is to decide among the following possibilities:

A) The C compiler should translate any function defined as "main", the reserved keyword, into an entrypoint called "main_" in the object segment. An additional keyword, "main_", will have to be reserved, but the linker does not have to be changed.

B) The C compiler should add an additional entrypoint, called "main_", to any object segment which contains a definition for "main". The "main" entrypoint will also appear as an external symbol; both will cause execution to begin at the same point in the compiled code. The linker will not have to be changed in this case, but the "main_" keyword must be reserved.

C) The linker should be changed to also look for the entrypoint "main" in the object segment before giving up and reporting failure. No additional keywords have to be reserved by the compiler. The linker change would be almost invisible to most users.

14.6. Content of the Library

The functions defined earlier make up a minimal subset of a useful programming library for C. Suggestions for other useful routines, and for other libraries, are especially welcome.

14.7. UNIX Environment Features

Unlike many other languages, C was developed in conjunction with an operating system, UNIX.(1) As a consequence, many C programs are written with the (implicit?) assumption that certain facilities will be present. Which of these features should be built into the C compiler and run-time, and which should be included in a larger enclosing environment, is also an important open issue. Some of those which have been raised are included here.

________________________________________
(1) UNIX is a registered trademark of Bell Laboratories. It is commercially available under license from Western Electric.

14.7.1. Enclosing the Main Routine

There is no way to prepare the running environment automatically before execution begins.
This includes providing files for the standard devices stdin, stdout, and stderr.

14.7.2. Device Nomenclature

The present proposal provides no way to map the device names commonly generated by UNIX programs (e.g. /dev/tty6 or /dev/mem) to their Multics counterparts. Some reviewers see this as a desirable feature of the run-time support.

14.7.3. Support for ARGC & ARGV

The present proposal provides no way to distinguish C main programs from those written in any other language. Therefore, the suggested argc/argv convention for command-line arguments(1) cannot be used on Multics without some additional support. Whether this is to be handled by extending the command processor, by providing an easy conversion sequence, or as part of the encapsulating support for C programs remains undecided.

________________________________________
(1) K&R, Chapter 5, Pointers and Arrays; Section 5.11, Command-Line Arguments; pp. 110-114.
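For reference, the K&R convention passes main an argument count, argc, and an array of argument strings, argv, whose first element is the command name. The sketch below is modeled on the K&R "echo" program and shows the kind of code that depends on this convention; the helper echo_args is hypothetical and fills a caller-supplied buffer instead of printing. It says nothing about how Multics would actually supply argc and argv.

```c
#include <stddef.h>

/* Hypothetical helper modeled on the K&R "echo" program: copy
   argv[1] .. argv[argc-1], separated by single blanks, into buf.
   argv[0] (the command name) is skipped.  At most size-1 characters
   are stored, followed by a terminating '\0'.  Returns the number of
   characters stored, not counting the terminator. */
size_t echo_args(int argc, char *argv[], char *buf, size_t size)
{
    size_t len = 0;

    if (size == 0)
        return 0;                   /* no room even for the '\0' */
    for (int i = 1; i < argc; i++) {
        const char *p = argv[i];

        if (i > 1 && len + 1 < size)
            buf[len++] = ' ';       /* blank between arguments */
        while (*p != '\0' && len + 1 < size)
            buf[len++] = *p++;
    }
    buf[len] = '\0';
    return len;
}

/* On UNIX, a main program would use it roughly like this:
 *
 *     int main(int argc, char *argv[])
 *     {
 *         char line[256];
 *
 *         echo_args(argc, argv, line, sizeof line);
 *         puts(line);
 *         return 0;
 *     }
 */
```

Whichever support mechanism is chosen would have to build argc and argv from the Multics command line before such a main program is entered.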