Multics Technical Bulletin                                MTB-647
C Compiler Spec.

To:       Distribution

From:     Gregory A.  Baryza

Date:     23 January 1984

Subject:  Multics C Compiler Specification

1.  Abstract

This  MTB  discusses  the implementation  issues  surrounding the
installation  of  an  externally-developed compiler  for  the "C"
programming  language  on  Multics.   The  intent  is  to  have a
compiler  which accepts  a version  of the  language identical to
that already present on GCOS-III and the DPS6.  The compiler will
run  native  in  the  Multics  environment  and  produce standard
Multics object segments.

Comments  on the  nature and  content of  the supporting run-time
library for C are also included.

Comments on this MTB should be sent to the author -

     via Multics mail to:

        Baryza.Multics

     via posted mail to:

        Gregory A.  Baryza
        Honeywell Information Systems, Inc.
        Four Cambridge Center
        Cambridge, Massachusetts, U.S.A.   02142

     via telephone to:

        (HVN)-261-9315,
        (617)-492-9315

     via forum on System-M to:

        >user_dir_dir>Multics>Baryza>mtgs>C_Compiler_Spec
        (cc_spec)

________________________________________

Multics project  internal documentation; not to  be reproduced or
distributed outside the Multics project.


                        TABLE OF CONTENTS

Section    Page  Subject
=======    ====  =======

1             i  Abstract
2             1  Preface
3             2  Introduction
3.1           2  . . Overall Goal
3.2           2  . . Motivation
3.3           3  . . Division of Labor
3.4           3  . . Reference Document for C
4             4  Identifiers
4.1           4  . . Characters Allowed in Identifiers
4.2           4  . . Length of Identifiers
4.3           4  . . Reserved Identifiers
5             5  Data Types
5.1           5  . . Basic Types
5.2           6  . . Derived Types
5.2.1         6  . . . . Pointers
5.2.1.1       6  . . . . . . Pointers to Functions
5.2.2         6  . . . . Aggregates
5.2.2.1       6  . . . . . . Arrays
5.2.2.1.1     7  . . . . . . . . Strings
5.2.2.2       7  . . . . . . Structures
5.2.2.2.1     7  . . . . . . . . Fields
5.2.2.3       8  . . . . . . Unions
5.2.3         8  . . . . Enumerations
5.3          10  . . Type Definitions
5.4          10  . . Storage Classes
6            11  Constants
6.1          11  . . Integers
6.1.1        11  . . . . Decimal
6.1.2        11  . . . . Octal
6.1.3        11  . . . . Hexadecimal
6.1.4        11  . . . . Representation of LONG Values
6.2          12  . . Floating Point
6.3          12  . . ASCII Characters
6.4          13  . . Strings
7            14  Expressions
8            15  Keywords
9            16  Data Type Conversion
9.1          16  . . Character to Integer
9.2          16  . . Integer to Character
9.3          16  . . Floating Point to Double
9.4          16  . . Double to Floating Point
9.5          16  . . Floating Point to Integer
9.6          17  . . Integer to Floating Point
9.7          17  . . Integer to Unsigned
9.8          17  . . Pointer to Integer


9.9          17  . . Integer to Pointer
9.10         17  . . The Standard Conversion Rules
10           19  Statements
11           20  Compiler Directives
11.1         20  . . #define
11.2         20  . . #undef
11.3         20  . . #if
11.4         20  . . #ifdef
11.5         21  . . #ifndef
11.6         21  . . #else
11.7         21  . . #elseif
11.8         21  . . #endif
11.9         21  . . #line
11.10        21  . . #include
11.11        22  . . #equate
12           23  C Programs on Multics
12.1         23  . . The C Program Model
12.2         24  . . Symbol Table Requirements
12.2.1       24  . . . . Descriptor Types
12.2.2       25  . . . . Other Symbol Table Issues
12.3         26  . . Probe Changes
12.4         26  . . Memory Allocation
12.5         27  . . Use of an Operators Segment
12.6         27  . . Argument Lists
12.7         31  . . References to Library Routines
12.8         32  . . Function Name Resolution
13           33  Run-Time Library Definition
13.1         33  . . Input & Output
13.1.1       34  . . . . fopen
13.1.2       35  . . . . fclose
13.1.3       35  . . . . getc
13.1.4       36  . . . . putc
13.1.5       36  . . . . fgets
13.1.6       36  . . . . fputs
13.1.7       37  . . . . printf
13.1.8       37  . . . . fprintf
13.1.9       37  . . . . sprintf
13.1.10      38  . . . . scanf
13.1.11      38  . . . . fscanf
13.1.12      39  . . . . sscanf
13.1.13      39  . . . . rewind
13.1.14      39  . . . . open_file
13.1.15      40  . . . . open_switch
13.1.16      40  . . . . attach_switch
13.1.17      40  . . . . detach_switch
13.1.18      41  . . . . fflush
13.2         41  . . String Manipulation
13.2.1       41  . . . . strcat
13.2.2       42  . . . . strncat
13.2.3       42  . . . . strcmp


13.2.4       42  . . . . strncmp
13.2.5       43  . . . . strcpy
13.2.6       43  . . . . strncpy
13.2.7       43  . . . . strlen
13.2.8       44  . . . . strchr
13.2.9       44  . . . . strrchr
13.3         44  . . Memory Allocation
13.3.1       44  . . . . malloc
13.3.2       45  . . . . free
13.3.3       45  . . . . calloc
13.3.4       46  . . . . realloc
13.4         46  . . Mathematical Functions
13.4         46  . . . . abs
13.4         46  . . . . acos
13.4         46  . . . . asin
13.4         46  . . . . atan
13.4         46  . . . . ceil
13.4         46  . . . . cos
13.4         46  . . . . cosd
13.4         46  . . . . cosh
13.4         46  . . . . exp
13.4         47  . . . . floor
13.4         47  . . . . log
13.4         47  . . . . log10
13.4         47  . . . . log2
13.4         47  . . . . sin
13.4         47  . . . . sind
13.4         47  . . . . sinh
13.4         47  . . . . sqrt
13.4         47  . . . . tan
13.4         47  . . . . tand
13.4         47  . . . . tanh
13.5         47  . . Miscellaneous
13.5.1       48  . . . . clock
13.5.2       48  . . . . vclock
13.5.3       48  . . . . date
13.5.4       48  . . . . time
13.5.5       49  . . . . exit
14           50  Open Issues
14.1         50  . . Use of Standard Operators
14.2         50  . . Mismatch in System Calling Conventions
14.3         51  . . Unbound Programs and Name Resolution
14.4         51  . . Support for the Entry Keyword
14.5         51  . . Linker Support for the MAIN Entrypoint
14.6         52  . . Content of the Library
14.7         52  . . UNIX Environment Features
14.7.1       52  . . . . Enclosing the Main Routine
14.7.2       53  . . . . Device Nomenclature
14.7.3       53  . . . . Support for ARGC & ARGV


2.  Preface

Developing  the specification  for anything is  a difficult task.
The  trade-offs are  not always  easy and  seldom do  all go away
feeling satisfied.   The writing of this  specification for C has
run true to form.

The  C language  was invented in  1972 by Dennis  Ritchie of Bell
Laboratories.   Since then,  it has  become widely  accepted as a
major  programming language.   And, like all  major languages, it
exists  in  a  number  of  dialects  (sometimes  several,  subtly
incompatible versions for a given machine).

However,  the evolution  of C has  been strongly  affected by the
features  provided  by its  UNIX(1) host.   For better  or worse,
these  are also  features which  are found  in a  number of other
commercially available operating systems.   A large body of code,
from  the   UNIX  "shell"  to   some  sophisticated  applications
programs,  has  come to  expect  their presence.   Many  of these
programs  and  systems  are  viewed  as  useful  adjuncts  to the
facilities Multics already provides.

The crux  of the matter  is that some of  the expected "features"
are  missing  on  Multics  and  the  ways  of  doing  things  are
different.  Providing  the features and paths  is often difficult
or  undesirable.   This  specification  attempts  to  balance the
expectations of  programs written elsewhere and  of those Multics
programmers  developing  code  for  use  only on  Multics.  Thus,
there are some "un-Multicious" thoughts  herein; but, I hope they
add to the overall environment rather than subtracting from it.

I  want  to  thank  those people  who  have  contributed  to this
document either in conversation or  as reviewers of early drafts:
Peter  Fraser, Steve  Herbst, Barry Margolin,  Kevin Martin, Dave
Mason, Tom  Oke, Ed Ranzenbach, Olin  Sibert, Melanie Weaver, and
Brian Westcott.

Gregory A. Baryza

________________________________________

(1) UNIX is  a registered trademark of  Bell Laboratories.  It is
    commercially available under license from Western Electric.


3.  Introduction

3.1.  Overall Goal

The intent of this project is  to provide a compiler and run-time
library for the C language on Multics.  The compiler is to run as
a standard  Multics command and produce  Multics object segments,
listings, error messages,  and so on, in a  style consistent with
other Multics compilers.

The  run-time  library will  provide functions  common to  most C
implementations.   It will  interface between C  programs and the
services  provided  by Multics.   It should  be noted  that those
library  routines  which  provide  agency  services  to operating
system  functions for  specific systems  (e.g.  tasking  on UNIX)
will not necessarily be provided for Multics.

The mechanism by which this  will be achieved is the installation
of  an  existing  compiler, written  in  C and  developed  at the
University of Waterloo, on Multics.   No changes in the syntax or
semantics of the language definition for Multics are implied, nor
should any be assumed.

3.2.  Motivation

The  increasing  popularity  of  the  C  programming  language is
impossible to deny.  Many  mainframe manufacturers and most mini-
and   microcomputers   now  sport   C  compilers   (and  run-time
libraries).   As  a  consequence  much  systems  and applications
software is now  being written in C and,  ipso facto, support for
the language  is a "requirement"  for commercial, general-purpose
systems.    Within  the   present  Multics   community,  Standard
Telephone and Cable (U.K.)  has been the strongest proponent of C
on the system.

To  satisfy   this  need,  Honeywell  has   contracted  with  the
University  of  Waterloo to  produce  compilers for  GCOS-III and
GCOS-8,  and  for  the  DPS-6 product  line.   In  order  to take
advantage of this opportunity, MDC would like to utilize the work
already in  progress toward providing a  Multics C compiler.  The
compiler  is expected  to be source-language  compatible with the
other Honeywell offerings.


3.3.  Division of Labor

Three parties will be involved  in the development of the product
for Multics:   the Universities of Waterloo  and Calgary, and the
Multics Development  Center.  The University of  Waterloo will be
responsible for the pre-processor, parser, and code generator for
the C  language.  Mainly, this  effort involves changes  to their
present compiler necessary to support  the interpretation of C on
Multics.   The  University  of  Calgary will  be  responsible for
packaging the output of the  code generator into Multics standard
object   segments,   complete   with  symbol   table   and  debug
information.  They will also provide the run-time library assumed
by  C  programs and  its connection  to Multics  facilities.  The
Multics   Development   Center  will   do  the   overall  project
coordination.   It  will  also  make the  changes  in  the system
software (probe, symbol table utilities, binder, etc.)  necessary
to support the product.

3.4.  Reference Document for C

Unless otherwise  noted, references to features  of the language,
section and page  numbers, and examples will be  presumed to come
from

     The C Programming Language(1)
     Kernighan, Brian W.  & Ritchie, Dennis M.
     Prentice-Hall (1978)
     Englewood Cliffs, New Jersey

In  particular,  Appendix  A  of this  document  claims  to  be a
reference  manual  for the  language.  Unfortunately,  it suffers
from some  ambiguities and omissions.   These will be  cited when
they are discussed.

One further point deserves  mention.  Appendix A frequently makes
reference to the H6000 version of  the compiler.  This is not the
Waterloo product.   It is an internally  developed compiler for C
which runs (primarily) on the Bell Laboratories' GCOS systems.

________________________________________

(1) This document is also  commonly referred  to as  K&R, or  the
    "White Book".


4.  Identifiers

Identifiers  in  C  are  used  to  name  variables  and  symbolic
constants, functions, structures, type definitions, etc.

4.1.  Characters Allowed in Identifiers

Identifiers  are  sequences  of characters  constructed  from the
following sets of items:

     -  upper- and lower-case letters

     -  digits

     -  the underscore character, "_"

The Multics implementation also  adheres to the conventions(1) in
distinguishing  upper and  lower case  letters as  different, and
requiring that the initial character of an identifier be either a
letter or an underscore character.

4.2.  Length of Identifiers

Identifiers  may  be  as  long  as  storage  requirements  in the
compiler permit.   However, the compiler will  use only the first
256 characters given to distinguish identifiers from one another.

NOTE:  This  is a deviation  from the K&R practice  of using only
the first 8 characters to distinguish between identifiers.

4.3.  Reserved Identifiers

Certain identifiers in C are  reserved as having special meaning.
Most of  them are keywords  and are listed  in a section  by that
title.  In  addition, the function named  "main" is designated as
the entrypoint at which the system is to begin execution of the C
program.

________________________________________

(1) K&R,  Chapter  2, Types,  Operators and  Expressions; Section
    2.1, Variable Names; pg. 33


5.  Data Types

The  C language  has a  small number  of fundamental  data types:
integers, floating  point numbers, and  characters.  In addition,
the declaration rules allow for the construction of a potentially
infinite   set   of   derived   types.   We   will   discuss  the
representation of each of these classes separately.

5.1.  Basic Types

C  allows  the  three  fundamental  types  described  above.   In
addition,  declarations  of  each  of  these  types  also   allow
"adjectives"  which  modify the  size  of the  basic type  or its
arithmetic performance.  These  adjectives are:  "short", "long",
and "unsigned".  The following table gives the various base types
and their equivalent representation in machine terms.  Equivalent
forms are listed together.

                              Width       Sign Bit   Boundary
     C Declaration           In Bits      Present    Alignment

     int                        36          Yes      Word
     short int

     long int                   72          Yes      Double Word

     unsigned int               36           No      Word
     unsigned short int
     short unsigned int
     unsigned short
     short unsigned

     unsigned long int          72           No      Double Word
     long unsigned int
     unsigned long
     long unsigned

     float                    8 & 28        Yes      Word

     double                   8 & 64        Yes      Double Word
     long float

     char                       9            No      Character
     unsigned char

The data types "float" and "double" also include a signed, 8-bit,
power-of-two exponent.


NOTE:  While K&R(1) permit only one adjective to precede the base
type declaration, the compiler for Multics will allow both a size
and an  "unsigned" specifier.  They  may appear in  any order (as
shown above).
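
The equivalences  tabulated above can  be sketched as  follows.
This is an illustration only:   the variable names and the function
same_widths are  hypothetical, and  a present-day  compiler will
assign widths other than the 36 and 72 bits given for Multics.

```c
/* Equivalent spellings of one type: the adjectives in either order. */
unsigned short int  us_a;
short unsigned int  us_b;
unsigned short      us_c;
short unsigned      us_d;

unsigned long int   ul_a;
long unsigned       ul_b;

/* Returns 1 when every spelling within a group has the same width. */
int same_widths(void)
{
    return sizeof us_a == sizeof us_b
        && sizeof us_b == sizeof us_c
        && sizeof us_c == sizeof us_d
        && sizeof ul_a == sizeof ul_b;
}
```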

5.2.  Derived Types

The  derived  types of  C  are:  pointers  to  typed-objects (ala
Pascal and  ALGOL-68), and various kinds  of aggregates.  Each of
these will be discussed separately.

5.2.1.  Pointers

Pointers in C are always pointers  to objects of a specific type.
A C pointer  is represented as a Multics ITS  pointer.  Thus, a C
pointer is equivalent to a PL/I "pointer aligned".

5.2.1.1.  Pointers to Functions

Since  the  C  language  does  not  allow(2)  the  definition  of
functions  within other  functions, pointers to  functions do not
need   an  environment   pointer  as  part   of  their  reference
information.   Hence,  they  may   also  be  represented  as  ITS
pointers.   This  makes  them  equivalent to  the  result  of the
Multics PL/I builtin function, codeptr.

5.2.2.  Aggregates

The  Multics  C  compiler   supports  the  construction  of  data
aggregates.   Several  types are  possible:   arrays, structures,
unions.  Of course,  each of these aggregates may  be formed from
data  elements  which  are   themselves  either  basic  types  or
aggregates.

________________________________________

(1) K&R, Appendix A, Section 8.2, pg. 193

(2) K&R, Chapter 4, Functions and Program Structure; Section 4.8,
    Block Structure; pg. 81


5.2.2.1.  Arrays

C  arrays  correspond  directly  to  PL/I  arrays.   However, the
initial subscript for C arrays is zero so that the C declaration:

     int A[8];

is equivalent to the PL/I:

     dcl  A  real fixed binary (35, 0) aligned dimension (0 : 7);

5.2.2.1.1.  Strings

Like Pascal  and Ada, strings  in C are derived  types.  They are
represented  as arrays  of characters.   The last  element in the
string is the ASCII "NUL"  character (location 0/0) and forms its
delimiter.  Since the address of a C string is the address of its
first element,  they can  be overlaid  with PL/I  based character
strings provided the length has  been found first by scanning for
the NUL character at the end.
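
A minimal  sketch of the scan  described above, assuming  nothing
beyond the  NUL-terminated representation.  The  function name
scan_length is hypothetical;  the library routine  strlen (Section
13.2.7) is expected to perform the same service.

```c
#include <stddef.h>

/* Count the characters that precede the terminating NUL. */
size_t scan_length(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}
```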

5.2.2.2.  Structures

Structures in C are also directly analogous to those in PL/I.  In
particular, the same alignment rules apply and a C structure will
always be aligned to the most  strict boundary required of any of
its components.

5.2.2.2.1.  Fields

The  Multics  C  compiler   also  implements  the  definition  of
fields(1) within a machine word.   Fields on Multics are assigned
left-to-right.   For example,  a C "float"  variable is described
as:

     struct float_bin_real {  unsigned exponent : 8;
                              unsigned mantissa : 28; };

For  purposes of  computation, fields  in Multics  are treated as
unsigned integers.
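
The left-to-right rule can be sketched by packing the two fields
of  the example  by hand.   This sketch  assumes that  a 36-bit
word is  held in the  low-order bits of  a wider integer;   the type
word36 and  the function names are  hypothetical and are  not part
of the compiler or its library.

```c
/* A 36-bit Multics word held in the low-order bits of a wider integer. */
typedef unsigned long long word36;

/* Left-to-right assignment: the first field declared (the exponent)
   occupies the leftmost 8 bits, the mantissa the remaining 28. */
word36 pack_float_fields(unsigned exponent, unsigned long mantissa)
{
    return ((word36)(exponent & 0377u) << 28)
         | (word36)(mantissa & 01777777777ul);
}

/* Extract the leftmost 8 bits of the 36-bit word. */
unsigned field_exponent(word36 w)
{
    return (unsigned)((w >> 28) & 0377u);
}

/* Extract the rightmost 28 bits of the 36-bit word. */
unsigned long field_mantissa(word36 w)
{
    return (unsigned long)(w & 01777777777ul);
}
```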

________________________________________

(1) K&R, Appendix A, Section 8.5, pgs. 196-197


NOTE:  Multics deviates from the K&R specifications in allowing a
pointer to point to a bit field.   In this case, it points to the
leftmost bit of the field.  It is the programmer's responsibility
to ensure  that later use of  this pointer as a  locator does not
violate language boundary  assumptions (e.g.  if it is later used
to  point to  a character,  it must  be aligned  to a  character
boundary).

5.2.2.3.  Unions

Unions have no single counterpart in  PL/I.  An object which is a
union  may be  thought of  as a named  piece of  storage which is
large enough to contain any of  the objects defined to be part of
the union.  The syntax for defining unions is almost identical to
that of  structures, and component access  is accomplished in the
same way.

For example, the C declaration:

     union tag_name { int    a;
                      float  b;
                      char  *c; } var_name;

would  have to  be represented by  a series  of PL/I declarations
involving  aliasing  a properly  aligned  piece of  storage large
enough  to contain  the largest value.   In this  example, we are
talking about  72 bits aligned  on an even  word boundary because
"var_name.c" is a pointer.  Note that  it is not possible in PL/I
to  create a  sequence of  declarations which  preserves both the
storage overlaying and the naming structure of C unions.
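
In C  itself the  overlay is direct.   The following  sketch uses
the  declaration  above;   the  two  checking  functions  are
hypothetical and serve only to show  that the union is as large as
its largest member and that every member begins at the same address.

```c
union tag_name {
    int    a;
    float  b;
    char  *c;
} var_name;

/* Returns 1 when the union is at least as large as each member. */
int union_holds_largest(void)
{
    return sizeof var_name >= sizeof var_name.a
        && sizeof var_name >= sizeof var_name.b
        && sizeof var_name >= sizeof var_name.c;
}

/* Returns 1 when all members start at the address of the union itself,
   so that storing into one member overlays the others. */
int members_share_storage(void)
{
    return (void *)&var_name.a == (void *)&var_name
        && (void *)&var_name.b == (void *)&var_name
        && (void *)&var_name.c == (void *)&var_name;
}
```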

5.2.3.  Enumerations

NOTE:  This data type is an  extension to the language defined by
K&R.


Enumerated  data  types are  user  defined data  types  which are
represented as  though they were of  type "int".  The declaration
consists of  the keyword "enum"  followed by the  tag-name of the
type,  a  list  of  values,  and  an  optional  list  of variable
identifiers.  For example, the declaration:

     enum   day  {sun, mon, tue,
                  wed, thu, fri, sat}   start;

defines  the variable  "start" to be  of type  "day".  The values
which can be assigned to "start" are "sun", "mon", "tue", etc.

However,  unlike  some  languages(1)  which  do  not  specify the
mapping between enumerated values and their representations, this
version of C  does.  The first value in the  list is assigned the
value 0, the second 1, and so  on.  This version of C also allows
arithmetic on the enumerations.  This means that:

     start  =  sun + 2;        and            start  =  tue;     

are  equivalent.   In  fact,  they  are  also  equivalent  to the
sequence:

     #define  sun  0
     #define  mon  1
        . . .
     int      start;

     start  =  sun + 2;

In  addition, the  values specified  in the  enumeration list may
also be  optionally assigned values.  This  allows an enumeration
of the form:

     enum   day  {sun = -1, mon = 1, tue, wed,
                  thu, fri, sat = -1}        start;

where  "sun"  and  "sat" have  the  assigned value  -1,  "mon" is
assigned  the  value 1,  and  the values  of  "tue" to  "fri" are
assigned   successive  integers   starting  with   the  value  of
"mon + 1".  As shown, overlaps in enumerated values are allowed.

________________________________________

(1) Ada, for instance


5.3.  Type Definitions

Type  definitions, which  allow the  user to  define synonyms for
existing  types  to  increase  portability  and  readability, are
allowed as in K&R.

NOTE:  The tag-name of an  enumerated definition may also be used
to  define additional  data elements of  that "type"  in the same
manner as with structures and unions.  Thus, using the enumerated
type "day",  we may define  a finish date as  another instance of
that type, and even initialize it, by:

     enum  day  finish  =  fri;
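
The enumeration rules of Section  5.2.3, together with the use of
the  tag-name shown  above, can  be sketched  as follows.   The
function  day_after  is  hypothetical;   it  relies  on  the
arithmetic that this version of C permits on enumerated values.

```c
/* Explicit values may repeat; unassigned values continue from the last. */
enum day { sun = -1, mon = 1, tue, wed, thu, fri, sat = -1 };

enum day finish = fri;     /* another instance of the type, initialized */

/* Arithmetic on an enumerated value yields an ordinary integer. */
int day_after(enum day d)
{
    return d + 1;
}
```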

5.4.  Storage Classes

K&R  defines  four storage  classes:   auto, static,  extern, and
register.   The correspondence  with PL/I  storage classes  is as
follows:

     C Storage Class            Equivalent PL/I

     auto                       internal automatic

     static                     internal static

     extern                     external static

     register                   internal automatic

The Multics C compiler does not allow variables to be assigned to
registers.   For this  implementation, "register"  and "auto" are
taken as equivalent.   This does not mean, however,  that the "&"
operator can be used to take  the address of a register variable.
Such use is  non-portable and prohibited by the  compiler for the
sake of consistency with other implementations.

In addition, future versions of the  compiler may make use of the
register declaration, not as a way of dedicating a "fast" machine
register, but as  a way of denoting that  no pointer de-reference
can legitimately change the value of the variable.
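
The four storage  classes can be seen together in  a short sketch.
The names  shared_total and count_calls are  hypothetical and are
chosen only for illustration.

```c
/* Defined here; another source file would refer to it with
   "extern int shared_total;" (external static in PL/I terms). */
int shared_total = 0;

int count_calls(void)
{
    static int   calls = 0;   /* internal static: survives between calls  */
    auto int     step = 1;    /* internal automatic: fresh on every call  */
    register int i;           /* treated as "auto"; "&i" is not permitted */

    for (i = 0; i < step; i++)
        calls++;
    shared_total = calls;
    return calls;
}
```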


6.  Constants

The C  language allows several  different types of  constants for
numeric values and provides  representations for ASCII characters
and strings.

6.1.  Integers

With the  exception of those  explicitly designated as  long, all
integer constants are represented in storage as "int".

6.1.1.  Decimal

Decimal integers are written as  whole numbers (no decimal point)
and have no leading zeroes; examples are:

                 7     34359738367     100     13

6.1.2.  Octal

Octal numbers are also written as whole numbers.  They are formed
from the digits zero through seven  and are always preceded by at
least one leading zero as in

              06     0377777777777     0100     013

6.1.3.  Hexadecimal

Numbers in  hexadecimal are formed  from the set  of hexadecimal
digits  (zero through  nine and  the letters  "a" through  "f").
In  determining  the  value  of the  constant,  case  is ignored.
However,  to distinguish  the use  of hexadecimal,  the first two
characters of all such constants are required to be either "0x" or
"0X".  The following are examples of hexadecimal constants in C:

               0x6     0X7FfF     0xDead     0x0FF

6.1.4.  Representation of LONG Values

A "long" integer constant may be  represented in one of two ways.
Either the number may be obviously  too large to fit in an "int",
in which case the compiler  will automatically type it as "long".
Or,  the constant  may be suffixed  with an  upper- or lower-case
"L".  In the latter instance, the compiler will convert the value
to a "long" representation  (including double word alignment, for
example) regardless of the number of digits.
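
The notations  of this section can  be checked against  one another
in a  short sketch.  The  two function names are  hypothetical.  On
a present-day compiler  the width of "long"  will differ from the
72 bits used on Multics, so only the type, not the width, is tested.

```c
/* Returns 1 when the octal and hexadecimal examples have the decimal
   values expected of them. */
int constants_agree(void)
{
    return 06     == 6
        && 0100   == 64
        && 013    == 11
        && 0x6    == 6
        && 0X7FfF == 32767
        && 0x0FF  == 255
        && 0xDead == 57005;
}

/* The "L" suffix, in either case, forces type "long" whatever the
   number of digits. */
int suffix_gives_long(void)
{
    return sizeof 7L == sizeof(long)
        && sizeof 7l == sizeof(long);
}
```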


6.2.  Floating Point

Numbers written with a decimal point in them are considered to be
floating point  values.  They may also  (optionally) have leading
zeroes,  a decimal  fraction, and an  exponent part  in the usual
form.  Floating  point constants are always  represented as being
of type "double".

6.3.  ASCII Characters

An  ASCII character  constant is a  sequence of from  one to four
ASCII  characters  enclosed  in  single  quotes.   Each character
occupies one byte (9 bits) of storage.  Character constants which
contain fewer than four characters  are stored right-justified in a
word and the high-order bit  of the first character is propagated
to the left  end of the word.  For  the Multics representation of
ASCII, this amounts to zero-fill.

A number of escape sequences are permitted in character constants
to ease representation of  certain characters.  The backslash (\)
character introduces the sequence.  Allowed sequences are:

      Escape Sequence  Interpretation

           \t         Horizontal tab

           \\         Backslash

           \'         Single quote

           \"         Double quote

           \n         Newline

           \c         Carriage return

           \f         Form feed

           \b         Backspace

        \d{d{d}}      A   one-  to   three-character  octal
                       constant whose value  is the value of
                       the character

NOTE:  ASCII character constants accepted  by this compiler are a
superset of those described by K&R.


6.4.  Strings

A  string  constant  is a  sequence  of zero  or  more characters
enclosed by  double quote marks.  Escape  sequences are permitted
in character strings also.  As noted elsewhere, a string constant
is  treated as  an array  of single  characters terminated  by an
ASCII NUL byte (\0).

It  is worth  mentioning at this  point that long  strings can be
continued  over  several  lines  of  source  code.   The sequence
"\<nl>", where <nl> is  a real newline character,  will do  this.
When the "\<nl>" is  encountered while  scanning a string  during
compilation, the  lexical analyzer discards  these two characters
and  all unescaped  leading whitespace from  the succeeding line.
This allows long strings to  be squeezed into the available space
and  also  permits indentation  at  the  proper  place.   Escaped
whitespace  characters  are  included  "as  is".   The  following
examples illustrate this.

     Input Sequence           Interpretation

     "This is \               "This is continued"
        continued"

     "This string \           "This string has four blanks"
         has four \
          blanks"
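
The continuation rule can be  sketched as a small routine applied
to the  body of a  string constant.  This is  an illustration of
the rule  as stated, not the  lexical analyzer of the  compiler;  the
function  name splice_string  is hypothetical,  and whitespace  is
assumed here to mean blanks and horizontal tabs.

```c
#include <stddef.h>

/* Copy the body of a string constant from src to dst, applying the
   continuation rule.  dst must be at least as large as src.
   Returns the length of the result. */
size_t splice_string(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;                       /* discard "\<nl>"           */
            while (*src == ' ' || *src == '\t')
                src++;                      /* and unescaped leading     */
                                            /* whitespace that follows   */
        } else if (src[0] == '\\' && (src[1] == ' ' || src[1] == '\t')) {
            dst[n++] = src[1];              /* escaped whitespace is     */
            src += 2;                       /* included "as is"          */
        } else {
            dst[n++] = *src++;              /* everything else is copied */
        }
    }
    dst[n] = '\0';
    return n;
}
```

Other escape  sequences are  copied through  unchanged by  this
sketch;  interpreting them is a separate step.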


7.  Expressions

This section deals with expressions in C.  The various operations
in an expression are evaluated according to a defined precedence.
Many operations  share the same  precedence.  In that  case, each
precedence class will group either left-to-right or right-to-left
depending on the class.

The standard(1)  rules for operator  precedence and associativity
do   not  completely   determine  the  order   of  evaluation  of
expressions, however.  In the case of "A+B", C says nothing about
whether A  or B should  be evaluated first.   In such situations,
the Multics compiler will evaluate each of the sub-expressions in
an  undetermined  order,  even  if there  are  side-effects  to a
certain order.   Parentheses, such as "(A)+B",  cannot be used to
force a  certain order in  such cases.  If a  particular order is
necessary, the  expression will have  to be broken  down into two
statements with the  result of the first stored  into a temporary
variable and used in the second.
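
The  two-statement  technique  can  be sketched  as  follows.   The
functions next_value and ordered_sum are hypothetical;  next_value
has a side  effect, so the order in  which its two calls are  made
determines the result.

```c
static int calls = 0;

/* A function with a side effect: each call returns the next integer. */
int next_value(void)
{
    return ++calls;
}

/* Written as "next_value() * 10 + next_value()" the order of the two
   calls would be undetermined.  Storing the first result in a
   temporary fixes the order. */
int ordered_sum(void)
{
    int tens = next_value() * 10;   /* first statement: called first   */

    return tens + next_value();     /* second statement: called second */
}
```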

Since functions in Multics C may  be declared to return no result
of  consequence  ("void"), such  functions  may only  be  used in
restricted cases.  In general, if  the function result would have
had to  be used in  further evaluating an  expression, the "void"
function invocation is prohibited.

With  the  preceding caveats,  the expressions  of the  Multics C
compiler are those of K&R.

________________________________________

(1) K&R, Appendix A, pgs. 185-192


8.  Keywords

There are a number of keywords  reserved by the C compiler.  They
cannot be used anywhere as identifiers.  In keeping with standard
Multics  practice,  these  keywords  will only  be  recognized in
lower-case.  Use  of identifiers which are  identical to keywords
except for  case should be  avoided as a  portability issue.  The
table of such keywords is

       auto          else          int           switch
       break         entry         long          typedef
       case        <> enum          register      union
       char          extern        return        unsigned
       continue      float         short       <> void
       default       for           sizeof        while
       do            goto          static
       double        if            struct

NOTE:   Entries  marked with  "<>"  are additions  to the  list of
keywords defined in K&R.  A synopsis of their usage is

     Keyword  Explanation

     enum     This keyword is  a declarator which introduces
              the  definition of  a user-defined  data item.
              The values which may "properly" be assigned to
              this  data  type  appear  in  the  list  which
              follows  the  tag-name  associated  with  this
              type.

     void     The  keyword, "void",  is used  in place  of a
              data-type   specifier    in   functions.    It
              indicates  that  the  value  returned  by  the
              function  is not  used and is  therefore of no
              importance.  Hence,  functions declared "void"
              may not  have their return  values assigned to
              anything, nor  may they appear  in expressions
              involving  additional  computation.   A "void"
              function is effectively a subroutine.


9.  Data Type Conversion

9.1.  Character to Integer

The character object will be  converted to an integer value which
represents  the character  object's  value  in memory.   If  the
character  object is  not declared "unsigned",  sign extension of
the left-most character in the  character object will take place.
An  unsigned  character  object  will  be  copied  "as  is".  For
example,

     int    x;

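     x = '\777a';

will result  in x having the  value -415.  The two  nine-bit
characters form the 18-bit pattern 777141 (octal), whose left-most
bit is extended as the sign.  If x were additionally declared as
"unsigned", the assignment would result in x being set to 261729.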
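
The arithmetic behind these two values can be checked with a short
sketch in portable C.  It assumes nine-bit characters packed two
to an 18-bit field, as the example implies.  The helper names,
pack2 and sign_extend, are hypothetical; they are not part of the
compiler or its library.

```c
#include <assert.h>

/* Width of one character, assuming the nine-bit Multics byte. */
#define MULTICS_CHAR_BITS 9

/* Pack two nine-bit character codes into one 18-bit value;
   "hi" is the left-most character. */
long pack2(long hi, long lo)
{
    assert(hi >= 0 && hi <= 0777 && lo >= 0 && lo <= 0777);
    return (hi << MULTICS_CHAR_BITS) | lo;
}

/* Sign-extend the low "width" bits of v, as is done for a
   character object that is not declared "unsigned". */
long sign_extend(long v, int width)
{
    long sign_bit;

    assert(width > 0 && width < 31);
    sign_bit = 1L << (width - 1);
    v &= (sign_bit << 1) - 1;          /* keep the low "width" bits */
    return (v ^ sign_bit) - sign_bit;  /* fold the sign bit in      */
}
```

With these helpers, pack2(0777, 0141) yields 261729 and
sign_extend(pack2(0777, 0141), 18) yields -415, where 0141 is the
ASCII code for "a".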

9.2.  Integer to Character

When an  integer is converted  to a character, the  result is the
low-order  nine bits  of the integer  value.  All  other bits are
ignored.

9.3.  Floating Point to Double

All floating point arithmetic is carried out in "double".  To
this end, the mantissa of a "float" value is extended on the right.

9.4.  Double to Floating Point

The  "double"  value  is  rounded when  the  target  precision is
"float".

9.5.  Floating Point to Integer

Floating  point   numbers  have  their   decimal  fraction  parts
truncated toward  zero.  If the  truncated value is  too large to
fit  in  the  target  integer,  continued  execution  will  yield
undefined results.
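
A program that cannot rule out an oversized value might guard the
conversion itself.  The following is only a sketch in portable C;
the function names are hypothetical, and the limits come from the
host's <limits.h> rather than from the Multics word size.

```c
#include <assert.h>
#include <limits.h>

/* Returns nonzero when d can be converted to int without overflow.
   A NaN fails both comparisons and is therefore rejected. */
int fits_in_int(double d)
{
    return d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0;
}

/* Converts d to int; the cast discards the fraction, truncating
   toward zero.  The caller is expected to have checked the range. */
int float_to_int(double d)
{
    assert(fits_in_int(d));
    return (int)d;
}
```

Here float_to_int(2.9) is 2 and float_to_int(-2.9) is -2.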


9.6.  Integer to Floating Point

The conversion occurs in the expected way.  However, some loss of
accuracy may result if the  floating point target cannot hold all
the significant digits of the source exactly.

9.7.  Integer to Unsigned

The result  of this conversion  is the smallest  unsigned integer
congruent to the "int" source mod  2**N (where N is the number of
bits  in the  unsigned number).  The  effect of this  is that the
actual bit  pattern for the  number remains unchanged  (since the
Multics processor is a binary machine).
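
The congruence rule can be illustrated by a sketch that models an
N-bit unsigned value in a wider host integer.  The function name
is hypothetical, and N = 36 is assumed here to stand for one
Multics word.

```c
#include <assert.h>

/* Reduce v modulo 2**n, giving the smallest non-negative value
   congruent to v.  With n = 36 this models one Multics word. */
long long to_unsigned_n(long long v, int n)
{
    long long modulus;
    long long r;

    assert(n > 0 && n < 62);
    modulus = 1LL << n;
    r = v % modulus;      /* the C remainder may be negative */
    if (r < 0)
        r += modulus;
    return r;
}
```

For instance, to_unsigned_n(-1, 36) is 68719476735 (2**36 - 1),
which has the same 36-bit pattern as the two's complement value -1.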

9.8.  Pointer to Integer

Pointers  will be  stored in "int"  data items  as Multics packed
pointers.   They will  be stored in  "long" items  as Multics ITS
pairs.   However, to  conform to  standard C  usage, Multics null
pointers will be stored as the value zero.

9.9.  Integer to Pointer

An "int"  being converted to a  pointer will be assumed  to be in
packed  pointer format;  a "long",  in ITS  format.  An  "int" or
"long" whose value is zero will  be converted into a Multics null
pointer (in the ring of execution).(1)

9.10.  The Standard Conversion Rules

Many binary operators cause conversion of their operands to other
types by default.  The conversions follow the rules given below.

NOTE:    Because  "unsigned   long  int"   is  allowed   in  this
implementation, these  rules differ slightly from  those given in
the reference(2) document.

________________________________________

(1) Thus,  a   Multics  null  pointer   (segno = -1,  wordno = 1,
    bitno = 0) and an integer value of zero are considered equal.
    One  will  be  converted  into  the  other  for  purposes  of
    assignment or during "casts".  They  will be converted into a
    common form for comparison.

(2) K&R, Appendix A, Section 6.6, pg.  184

1.   Any operands of type "char" are converted to "int".

2.   Any operands of type "float" are converted to "double".

3.   If either operand is of type "double" the other is converted
     to  "double",  and  the  result  of  the  operation  will be
     "double".

4.   If either operand has an attribute of "long", the other will
     be  converted  to  "long";  and  the  result  will  have the
     attribute, "long".

5.   If either operand has an  attribute of "unsigned", the other
     will be  converted to "unsigned";  and the result  will have
     the attribute, "unsigned".

6.   Otherwise, the two operands are  converted to "int", and the
     result is "int".
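
One possible reading of these six rules is sketched below as a
small C model.  It assumes that rules 4 and 5 accumulate (so that
a "long" operand combined with an "unsigned" operand gives
"unsigned long"), which is an interpretation and is not stated
outright above.  All type and function names are hypothetical.

```c
#include <assert.h>

enum base_type { T_CHAR, T_INT, T_FLOAT, T_DOUBLE };

struct c_type {
    enum base_type base;
    int is_long;        /* has the "long" attribute     */
    int is_unsigned;    /* has the "unsigned" attribute */
};

struct c_type make_type(enum base_type base, int is_long, int is_unsigned)
{
    struct c_type t;

    assert(is_long == 0 || is_long == 1);
    assert(is_unsigned == 0 || is_unsigned == 1);
    t.base = base;
    t.is_long = is_long;
    t.is_unsigned = is_unsigned;
    return t;
}

/* Rules 1 and 2: char becomes int, float becomes double. */
static struct c_type widen(struct c_type t)
{
    if (t.base == T_CHAR)
        t.base = T_INT;
    if (t.base == T_FLOAT)
        t.base = T_DOUBLE;
    return t;
}

/* Result type of a binary operation on operands a and b. */
struct c_type result_type(struct c_type a, struct c_type b)
{
    a = widen(a);
    b = widen(b);
    if (a.base == T_DOUBLE || b.base == T_DOUBLE)
        return make_type(T_DOUBLE, 0, 0);               /* rule 3    */
    return make_type(T_INT,                             /* rule 6    */
                     a.is_long || b.is_long,            /* rule 4    */
                     a.is_unsigned || b.is_unsigned);   /* rule 5    */
}
```

Under this model, a "char" combined with a "float" yields
"double", and a "long" combined with an "unsigned" yields a result
carrying both attributes.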


10.  Statements

The  statements  accepted by  the  Multics C  compiler  are those
defined by  K&R.  In addition, the  following general remarks are
in order:

     1) Simple statements are terminated  by a semi-colon.  Thus,
        C programs are free-form.

     2) Whitespace may be inserted  as desired to improve program
        readability.

     3) Whitespace is required to separate identifiers, keywords,
        and constants which would otherwise be contiguous.

     4) While  comments  (delimited  by  "/*" and  "*/")  are not
        strictly statements, it is  noteworthy that in Multics C,
        comments do not nest.


11.  Compiler Directives

This   section   defines   the  directives   acceptable   to  the
pre-processor facility.  The pre-processor is capable of text and
macro  substitution,  conditional compilation,  and  inclusion of
other source files into the compilation unit.

Lines   beginning  with   the  character,   "#",  are  considered
directives for  this facility.  They  are not subject  to scoping
rules; their effects last from their  first use to the end of the
compiled unit.

11.1.  #define

This  has the  same format as  in K&R.  However,  the contents of
strings which are  given as part of the  #define are examined for
the  presence  of  formal  parameters  to  be  substituted.   For
example, given

     #define derogation(SLUR) "You SLUR, you!"

the reference  derogation(knave) is replaced  by the string "You
knave, you!".

11.2.  #undef

The  actions taken  by this directive  are identical  to those in
K&R.

11.3.  #if

The  actions taken  by this directive  are identical  to those in
K&R.

11.4.  #ifdef

The  actions taken  by this directive  are identical  to those in
K&R.

11.5.  #ifndef

The  actions taken  by this directive  are identical  to those in
K&R.


11.6.  #else

The  actions taken  by this directive  are identical  to those in
K&R.

11.7.  #elseif

NOTE:  This compiler directive is  an addition to those listed in
K&R.  The construction:

     #elseif constant_expression

may be used in place of the sequence:

     #else
     #if constant_expression

in nested #if constructions.  The  advantage to this is that only
one #endif is required to close the selection.
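
The nested form that #elseif is meant to replace is shown below
as a sketch.  The live code uses only directives that K&R
defines; the #elseif spelling appears in a comment, since it is
specific to this compiler.  WORD_BITS and WORD_CLASS are
hypothetical names chosen for illustration.

```c
#include <assert.h>

#define WORD_BITS 36          /* hypothetical configuration constant */

#if WORD_BITS == 16
#define WORD_CLASS 1
#else
#if WORD_BITS == 36
#define WORD_CLASS 2
#else
#define WORD_CLASS 3
#endif                        /* closes the inner #if */
#endif                        /* closes the outer #if */

/*
 * Under the #elseif extension, the same selection would presumably
 * be written with a single #endif:
 *
 *      #if WORD_BITS == 16
 *      #define WORD_CLASS 1
 *      #elseif WORD_BITS == 36
 *      #define WORD_CLASS 2
 *      #else
 *      #define WORD_CLASS 3
 *      #endif
 */

int word_class(void)
{
    assert(WORD_CLASS >= 1 && WORD_CLASS <= 3);
    return WORD_CLASS;
}
```

With WORD_BITS defined as 36, word_class() returns 2.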

11.8.  #endif

The  actions taken  by this directive  are identical  to those in
K&R.

11.9.  #line

The  actions taken  by this directive  are identical  to those in
K&R.

11.10.  #include

NOTE:   The actions  taken by  this directive  are different from
those described by K&R.  The directives:

     #include "filename"

and

     #include <filename>

both use  the Multics standard translator  search paths to locate
the  referenced  files.  No  bypassing  of the  working directory
takes place because that is under the control of the programmer.


In addition, the assumed suffix for C include files is ".incl.c".
Therefore,  "common"  files  like  "stdio.h"  will  be  mapped to
Multics segment names as "stdio.h.incl.c".
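
The name mapping amounts to appending the suffix to whatever name
appears in the directive.  A minimal sketch follows; the function
name is hypothetical and the search-path lookup itself is not
modelled.

```c
#include <assert.h>
#include <string.h>

#define INCL_SUFFIX ".incl.c"

/* Writes the segment name for include file "name" into out.
   Returns 0 on success, or -1 if out is too small. */
int include_segment_name(const char *name, char *out, size_t outlen)
{
    size_t need;

    assert(name != NULL && out != NULL);
    need = strlen(name) + strlen(INCL_SUFFIX) + 1;
    if (need > outlen)
        return -1;
    strcpy(out, name);
    strcat(out, INCL_SUFFIX);
    return 0;
}
```

Given "stdio.h", the function produces "stdio.h.incl.c".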

11.11.  #equate

NOTE:  This compiler directive is  an addition to those listed in
K&R.  The directive has the form:

     #equate identifier text

where  "identifier" is  a valid  C identifier  and "text"  is any
sequence of characters.  The  directive states that references to
"identifier" in the  list of extern items for  the program should
be replaced by a reference to "text".

For example, a C program containing the lines:

     #equate BIGGEST_SPACE sys_info$max_seg_size

     extern int BIGGEST_SPACE;

allows a  program to reference  directly the word  containing the
system-defined maximum segment size (in words).


12.  C Programs on Multics

The "usual" C programming  language presumes a static environment
where the entire code segment to be run is linked together into a
single unit before execution  begins.  In addition, the treatment
of  external  variables also  differs  from the  standard Multics
paradigm.   Finally,  the  "standard" run-time  library  has name
conflicts with existing Multics commands and subroutines.

12.1.  The C Program Model

There  are  several  points  about the  paradigm  assumed  in the
execution-time model of C programs  that need to be made explicit
so their difference  from or demands on the  Multics model can be
discerned.  This is not a statement about the way that C programs
on Multics  must run, only  about the way  they usually run  on
other implementations.

     A) All  of  the  code  involved in  an  application  will be
        combined into an executable module prior to placing it in
        execution.   This  is at  odds  with the  dynamic linking
        features of Multics.

     B) Once  the executable  module has been  prepared, only the
        entrypoint  to   the  main  program   (i.e.   the  "main"
        function) is  known to the  system which puts  the module
        into  execution.   All  other  external  definitions  and
        references   made   by   the   various   components   are
        "inaccessible" when execution begins.

     C) All C  functions are accessible by  name when the modules
        are  linked regardless  of what  object program  they are
        contained   in.    This  is   in  contrast   to  Multics'
        segname$entryname convention.

     D) No relationships  exist between successive  executions of
        the same or different executable images.  This is not the
        normal  Multics  process  view,   although  it  has  been
        implemented via the run_ facilities.

     E) There  are  no  procedures  in  C.   All  subroutines are
        functions.

     F) All  arguments  are  passed by  value.   Side-effects are
        produced  by passing  a pointer to  the function argument
        which  is to  be modified.  Multics  argument lists allow
        parameters to be passed by reference as well.


Various aspects  of this model's and  Multics' adaptation to each
other will be discussed below.

12.2.  Symbol Table Requirements

The present  Multics symbol table  is inadequate to  describe a C
program  with  sufficient   clarity.   Therefore,  additions  and
modifications to the information stored  in the symbol section of
the object segment will be required to support C programs.

12.2.1.  Descriptor Types

The following table gives the C  data items which will have to be
represented in the symbol table for debugging.  Some C data types
are  already   represented  in  other   languages  which  Multics
supports.  Those that are not are identified by the marker, "--".

     C Object               Code Standard Type or Explanation

     short int               01  real fixed-point binary short

     long int                02  real fixed-point binary long

     unsigned short int      34  real  fixed-point   binary  long
                                 unsigned

     unsigned long int       --  The definition of unsigned types
                                 in the  standard descriptor type
                                 table  does not  allow the short
                                 unsigned   integer  to   have  a
                                 precision greater  than 35 bits.
                                 Since  C  "unsigned  short  int"
                                 variables use all  the bits in a
                                 machine   word,  they   must  be
                                 assigned  to type 34.   There is
                                 no  descriptor type  for a datum
                                 having a precision of 72 bits or
                                 greater.

     float                   03  real floating-point binary short

     double                  04  real floating-point binary long

     character               21  character string


     string                  --  There  is  presently  no Multics
                                 datum  defined  which  matches C
                                 strings in being delineated by a
                                 zero byte.

     pointer                 13  pointer

     structure               17  structure

     union                   --  Although  Algol-68  unions (type
                                 62) are available,  they are not
                                 applicable  because  their  data
                                 structure  always  specifies the
                                 current  contents of  the union.
                                 In  C,  this   is  left  to  the
                                 programmer to keep track of.

     enum constant element   --  Pascal  enumerated  list  values
                                 (type  71)   are  restricted  to
                                 non-negative integers.   This is
                                 not true in C.

     enum variable           --  The  reasoning here  is the same
                                 as for  enumerated list constant
                                 elements.    The   corresponding
                                 Pascal   data   type   (72)   is
                                 inapplicable for C.

Very few  of these new  descriptor types will  appear in argument
lists, however, due to the conversion rules.

12.2.2.  Other Symbol Table Issues

The  following list  contains other  symbol representation issues
which will have to be resolved  before support for C programs can
be considered complete:

     A) Symbol  nodes  for  pointers  may  have  to  include  the
        pointer's  "base  type"  (e.g.   pointer-to-character) in
        order to support correct pointer arithmetic in probe.

     B) A symbol node for a  union should probably be represented
        as  the root  of a  symbol sub-tree  of all  the possible
        constituents of the union.

     C) The symbol table will have  to include a way to represent
        C  typedefs resulting  from the  "tag" on  structures and
        unions (for example) as separate objects.


12.3.  Probe Changes

Probe  will  have  to  be   extended  as  expected  to  handle  C
expressions.  Some of the needed extensions are:

     A) The C comparison operators "=="  and "!=" will have to be
        allowed in designating conditional breakpoints.

     B) The  C  modulus  operator,  "%",  will  be  allowed  in
        expressions.

     C) Constants  in octal  and hexadecimal  must be  allowed in
        expressions.

     D) The C  form of subscripts, "A[i][j]",  must be allowed in
        requesting  the  values  of  variables  and  in  assigned
        values.

     E) The  address   reference,  "&A",  will   be  allowed  for
        obtaining the address of an item.

     F) Explicit  dereferencing  of a  pointer  via "*A"  will be
        allowed.

     G) Probe should support arithmetic on C pointers.

     H) It should be possible to  display the contents of a union
        in a programmer-chosen format.

     I) The  probe  builtin  functions,  length,  maxlength,  and
        substr,  must  be  changed  to  work  on  C  strings (and
        arrays).

     J) The C function, "sizeof", should probably be supported.

     K) Boolean  tests,  "if var"  and "if  !var" should  work as
        expected as long as "var" can be cast into an int.

12.4.  Memory Allocation

Most C implementations place all  data (auto, static, extern, and
programmer-allocated) in  a single contiguous  address space.  On
Multics,  this is  possible, but  not desirable.   Therefore, the
"standard"  place will  be used  for each  type of  object:  auto
variables will be  allocated in the stack; static  in the linkage
section;  extern in  the user_free_area  (via *system variables);
and programmer-allocated data in the user_free_area.


The  assignment of  external and  programmer-allocated storage to
the  user_free_area  will  make  it possible  for  programmers to
manage their allocated storage via set_fortran_common.

It  should be  noted that  this separation  of storage  may cause
difficulties  when  importing  programs which  do  comparisons of
pointer values.  This is because some applications take advantage
of the implicit collection of all  data into one unit even though
it  is explicitly  warned against(1)  except where  "the pointers
point to  objects in the same  array."  Since Multics allocations
always  result  in storage  blocks contained  wholly in  a single
segment, programs which observe  this portability constraint will
continue to work.
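
A comparison that stays within this constraint is sketched below.
Both pointers refer to the same array, so the ordering test does
not depend on how separate data items are laid out.  The function
name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n ints starting at a by walking a pointer up to, but not
   including, the position one past the last element. */
long sum_array(const int *a, size_t n)
{
    const int *p;
    const int *end;
    long total = 0;

    assert(a != NULL);
    end = a + n;                 /* one past the last element      */
    for (p = a; p < end; p++)    /* p and end lie in the same array */
        total += *p;
    return total;
}
```

Comparing pointers into two separately allocated objects, by
contrast, is the usage that may behave differently once the data
is spread over several segments.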

12.5.  Use of an Operators Segment

The  C  compiler  will  produce  object  segments  which  use the
standard pl1_operators_ segment  for call/save/return, data value
conversions, intrinsic functions, etc.

12.6.  Argument Lists

C programs will  use standard Multics calls.  That  is, they will
produce a  list of pointers  to the argument  values.  Because of
the call-by-value(2)  requirement, temporary copies  will be made
of all  non-expression arguments and the  addresses of these will
be placed in the argument list.
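
The effect of this convention can be modelled in portable C, as
in the sketch below.  The argument list is represented as an array
of pointers, which is an assumption made for illustration and not
the actual Multics argument list layout; the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Callee: receives a list of pointers to argument values and
   modifies the storage that the first pointer designates. */
void increment_first(void **arg_list)
{
    int *n;

    assert(arg_list != NULL && arg_list[0] != NULL);
    n = (int *)arg_list[0];
    *n += 1;
}

/* Caller side: the argument is copied into a temporary and the
   address of the temporary is placed in the argument list, so the
   caller's own variable is never exposed to the callee.  Returns
   the value the callee left in the temporary. */
int call_with_copy(const int *var)
{
    int temp;
    void *arg_list[1];

    assert(var != NULL);
    temp = *var;                 /* temporary copy of the argument */
    arg_list[0] = &temp;
    increment_first(arg_list);
    return temp;
}
```

If x holds 41, call_with_copy(&x) returns 42 while x itself still
holds 41, which is the call-by-value behavior that C requires.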

Whenever possible, descriptor information will be included in the
argument list.   However, the utility  of this information  is in
question.  This  is because the actual  number of different types
which can be passed as arguments(3) is rather small.  Thus, while
it seems desirable to pass the  address of the first character of
a string  and to construct  a descriptor for it  when the copying

________________________________________

(1) K&R, Appendix A, Section 7.6, pg. 189

(2) K&R, Appendix A, Section 7.1, pgs.  185-186

(3) loc. cit.


process determines its length, this cannot be done.  The language
rules require  that the address  of a (temporary)  pointer to the
first  character of  the string be  placed in  the argument list.
The Multics descriptor for it  cannot say anything other than that
it is an unpacked pointer  (at least not without adding many more
descriptors).

The   following   information    attempts   to   illustrate   the
correspondence  between  a C  data  item and  the  value actually
passed as  the argument in  a function invocation.   To assist in
this,  the  actual PL/I  attribute  list corresponding  to  the C
argument  value  is given  when  possible.  Otherwise,  the value
passed is described.   When necessary, the reason for  the set of
attributes is also listed.

     C Argument:         int
     PL/I Attributes:    real fixed  binary precision(35, 0)
                         aligned
     Explanation:        none

     C Argument:         long int
     PL/I Attributes:    real fixed  binary precision(71, 0)
                         aligned
     Explanation:        none

     C Argument:         unsigned int
     PL/I Attributes:    bit(36) aligned
     Explanation:        This  could  also  be  described as
                         "real fixed binary precision(36, 0)
                         unsigned  aligned"  in  PL/I terms.
                         However, this raises the spectre of
                         known bugs  with the representation
                         of "unsigned" items  in the present
                         compiler.       This     particular
                         representation  at least  gives the
                         proper  computational  result  when
                         filtered through  the "bin" builtin
                         function into a  signed variable of
                         precision larger than 36 bits.

     C Argument:         long unsigned int
     PL/I Attributes:    bit(72) aligned
     Explanation:        PL/I  does not  allow precisions of
                         binary numbers to exceed 71 bits in
                         length.


     C Argument:         float
     PL/I Attributes:    real  float   binary  precision(63)
                         aligned
     Explanation:        C  conversion  rules  for arguments
                         require  that  all  items  of  type
                         float be converted to double.

     C Argument:         double
     PL/I Attributes:    real  float   binary  precision(63)
                         aligned
     Explanation:        none

     C Argument:         char
     PL/I Attributes:    real fixed  binary precision(35, 0)
                         aligned
     Explanation:        C  conversion  rules  for arguments
                         require that all items of type char
                         be converted to int.

     C Argument:         an array name
     PL/I Attributes:    pointer aligned
     Explanation:        An  array  name  is  treated  as  a
                         pointer expression in C.  The value
                         of  the pointer  is the  address of
                         the first element of the array.

     C Argument:         string
     PL/I Attributes:    pointer aligned
     Explanation:        Strings are arrays in C.  The value
                         of  the pointer  is the  address of
                         the   leftmost  character   of  the
                         string.

     C Argument:         pointer
     PL/I Attributes:    pointer aligned
     Explanation:        none

     C Argument:         a structure name
     PL/I Attributes:    The  structure  is  passed  as  the
                         value  of  the  argument.  However,
                         care should  be taken in  trying to
                         describe  actual  arguments   which
                         contain unions.
     Explanation:        A  temporary copy  will be  made of
                         the   entire   structure   and  the
                         address of this copy will appear in
                         the  corresponding position  of the
                         actual   argument  list.    To  the
                         receiver,  this   argument  pointer
                         will, of course, be invisible.

     C Argument:         a field within a structure
     PL/I Attributes:    bit(36) aligned
     Explanation:        Bit   fields   are   coerced   into
                         unsigned  integers.  Alternatively,
                         the representation  given above for
                         unsigned int could  have been used.
                         However,   for   bit  fields   this
                         representation      seems      more
                         descriptive.   The   extracted  bit
                         field values are the rightmost bits
                         of the string.

     C Argument:         a union name
     PL/I Attributes:    bit(n) unaligned
     Explanation:        Unions are treated like structures.
                         However,   PL/I   has  no   way  of
                         describing  a union  and C provides
                         no  way  to  indicate  the  current
                         format  of the  data residing  in a
                         union.

     C Argument:         enum
     PL/I Attributes:    real fixed  binary precision(35, 0)
                         aligned
     Explanation:        Instances  of  variables  which are
                         defined   to   contain   enumerated
                         values are treated  as variables of
                         type int.

     C Argument:         enumerated constant
     PL/I Attributes:    real fixed  binary precision(35, 0)
                         aligned
     Explanation:        Constants    appearing     in    an
                         enumeration  list  are  treated  as
                         being of type int.


Some C implementations utilize  this call-by-value mechanism in a
different  way.   Copies  of  the  arguments  to  be  passed  are
catenated together into a  structure-like format.  The address of
this  structure  is then  passed  as the  argument  pointer.  The
called  program  can  then  have  declared  only  a  single input
argument as in

     char  *my_arg;

which  it  manipulates  to  access the  various  portions  of the
argument list values.  From the  definition above, Multics C does
not support this programming style.

12.7.  References to Library Routines

As mentioned  above, the assumption that  C makes about execution
is  that the  library routines have  been physically incorporated
into  the  executing program  before  execution begins.   This is
contrary to the  normal Multics policy of having  one copy of the
library which is dynamically referenced by all users.

The  proposed   solution  to  this   problem  is  to   make  some
modifications to the Multics binder.  The nature of the change(1)
is to provide, as part of  the binder's input, a list of external
symbol name-pairs of the form

     segname_1$entryname_1  segname_2$entryname_2

The  idea is  that, after all  the inputs have  been examined for
external  symbol   definitions,  if  there   are  any  unresolved
references to segname_1$entryname_1, they are to be replaced with
references to segname_2$entryname_2.

Thus, a name-pair entry like:

     fopen  standard_C_library_$fopen

would allow  us to provide  the C library  as a unique  object in
Multics  without  forcing larger  bound segments  than necessary.
Since  it works  only on  unresolved symbols,  C programmers will
still be able  to replace library routines in  the manner they do
now; by writing a function with that name into their program.
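
Since the exact mechanism is not yet defined, the following is
only a sketch of the intended lookup.  The structure and function
names are hypothetical, and the flag "is_resolved" stands in for
whatever the binder knows after examining its inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One line of the proposed binder input. */
struct name_pair {
    const char *from;    /* e.g. "fopen"                     */
    const char *to;      /* e.g. "standard_C_library_$fopen" */
};

/* Returns the name that a reference should finally use.  A
   reference already resolved by the inputs is left alone, so a
   user-written replacement for a library routine wins. */
const char *resolve_reference(const char *ref, int is_resolved,
                              const struct name_pair *pairs,
                              size_t npairs)
{
    size_t i;

    assert(ref != NULL);
    assert(pairs != NULL || npairs == 0);
    if (is_resolved)
        return ref;
    for (i = 0; i < npairs; i++)
        if (strcmp(pairs[i].from, ref) == 0)
            return pairs[i].to;
    return ref;          /* no entry: leave for dynamic linking */
}
```

An unresolved "fopen" is redirected to the library entry, while a
"fopen" defined in the program's own inputs is left untouched.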

________________________________________

(1) The exact mechanism has not yet been defined.


As an  additional comment, while this  addition is being proposed
to accommodate the C model, I believe it will prove worthwhile in
dealing  with  imported  application  systems  whose organization
makes  similar  assumptions about  the run-time  environment.  It
also  helps  resolve  name   conflicts  when  these  applications
originate on other systems.

For example, it is common to find routines with names like "date"
and "time" being  called by imported programs.  It  would be very
convenient in  the management of  this importation to  be able to
say

     date  MVS_library_$julian_date

thereby assuring that the  application would not now unexpectedly
transfer to the Multics system's "date" command.

12.8.  Function Name Resolution

The binding  process on Multics  is another area  where there are
subtle differences  from "conventional" usage  vis-a-vis "linkage
editing".  The C environment (like many other systems, regardless
of language) disregards the name of the object file being used as
input and  concentrates instead  on the  external entry(1)  names
defined  and  referenced.  In  Multics terms,  this means  that a
reference to  the entrypoint "bar"(2) should  be satisfied by the
entrypoint which Multics  knows as "foo$bar", at least  as far as
the binder is concerned.

This  presently  is not  possible,  but is  another area  where a
binder change would not only make C programmers more comfortable,
but  would probably  have benefits when  applications software is
imported to Multics from more conventional systems.

________________________________________

(1) in  the  Multics  sense  of being  visible  from  outside the
    segment  (e.g.   operands  of  an  ALM  "segdef"  or  "entry"
    pseudo-op)

(2) implicitly transformed by Multics into "bar$bar"


13.  Run-Time Library Definition

This section  defines the minimum  set of library  routines to be
made available with the compiler.  It also attempts to define the
nature of the structures used by programs desiring to communicate
with  or  manipulate  the   Multics  run-time  environment  (e.g.
files).

13.1.  Input & Output

All input  and output to  C programs (except that  done by direct
reference to Multics virtual  memory using the #equate directive)
is  done  using library  functions.  It  should be  stressed that
these functions are among the most machine dependent and thus are
most  likely  to differ  among  implementations of  C  on various
machines.  They depend on various constants, macros, and typedefs
specified  in the  include file,  "stdio.h".  A  brief summary of
some of the more important ones is given in the following table.

     Item          Description

     FILE          A typedef for  a structure which contains
                   information  about  the   file  from  the
                   run-time  library point  of view.   It is
                   not  a  Multics  IOCB  pointer,  but does
                   contain  a  reference to  the  IOCB which
                   defines this file for Multics.

     BUFSIZ        The  maximum  size  of an  i/o  buffer in
                   characters.

     STRSZ         The maximum length of a string.

     NULL          The  defined  constant value  for  a null
                   pointer value.

     stdin         The  standard  file  identifiers  for the
     stdout        "default system" input, output, and error
     stderr        files  respectively.   They  are assigned
                   the natural correspondence  on Multics to
                   user_input,        user_output,       and
                   error_output.


     EOF           This is an int  value which cannot result
                   from casting  ANY character into  an int.
                   Since  characters  read  are  treated  as
                   unsigned,  the customary  value chosen by
                   most   implementations   is   -1.   Thus,
                   getchar() will return 0777 if the Multics
                   character 777 is read, and 0777777777777
                   when end-of-file occurs.
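
The practical consequence is that the result of getchar() must be
held in an int, not a char, until it has been compared with EOF.
The sketch below shows the usual loop.  To keep it self-contained
it reads from a memory buffer through next_char, a hypothetical
stand-in for getchar(); on the host machine the characters are
whatever width the host provides, not nine bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>               /* for EOF */

/* Stand-in for getchar(): returns the next character of buf as a
   non-negative int, or EOF once len characters are consumed. */
int next_char(const unsigned char *buf, size_t len, size_t *pos)
{
    assert(buf != NULL && pos != NULL);
    if (*pos >= len)
        return EOF;
    return buf[(*pos)++];        /* unsigned, so never equal to EOF */
}

/* Count characters by looping until EOF is returned. */
size_t count_chars(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    size_t n = 0;
    int c;                       /* int, so EOF stays distinct */

    while ((c = next_char(buf, len, &pos)) != EOF)
        n++;
    return n;
}
```

Even a character with all bits set is counted as data, because it
is returned as a positive int and so differs from EOF.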

The  following list  of input  and output  functions presumes the
data definitions given below  in discussing the actions performed
by each function.

     FILE     *fp;         /* A  pointer to  the structure
                              defining the file */

     char     c1, c2;      /* Characters  to  be  sent  or
                              received */

     int      N;           /* An  integer send  or receive
                              length */

     int      status;      /* A  Multics  system  standard
                              error code */

     char     *s1, *s2;    /* Pointers   to   strings   of
                              characters  to  be  sent  or
                              received */

13.1.1.  fopen

Declaration:

     FILE  *fopen();

Invocation:

     fp  =  fopen("filename", "mode");

As  shown,  the  function  returns   a  pointer  to  a  structure
describing  the  relevant  data  about the  file.   It  takes two
arguments, both  strings.  The first  is an absolute  or relative
pathname of the file to be opened.  The opening will be attempted
via the vfile_ io module of Multics, using a "stream" mode.


The second argument is a  single-character string designating the
intended use  for the file.   Allowed values are  "r" (read), "w"
(write), and  "a" (append).  Using  "r" will cause  an attempt to
open for "stream_input";  otherwise, the attempt  will be made to
open the file in "stream_output".

If the file  does not exist, and it is  being opened for writing,
it will be  created.  If the file cannot  be opened as requested,
the value NULL will be returned.
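
A caller is expected to test the result of fopen before using it.
A minimal sketch, assuming only the behaviour described above;
the helper name open_for_read is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open path for reading and report any
   failure on stderr.  Returns NULL if the open fails. */
FILE *open_for_read(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        fprintf(stderr, "cannot open %s\n", path);
    return fp;
}
```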

13.1.2.  fclose

Declaration:

     void  fclose();

Invocation:

     fclose(fp);

The argument to "fclose" is  always a file pointer.  Files remain
open  until  explicitly closed  by  the program  or  until forced
closed  by  a  "close_files"  command  or  the  termination  of a
run-unit.  Closing a file which is not open is not an error.

13.1.3.  getc

Declaration:

     char  getc();

Invocation:

     c1  =  getc(fp);

This function gets a single character from the file whose pointer
is  given as  its argument.  The  file must be  opened for stream
input.  Once the input file  is exhausted, this  function returns
EOF on every subsequent invocation.


13.1.4.  putc

Declaration:

     char  putc();

Invocation:

     c1  =  putc(c2, fp);

The  "putc"  function writes  the  character given  as  its first
argument  to the  file whose pointer  is specified  as its second
argument.  The file must be opened  for stream output at the time
of the invocation.  The "putc"  function returns as its value the
character it sends to the file.
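
Together, getc and putc are sufficient to copy one file to
another.  The following is a sketch only; the name copy_stream is
hypothetical, and it assumes that "in" is open for stream input
and "out" for stream output.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything remaining on in to out,
   one character at a time.  Returns the number copied. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int  c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```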

13.1.5.  fgets

Declaration:

     char  *fgets();

Invocation:

     s2  =  fgets(s1, N, fp);

This  function reads  characters from  the file  whose pointer is
given as the third argument  into the string given as  the first
argument.  The second argument, N,  is the size of that string in
characters.  Characters are read until a newline (\n) is
encountered(1)  or  N-1  characters  have been  read.  The string
terminator (\0) is  then stored after the  last  character  read.
The result of the function is the value of the first argument.
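
Because the newline is kept in the string, callers commonly strip
it.  The sketch below does so.  The name read_line is
hypothetical, and the NULL result from fgets at end of file is an
assumption taken from common C libraries, since the description
above does not state it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (capacity size),
   remove the trailing newline that fgets keeps, and return the
   resulting length, or -1 if nothing could be read. */
int read_line(char *buf, int size, FILE *fp)
{
    size_t len;

    if (fgets(buf, size, fp) == NULL)   /* assumed EOF result */
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    return (int) len;
}
```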

13.1.6.  fputs

Declaration:

     void  fputs();

Invocation:

     fputs(s1, fp);

________________________________________

(1) If the newline character stops  the input, it is still stored
    as part of the characters read into the string.


The first  argument must be  a pointer to a  string of characters
and the second is a pointer  to a file structure.  Characters are
written  from  the  string  up  to  but  not  including  the null
character marking the end of the string.

13.1.7.  printf

Declaration:

     void  printf();

Invocation:

     printf("fmt string", ... );

This function is used to  convert a number of arguments (possibly
none) from  their internal representation to  ASCII under control
of a format string (given  as the first argument).  The converted
values  are written  to  the standard  output  file.   The format
controls   are   those   defined  by   the   reference  document,
pgs. 145-147.

13.1.8.  fprintf

Declaration:

     void  fprintf();

Invocation:

     fprintf(fp, "fmt string", ... );

This function works like printf  except that the resultant string
is written to  the file given as the  first argument.  The format
control string is  given as the second argument,  and the data to
be converted (if any) as the third and succeeding arguments.

13.1.9.  sprintf

Declaration:

     int  sprintf();

Invocation:

     sprintf(s1, "fmt string", ... );


This function performs  the conversion to ASCII in  the manner of
fprintf.  However,  the first argument designates  a string where
the result is to be placed,  rather than a file to which it is to
be  written.   No  check  is  made to  ensure  that  the  target
string, given as the first argument,  is long enough to hold  the
result.
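
Since no length check is made, the caller has to size the target
string from the format itself.  A minimal sketch: the names
format_point and POINT_TEXT_MAX are hypothetical, and the bound
assumes a 32-bit int; it would need to be larger for a 36-bit
int.

```c
#include <stdio.h>

/* Longest text produced below: two ints of at most 11
   characters each (sign plus 10 digits for a 32-bit int),
   plus "(", ", ", ")" and the string terminator. */
#define POINT_TEXT_MAX (11 + 11 + 4 + 1)

/* Hypothetical helper: format a coordinate pair into buf.
   The caller must supply at least POINT_TEXT_MAX characters,
   because sprintf itself performs no length check. */
void format_point(char *buf, int x, int y)
{
    sprintf(buf, "(%d, %d)", x, y);
}
```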

13.1.10.  scanf

Declaration:

     int  scanf();

Invocation:

     scanf("fmt string", &arg1, ... );

This function is the input analog of printf.  The  first argument
is  a  control  string  indicating  how  to  interpret characters
received from  the standard input file.   The remaining arguments
are pointers to data values which will hold the converted values.
The   valid  scanning   control  sequences  are   given  in  K&R,
pgs. 148-149.

The  result of  the function  is the  number of  items which were
successfully  converted  and assigned  to  items in  the argument
list.

13.1.11.  fscanf

Declaration:

     int  fscanf();

Invocation:

     fscanf(fp, "fmt string", &arg1, ... );

This  function works  like scanf  except that  the first argument
designates the file which is to be used as the input file.


13.1.12.  sscanf

Declaration:

     int  sscanf();

Invocation:

     sscanf(s1, "fmt string", &arg1, ... );

This function  works like fscanf  except that the  first argument
designates a  string which is to  be used as the  source of input
characters, rather than a file.
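
The count returned by the scanf family is the natural way to
validate input.  A sketch, using sscanf so that the input is a
string; the name parse_pair is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: extract two decimal integers from s.
   Returns 1 only if both conversions succeeded, judged by the
   count of assigned items that sscanf returns. */
int parse_pair(const char *s, int *a, int *b)
{
    return sscanf(s, "%d %d", a, b) == 2;
}
```

A call such as parse_pair("10 x", &a, &b) assigns a but reports
failure, since only one item was converted.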

13.1.13.  rewind

Declaration:

     void  rewind();

Invocation:

     rewind(fp);

This function resets the file position for the file whose pointer
is given as its argument to the beginning of the file.

13.1.14.  open_file

Declaration:

     FILE  *open_file();

Invocation:

     fp  =  open_file("Multics attach description",
                      "Opening Mode");

As  shown,  the  function  returns   a  pointer  to  a  structure
describing  the  relevant  data  about the  file.   It  takes two
arguments,  both  strings.   The  first  argument  is  a standard
Multics  attach description.   The second  is a  standard Multics
opening mode for the target switch.  If the file cannot be opened
as requested,  a "FILE" structure  will still be  allocated and a
pointer to  it returned.  The  structure will contain  the reason
for the inability to open the file.


13.1.15.  open_switch

Declaration:

     FILE  *open_switch();

Invocation:

     fp  =  open_switch("Multics io switchname",
                        "Opening Mode");

This  function  performs  like  open_file except  that  the first
argument  is  the name  of  an attached  and unopened  io switch,
rather than an attach description.

13.1.16.  attach_switch

Declaration:

     int  attach_switch();

Invocation:

     status  =  attach_switch("Multics io switchname",
                              "Attach description");

This function  attaches a Multics  io switch with  the given name
and attach description.  It returns  zero if it successfully made
the attachment and a standard error code otherwise.

13.1.17.  detach_switch

Declaration:

     int  detach_switch();

Invocation:

     status  =  detach_switch("Multics io switchname");

This function detaches  a Multics io switch with  the given name.
The switch must be closed for  the detach to succeed.  It returns
zero if it successfully made  the detachment and a standard error
code otherwise.


13.1.18.  fflush

Declaration:

     void  fflush();

Invocation:

     fflush(fp);

Any output which is in the C file buffer but has not been sent to
the associated Multics io switch is forced out.  The file must be
opened in an output mode.

13.2.  String Manipulation

This section describes the library functions available for string
manipulation.  In the discussion of the individual functions, the
following definitions are assumed:

     char     *s1, *s2,    /* Pointers to strings */
              *s3;

     char     c;           /* A single character */

     int      M, N;        /* Various character counts */

13.2.1.  strcat

Declaration:

     char  *strcat();

Invocation:

     s3  =  strcat(s1, s2);

This function appends a copy of the string, s2, to the end of the
string, s1.  No check is made on the allocated length of s1; this
is the  responsibility of the programmer.   The value returned by
the function is the value of s1.
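
Since strcat makes no length check, the caller can make one
before appending.  The sketch below builds a Multics pathname.
The name append_entry is hypothetical, and the arithmetic assumes
that strlen counts the characters up to, but not including, the
string terminator.

```c
#include <string.h>

/* Hypothetical helper: append ">" and entry to the directory
   pathname already in path, whose allocated size is size.
   Returns 1 on success, or 0 (leaving path unchanged) if the
   result, with its terminator, would not fit. */
int append_entry(char *path, size_t size, const char *entry)
{
    if (strlen(path) + 1 + strlen(entry) + 1 > size)
        return 0;
    strcat(path, ">");
    strcat(path, entry);
    return 1;
}
```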


13.2.2.  strncat

Declaration:

     char  *strncat();

Invocation:

     s3  =  strncat(s1, s2, N);

This function appends at most N  characters from s2 to s1.  If s2
is less than or equal to  N characters in length, it behaves like
"strcat".

13.2.3.  strcmp

Declaration:

     int  strcmp();

Invocation:

     N  =  strcmp(s1, s2);

The two strings are compared lexicographically.  If s1 is greater
than s2, the  value returned is positive; if  less, negative; and
if equal, zero.

13.2.4.  strncmp

Declaration:

     int  strncmp();

Invocation:

     M  =  strncmp(s1, s2, N);

This works  like "strcmp" except  that no more  than N characters
from the front of s1 and s2 are compared.


13.2.5.  strcpy

Declaration:

     char  *strcpy();

Invocation:

     s3  =  strcpy(s1, s2);

In this function,  s2 is copied into s1.  The  copy ends when the
string terminator of s2 has been moved.  No  check is made on the
allocated length of  s1.  The function return value  is the value
of the first argument.

13.2.6.  strncpy

Declaration:

     char  *strncpy();

Invocation:

     s3  =  strncpy(s1, s2, N);

This function copies exactly N characters from s2 into s1.  If s2
is N or more characters long,  no string  terminator is stored in
s1.  If s2 is shorter  than N characters, s1  is padded with null
characters until N characters  have been  stored.  The  return
value of the function is the value of the first argument.
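
The missing terminator is the usual hazard with strncpy.  One
common remedy, shown here as a sketch with the hypothetical name
bounded_copy, is to copy one character fewer than the target
holds and store the terminator explicitly.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for
   size characters, and always leave dst terminated.  strncpy
   alone stores no terminator when src is too long, so one is
   stored explicitly in the last position. */
char *bounded_copy(char *dst, const char *src, size_t size)
{
    if (size == 0)
        return dst;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
    return dst;
}
```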

13.2.7.  strlen

Declaration:

     int  strlen();

Invocation:

     N  =  strlen(s1);

The  value of  the function is  the length of s1, not  including
the string terminator.


13.2.8.  strchr

Declaration:

     char  *strchr();

Invocation:

     s2  =  strchr(s1, c);

The  return  value of  the  function is  a  pointer to  the first
occurrence of  c in s1.   If c does  not occur in  s1, the return
value is a null pointer.

13.2.9.  strrchr

Declaration:

     char  *strrchr();

Invocation:

     s2  =  strrchr(s1, c);

The  return  value  of the  function  is  a pointer  to  the last
occurrence of  c in s1.   If c does  not occur in  s1, the return
value is a null pointer.
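
A typical use of strrchr is to isolate the final component of a
pathname.  The sketch below does this for Multics pathnames,
whose components are separated by ">"; the name entry_name is
hypothetical.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the final component
   of a Multics pathname, i.e. the text after the last '>'.
   If the path contains no '>', the whole string is returned. */
const char *entry_name(const char *path)
{
    const char *last = strrchr(path, '>');

    return (last != NULL) ? last + 1 : path;
}
```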

13.3.  Memory Allocation

This  section  describes  the  library  functions  available  for
allocating and  freeing blocks of  memory.  In the  discussion of
the individual functions, the following definitions are assumed:

     unsigned N, M;        /* Sizes  and   amounts  to  be
                              allocated */

     char     *loc,        /* Address     of     allocated
              *oldloc;        space */


13.3.1.  malloc

Declaration:

     char  *malloc();

Invocation:

     loc  =  malloc(N);

The argument  to malloc is  the number of  bytes which are  to be
allocated.  It returns  a pointer to a block of  bytes at least N
long.  The  return value also  points to an  address suitable for
use with any data type.

13.3.2.  free

Declaration:

     void  free();

Invocation:

     free(loc);

This function returns the space previously allocated by malloc to
the free storage  pool.  No guarantee is made  about the value of
the bits in the allocated block.

13.3.3.  calloc

Declaration:

     char  *calloc();

Invocation:

     loc  =  calloc(N, M);

This function works like malloc  except that it returns a pointer
to a  block of space sufficient  to hold N copies  of size M.  In
addition, all bytes  in the allocated block are  guaranteed to be
zero.
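
The zero-fill guarantee makes calloc convenient for tables of
counters.  A minimal sketch; the name make_counters is
hypothetical, and a NULL result on failure is an assumption,
since this document does not say what is returned when the space
cannot be obtained.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n counters, all initially
   zero.  Assumed to yield NULL if the space is unavailable. */
int *make_counters(unsigned n)
{
    return (int *) calloc(n, sizeof(int));
}
```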


13.3.4.  realloc

Declaration:

     char  *realloc();

Invocation:

     loc  =  realloc(oldloc, M);

This function  "resizes" the block  of storage pointed  to by its
first argument to  be the size given by  its second argument.  If
the space is  to be shrunk, bytes will be  trimmed from the right
end of the block.  If the requested size is larger, the new block
will have the old block's  value stored left-justified in the new
block padded with 0 bytes to fill out the new size.

In no case, even when the block size does not need to be changed,
should the program expect that loc == oldloc.
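
In practice this means the caller must always continue with the
pointer that realloc returns.  A sketch, with the hypothetical
name resize_block; the NULL result on failure, with the old block
left intact, is an assumption taken from common C libraries.

```c
#include <stdlib.h>

/* Hypothetical helper: resize the block at *locp to newsize
   bytes (newsize must be nonzero).  On success *locp is
   replaced by the value realloc returned, which may differ
   from the old address, and 1 is returned.  On failure *locp
   is left unchanged and 0 is returned. */
int resize_block(char **locp, unsigned newsize)
{
    char *newloc;

    if (newsize == 0)
        return 0;
    newloc = (char *) realloc(*locp, newsize);
    if (newloc == NULL)
        return 0;
    *locp = newloc;
    return 1;
}
```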

13.4.  Mathematical Functions

The following list of mathematical functions will be available in
the run-time  library.  All of  these routines take  arguments of
type "double" and return "double" values as their result.

     Function  Description

     abs(X)    absolute value of X

     acos(X)   arccosine of X in radians
               0 <= acos(X) <= pi

     asin(X)   arcsine of X in radians
               -(pi/2) <= asin(X) <= (pi/2)

     atan(X)   arctangent of X in radians
               -(pi/2) < atan(X) < (pi/2)

     ceil(X)   smallest integer value  greater than or equal
               to X

     cos(X)    cosine of X in radians

     cosd(X)   cosine of X in degrees

     cosh(X)   hyperbolic cosine of X

     exp(X)    e ** X


     floor(X)  largest integer value less than or equal to X

     log(X)    natural logarithm of X

     log10(X)  logarithm (base 10) of X

     log2(X)   logarithm (base 2) of X

     sin(X)    sine of X in radians

     sind(X)   sine of X in degrees

     sinh(X)   hyperbolic sine of X

     sqrt(X)   square root of X
               0 <= X

     tan(X)    tangent of X in radians

     tand(X)   tangent of X in degrees

     tanh(X)   hyperbolic tangent of X

13.5.  Miscellaneous

The following  functions do not  fit easily within  the preceding
classifications.   Many of  the functions  listed here implicitly
make programs dependent on the  Multics environment and should be
avoided  in  situations  where  portability  is  important.   The
following  definitions  are assumed  in  the discussion  of these
functions.

     long int tics;        /* A    counter    for    clock
                              "tics" */

     long int when;        /* A date or time value */

     int      code;        /* A  Multics  system  standard
                              error code */

     char     flag;        /* A choice indicator */

     char     *msg;        /* A   pointer  to   a  message
                              string */


13.5.1.  clock

Declaration:

     long int  clock();

Invocation:

     tics  =  clock();

The return result is the number of microseconds since 0000 hours,
1 January 1901, GMT.

13.5.2.  vclock

Declaration:

     long int  vclock();

Invocation:

     tics  =  vclock();

The result is the number of microseconds of virtual cpu time used
by the process.

13.5.3.  date

Declaration:

     long int  date();

Invocation:

     when  =  date();

The result is  an integer value representing the  current date in
the form YYYYMMDD,  where YYYY is the year,  MM is the month
within the year, and DD is the day of the month.


13.5.4.  time

Declaration:

     long int  time();

Invocation:

     when  =  time();

The  result is  an integer value  giving the current  time in the
form HHMMSSFFFFFF where HH is the  hour of the day (00-23), MM is
the minute within  the hour, SS is the  second within the minute,
and FFFFFF is the microsecond within the second.
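
The values returned by date and time are decimal-coded, so their
fields can be recovered with integer division and remainder.  The
helpers below are hypothetical and assume exactly the YYYYMMDD
and HHMMSSFFFFFF layouts described above.  The type long long is
used in this sketch because HHMMSSFFFFFF needs more than 32 bits.

```c
/* Hypothetical helper: unpack a YYYYMMDD value. */
void split_date(long long when, int *year, int *month, int *day)
{
    *year  = (int) (when / 10000);        /* YYYY */
    *month = (int) (when / 100 % 100);    /* MM   */
    *day   = (int) (when % 100);          /* DD   */
}

/* Hypothetical helper: unpack an HHMMSSFFFFFF value. */
void split_time(long long when, int *hour, int *min, int *sec,
                long *usec)
{
    *hour = (int) (when / 10000000000LL);       /* HH     */
    *min  = (int) (when / 100000000LL % 100);   /* MM     */
    *sec  = (int) (when / 1000000LL % 100);     /* SS     */
    *usec = (long) (when % 1000000LL);          /* FFFFFF */
}
```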

13.5.5.  exit

Declaration:

     void  exit();

Invocation:

     exit(code, msg, flag);

This  function  forces  a  return to  the  caller  of  its "main"
program.  All  arguments are optional.   If code is  zero, or the
function is invoked without arguments, then control passes to the
caller of the "main" program.

If   code  has   the  value   -1,  then   the  Multics  condition
"command_abort" will be signalled.

If  code  is not  zero or  -1,  it is  interpreted as  a standard
Multics  error code,  and a call  is made on  the system routine,
sub_err_, passing  the msg.  In  this case, "flag"  may only take
one of the values acceptable to sub_err_.


14.  Open Issues

This section contains unresolved, important issues related to the
suitability, performance, or "look" of the Multics implementation
of  the C  compiler and  language.  Many  of them  have come from
reviewers of prior drafts of this document.  They are listed here
in no  particular order.  Your comments  and concrete suggestions
are welcome on these topics.

14.1.  Use of Standard Operators

A  suggestion  has  been  made   that  C  programs  not  use  the
pl1_operators_ segment,  but instead have a  special one of their
own.  The reasons in support of this are:

     A) The  pl1_operators_ segment  is too hard  to maintain and
        modify.

     B) The  rules for  PL/I arithmetic do  not match  those of C
        well  enough  to  make   its  use  profitable  in  object
        segments.  One more tailored to  C rules would allow more
        compact object segments.

     C) Given the  tendency for C programs  to contain many small
        functions  and make  heavy use  of function  calls during
        execution,  the pl1_operators_  call/push/return sequence
        will be too slow.  A  more effective one could be written
        that takes advantage of C programming style.

     D) Additional  efficiency  may  be   gained  by  having  the
        compiler recognize  functions which are  intrinsic(1) but
        implemented  efficiently in  the operators  segment.  The
        function "strcpy" is a good example of this.

14.2.  Mismatch in System Calling Conventions

There is no mechanism to define a function or subroutine external
to the calling program  which obeys "native" calling conventions:
argument passing by-reference,  use of descriptors, call-by-value
through  the use  of expressions, etc.   Multics FORTRAN provides
this via the declaration:

     external foo descriptors

________________________________________

(1) An "intrinsic" function in this  context is one which is part
    of the standard library supplied with the compiler.


The Maclisp  compiler also provides  a "defpl1" facility  to do a
similar function in addition to providing data type conversion as
part  of  the call.   Several  reviewers have  asked for  such an
extension in the Multics implementation of C.

14.3.  Unbound Programs and Name Resolution

The   design   proposes   that   C  programs   will   have  their
inter-function  name resolution  done by the  binder.  While this
seems to mimic  the approach on other systems  which require link
editing  compiled  programs  into executable  objects,  it leaves
stand-alone C programs on Multics in the lurch.

The  suggestion  has been  made  that the  binder name-resolution
mechanism be implemented.  In addition,  this facility  should be
added  to  the compiler  (perhaps  through  the  inclusion  of  a
standard  preamble  containing  #equate  directives).   In  this
case,  additional provisions  must  allow  the  redefinition  of
such  names by  explicit inclusion  of the  function in  a source
program.

14.4.  Support for the Entry Keyword

It  has been  proposed that  Multics provide  an extension to the
language  which allows  the creation  of multiple-entry functions
via the  "entry" keyword.  This keyword  is presently reserved(1)
for future use in the reference language.

14.5.  Linker Support for the MAIN Entrypoint

When an external  reference is made to routine  "foo" on Multics,
the linker maps that into a  reference to a segment whose name is
"foo".  Having found  the segment, it then looks  to see if there
is an entrypoint in that segment  called "foo".  If there is one,
execution begins at that entrypoint.

In  deference   to  languages  like  FORTRAN   which  have  "main
programs", if  the linker cannot find  an entrypoint named "foo",
it  will  look  for  one called  "main_".   The  FORTRAN compiler
creates  such  an entrypoint  for main  programs to  indicate the
point to begin execution.

________________________________________

(1) K&R, Appendix A; Section 2.3, Keywords; Pg.  180

The  issue  in  this  case  is  to  decide  among  the  following
possibilities:

     A) The C  compiler should translate any  function defined as
        "main", the  reserved keyword, into an  entrypoint in the
        object segment called "main_".  There  will have to be an
        additional  keyword  reserved, "main_";  but,  the linker
        does not have to be changed.

     B) The  C  compiler  should  add  an  additional entrypoint,
        called  "main_",  to any  object  segment it  finds which
        contains a definition for  "main".  The "main" entrypoint
        will also  appear as an external  symbol; both will cause
        execution  to  begin at  the same  point in  the compiled
        code.   The linker  will not have  to be  changed in this
        case; the "main_" keyword must be reserved.

     C) The linker should be changed to additionally look for the
        entrypoint "main" in the  object segment before giving up
        and reporting failure.  No additional keywords have to be
        reserved  by the  compiler.  The  linker change  would be
        almost invisible to most users.

14.6.  Content of the Library

The  functions  defined earlier  make  up a  minimal subset  of a
useful  programming library  for C.   Other useful  routines, and
suggestions for other libraries, are especially welcome.

14.7.  UNIX Environment Features

Unlike many other languages, C  was developed in conjunction with
an operating system, UNIX.(1)  A consequence of this is that many
C  programs  are written  with  the (implicit?)   assumption that
certain  facilities  will be  present.   Which of  these features
should be built into the  C compiler/run-time and which should be
included in  a larger enclosing environment  is also an important
open issue.   Some of those  which have been  raised are included
here.

________________________________________

(1) UNIX is  a registered trademark of  Bell Laboratories.  It is
    commercially available under license from Western Electric.


14.7.1.  Enclosing the Main Routine

There  is  no way  of  automatically providing  for pre-execution
preparation of the running  environment.  This includes providing
files for the standard devices:  stdin, stdout, and stderr.

14.7.2.  Device Nomenclature

The   present   proposal   provides   no  way   to   map  between
program-generated  device  strings  commonly used  by  UNIX (e.g.
/dev/tty6 or /dev/mem) and  Multics counterparts.  Some reviewers
see this as a desirable feature of the run-time support.

14.7.3.  Support for ARGC & ARGV

The present proposal provides no  way to identify C main programs
as   different  from   those  written  in   any  other  language.
Therefore,  the  suggested  convention(1)  cannot   be  used   in
Multics without some additional support.  Whether  this is  to be
handled  by  extending  the  command  processor,   providing  an
easy conversion sequence,  or providing it as part of  the
encapsulating support for C programs remains undecided.
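
For reference, the facility at issue is the one by which a main
program receives an argument count and a vector of argument
strings.  The sketch below shows what such a program typically
does with them; the name echo_args is hypothetical, and how
Multics would supply argc and argv is exactly the open question.

```c
#include <stdio.h>

/* Hypothetical helper showing the argc/argv convention:
   argv[0] is the program name and argv[1] through
   argv[argc-1] are the command arguments.  Writes the
   arguments to out, separated by blanks. */
void echo_args(int argc, char *argv[], FILE *out)
{
    int i;

    for (i = 1; i < argc; i++)
        fprintf(out, "%s%s", argv[i],
                (i < argc - 1) ? " " : "\n");
}
```

A main program would simply pass its own argc and argv, together
with stdout, to this function.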

________________________________________

(1) K&R,   Chapter   5,  Pointers   and  Arrays;   Section  5.11,
    Command-Line Arguments; pp.110-114.