Multics > Library > Articles
26 Aug 2017

Reduction Compiler

History | People | Library | Sites | About

Gary Dixon [GCD]

Introduction

This note describes the history of the Multics reduction compiler.

Compiler Terminology

The Backus-Naur Form (BNF) is a method for expressing the constructs of a programming language. A series of BNF statements provides a set of rules to produce legal sentences in a given programming language that embody a particular meaning/intent in the language. Hence, sets of such rules are often called productions. They can be followed to produce a set of tokens (low-level elements of a language) that express higher-level constructs defined by the language.

Compilers have to perform the opposite task. Given a set of tokens, they have to match groups of tokens to programming language constructs, so they can determine their meaning. This is known as syntax analysis because tokens are being examined to see which match against the BNF syntax rules of the language.

Whenever a set of tokens match a syntax rule, the language definition tells what meaning to infer from the tokens. (Hopefully, the meaning inferred by the compiler is what the programmer intended.) This is known as semantic analysis because the purpose of compiling is to analyze the semantics (meaning or intent) embodied by a sequence of tokens, so the compiler can then generate code that implements that intent. As such, many low-level tokens are reduced to a few higher-level constructs defined by the programming language. The compiler can use reductions to match tokens against high-level language constructs, and thereby assign a meaning to those tokens. And for token combinations which could not be produced by the language, the compiler can diagnose an error.

Reduction statements therefore parse (analyze syntax of) tokens, looking for matches with language rules; and then to assign a language-specific meaning to (analyze semantics of) those tokens.

Components of a Compiler

A compiler tries to determine the meaning of a set of tokens (a program) expressed in a given programming language. Once meaning has been determined, it generates code to implement that meaning (intention of the programmer) by generating some object that expresses the meaning in another form (more easily understood by the computer). This might involve generating a new table entry in a database (as in the cv_ttf compiler generating new TTP entries); or might involve generating object code (instructions to the computer) for accepting some input data, processing it, and producing output data (as in PL/I compiler producing an object from a PL/I source program).

Steps of compiling are:

  1. LEXING: scanning a string of input characters, breaking them into slightly higher-level groups of characters called tokens. Each token is either a punctuation item accepted by the language; or a language keyword; or a literal defined by the language (e.g., a <number>); or some other symbol expressible in the language (e.g., a variable name in a dcl statement). The scanning phase is known as lexical analysis, because it looking for units (lexemes defined by the language rules) that may appear in the production rules of the language (its BNF syntax set).

    For example, the PL/I declaration: dcl cu_$arg_ptr entry(fixed bin, ptr, fixed bin(21), fixed bin(35)); would be lexed into tokens by separating punctuation characters from other words/symbols in the language, and eliminating unmeaningful whitespace separators. Each item below is a token.

          dcl cu_$arg_ptr entry ( fixed bin , ptr , fixed bin ( 21 ) , fixed bin ( 35 ) ) ;
        
  2. PARSING: matching tokens against constructs known in the programming language (BNF), in order to glean their meaning (the programmer's intent). As matches occur and meaning is determined, actions are performed to capture that meaning in a set of translator tables. These rules for matching may be expressed in a set of reduction statements that look for syntax (form) and meaning (semantics) in a stream of input tokens.

  3. CODE GENERATION: converting the token meaning (programmer's intentions captured by parsing) into instructions the computer can understand (an object program).

  4. CODE OPTIMIZATION: optional step to analyze the computer instructions looking for ways to make them perform more efficiently.

Multics reductions Compiler (rdc)

Input to rdc starts out as a segment containing a stream of characters. Most translators written in reduction language use the lex_string_ subroutine to do the lexing phase (to break this character stream into a set of tokens); some do it themselves (e.g., convert_date_to_binary_.rd).

Reduction statements given to rdc have information about syntax analysis (defined sequences of tokens), and semantic analysis (meaning of each token sequence). Each reduction has four parts:

  label	/ syntax_specification(s)		/ syntax_actions  & semantic_actions		/ next_label	\

Given a list of input tokens and a sequence of reduction statements such as above, the tokens are compared against the syntax_specification in each reduction until a match is found. When the tokens match a reduction syntax, then the syntax_actions and semantic_actions portion of the matching reduction statement are performed. These associate some meaning with the tokens (as defined by the language); and skip over the matching tokens, so the next set of input tokens may be examined. The next_label field identifies the next sequence of reduction statements to use in analyzing subsequent tokens.

AG92-06 MPM Commands, p 3-770 shows the BNF for a fictitious Tape Language. Subsequent pages show how reductions that translate this language are derived, with a completed set of reductions shown finally on p 3-784. While this example was intended mainly to demonstrate various aspects of the reductions language, the explanations attached try to explain each specific aspect. What may be daunting to some readers, however, is lack of the introductory information about compiling that I have tried to supply above.

History of the reductions Compiler

The term reductions is one I first heard in MIT's 6.251 "Programming Languages" class, taught by Professor Stu Madnick. I was intrigued by the elegant way that a few reductions could analyze meaning in an incoming stream of tokens. As my end-of-term project, I tried to write a PL/I program that would analyze a set of reductions, and convert them into a translator program. It was a huge task, and I ran out of time before completing the work. But the instructor liked my plan, and the code I completed; I received a grade of A on the project.

A few years later, I went to work at MIT Programming Development Office. When I started with Multics, many people (including me) were struggling with tasks involving conversion of some user-friendly language for expressing Multics data (e.g., TTF, or CMF, or PDT) into an actual data table that programs could understand. I decided to try again at writing a program to compile reductions into a translator. This resulted in lex_string_ (tokenizer), lex_error_ (translator-style error messages), reductions (the compiler), translator_temp_ (fast storage allocation for tokens) , etc. The translator I produced using reductions is library_descriptor_compiler (ldc).

The reductions (rdc) compiler accepts a simple reduction language that can be expressed as BNF. Therefore, I wrote the rdc program in this language of reductions; and solved the bootstrap problem by hand-coding the various syntax functions and action routines. Once I got the bootstrap version working, I used rdc to compile itself. And life was elegant (for a while).

Within a week of my publishing information about rdc, people were starting to use it for their work. Several programmers working on MIT's relational data tool (RDMS) asked for extensions to the rdc language. The documented features in rdc support translation of Type III (finite state machine) language forms. RDMS developers needed to translate a Type II (push-down automaton) language for one of its tools. The extensions were added (but not included in the documentation, because the MCRB thought they added too much complexity to the rdc documentation).

Eventually, the following rdc-generated tools were included in Multics releases:

    compdv.rd
    convert_date_to_binary_.rd
    cv_cmcs_station_ctl.rd
    cv_cmcs_terminal_ctl.rd
    cv_cmcs_tree_ctl.rd
    cv_cmf.rd
    cv_pmf.rd
    cv_prt_rqti.rd
    cv_rtmf.rd
    cv_ttf.rd
    dm_translate_system_config_.rd
    hcom_parse_.rd
    iod_tables_compiler.rd
    library_descriptor_compiler.rd
    mca_parse_.rd
    mtape_cv_apd.rd
    parse_bindfile_.rd
    parse_pnotice_info_.rd
    reduction_compiler_.rd
    rmdb_create_descriptor.rd
    tape_io.rd

I do not know how many Multics customers used rdc to write their own translators.

page created 21 Mar 2016 GCD