Multics Technical Bulletin                                MTB-560
DM: Before Journal Design

To:  Distribution

From:  Andre Bensoussan

Date:  06/20/83

Subject:  Data Management:  Before Journal Manager Design

ABSTRACT

The  services provided  by the  Before Journal  Manager have been
specified in MTB 559, in terms of a set of primitives callable by
the rest of the system, with  a detailed description of what they
do and how to call them.

This document  describes in some  detail how these  primitives do
their job and how the data structures they use are organized.

The reader is assumed to be familiar with the following MTBs:

     - MTB-509:  Recovery overview
     - MTB-511:  File manager overview
     - MTB-559:  Before journal manager specifications.

Comments should be sent to the author:

via Multics Mail:
   Bensoussan.Multics on System M.

via US Mail:
   André Bensoussan
   Honeywell Information Systems, Inc.
   575 Tech Square
   Cambridge, Massachusetts 02139

via telephone:
   (HVN) 261-9334, or
   (617) 492-9334
_________________________________________________________________

Multics  project  internal  working  documentation.   Not  to  be
reproduced or distributed outside the Multics project.



                            CONTENTS

                 Abstract
                 1 Introduction
                 2 Before Journal Logical Records
                    2.1 Logical Record Types
                    2.2 Logical Record Contents and Formats
                    2.3 Summary of the Logical Records Formats
                 3 Before Journal File Characteristics
                 4 Before Journal File Format
                    4.1 The before journal header in CI zero
                    4.2 The Control Interval Structure
                       4.2.1 The file manager ci_header structure.
                       4.2.2 The before journal bj_ci structure.
                    4.3 Elements
                    4.4 Logical Record Identifiers
                 5 Logical Records Storage and Retrieval Operations
                 6 Logical vs Physical Journalization
                    6.1 Logical Journalization
                    6.2 Physical Journalization
                    6.3 File manager services
                    6.4 Page Control Services
                 7 Tables used by the Before Journal Manager
                    7.1 Per Process Information
                       7.1.1 The bj_ppt structure.
                       7.1.2 The bj_ppte structure.
                    7.2 Per System Information
                       7.2.1 The bj_pst structure.
                       7.2.2 The bj_pste structure.
                       7.2.3 The bj_pn_table structure
                       7.2.4 The bj_check_in_table Structure
                    7.3 Per Transaction Information
                       7.3.1 The bj_txt structure.
                       7.3.2 The bj_txte structure.
                 8 Table used by the Hardcore

1 INTRODUCTION

The before journal manager is organized into a set of primitives,
callable  as  a  set  of  before_journal_manager_$  entry points.
These primitives,  in order to  perform their job,  have to write
some information into the before journal or read some information
from it.   When writing, they  prepare the data to  be written in
what is  referred to as a  logical record; then they  call upon a
storage utility module of the  before journal manager to actually
store  the data  in the  journal.  Each  time the  storage module
stores  a logical  record into the  journal, it  returns a record
identifier  which  can be  used  later to  retrieve  that logical
record.   When  reading,  these  primitives call  again  upon the
storage  module,  with a  record  identifier; they  get  back the
logical record as  it was presented to the  storage module at the
time the record was stored.

2 BEFORE JOURNAL LOGICAL RECORDS

Logical records stored  in a before journal are  of various types
depending on their function in  the journal.  Each type of record
contains  specific  information  stored according  to  a specific
format.

2.1 Logical Record Types

The following logical record types have been defined:

o The "begin" type record, stored in the journal at the beginning
  of a transaction.

o The "committed" type  record, stored in the journal  at the end
  of  a  transaction to  indicate that  the transaction  has been
  committed.  The  transaction is in  fact committed at  the very
  instant the committed record physically appears on the journal.

o The "aborted" type record, stored in  the journal at the end of
  a  transaction  to  indicate  that  the  transaction  has  been
  aborted.   The  transaction  is  in fact  aborted  at  the very
  instant the aborted record physically appears on the journal.


o The "checkpoint" type record, stored in the journal to indicate
  that the transaction has performed a checkpoint and to keep (or
  point  to)   the  information  describing  the   state  of  the
  transaction at the time of the checkpoint.

o The "rolled_back"  type record, stored  in the journal  after a
  transaction has been rolled  back, partially or completely.  It
  indicates  up  to  which   checkpoint  the  rollback  has  been
  performed; it  also indicates that  all I/O's generated  by the
  rollback have been physically completed.

o The "before_image" type record, stored in the journal each time
  a transaction is about to  modify a protected file, to indicate
  what  file  is  about  to  be  modified  and  how  to  undo the
  modification if the transaction is to be rolled back.

o The "rollback_handler" type record,  stored in the journal each
  time a transaction is about to perform an action that could not
  be rolled  back using the standard  before image mechanism.  In
  these  situations, the  transaction stores in  the journal the
  name of a procedure which would  have to be invoked at rollback
  time to undo the action; the record containing the name of this
  procedure is called a rollback handler record; it also contains
  any input information needed by the procedure to do its job.

o The "fm_rollback_handler" type record, used by the file manager
  as  a specially  tailored rollback  handler.  The  file manager
  could  have used  the general  form of  rollback handler record
  described   above.   For   ease  of   implementation,  and  for
  robustness reasons, it uses a special form of record.

o The  "fm_post_commit_handler"  type  record, used  by  the file
  manager to postpone an action until the transaction commits.  A
  typical  example  of  its usage  is  when a  protected  file is
  deleted in the middle of  a transaction.  Suppose that the file
  manager really  deletes the file.  If  the transaction needs to
  be aborted  a little later,  it would be  difficult to recreate
  the file as it was at  the beginning of the transaction, unless
  the entire file was saved, just  in case of an abort.  Instead,
  the file manager does a logical deletion of the file, and makes
  arrangement with  the commit mechanism to  get control when the
  transaction is committed, in order to turn the logical deletion
  into a real deletion, as a post_commit action.


  There may be several post_commit actions waiting for a
  transaction to commit.  For each of them, the description of
  what is to be done is kept in a fm_post_commit_handler record.
  This list of records associated with a transaction must remain
  available even after a system crash.  Since records in the
  before journal have this property, the journal has been chosen
  as the place to keep that list, even though the list is not
  needed for rollback purposes.

o The  "begin_commit"  type  record   is  used  to  indicate  the
  beginning of a commit  which has post_commit actions associated
  with it.  The  protocol is to write a  begin_commit mark in the
  journal,  then do  the post_commit  actions and  then write the
  committed  mark in  the journal.   As soon  as the begin_commit
  mark is  physically in the  journal, the transaction  cannot be
  aborted;  it  has  no  alternative  other  than  executing  the
  post_commit actions.  If the system crashes while executing the
  post_commit actions,  they will be re-executed  by the recovery
  system  and the  transaction will  be committed.   Committing a
  transaction  that has  no post_commit  actions pending  is done
  without a begin_commit record.

Additional  record types  may be defined  later, if  and when the
need for new types arises.
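
The commit protocol described for the "begin_commit" record can be
sketched as follows.  This is only an illustration in Python, not
the before journal manager's code:  the journal is modeled as a
plain list, and the names commit, recover and post_commit_actions
are hypothetical.  In the design, the post_commit actions are
described by fm_post_commit_handler records kept in the journal;
here they are passed in as callables to keep the sketch short.

```python
def commit(journal, tid, post_commit_actions):
    """Write the commit marks for transaction `tid`.

    `journal` is a list standing in for the before journal (assumption);
    each entry is a (record_type, tid) pair.
    """
    if post_commit_actions:
        # Once this mark is in the journal the transaction cannot be aborted.
        journal.append(("begin_commit", tid))
        for action in post_commit_actions:
            action()  # must be safe to re-execute after a crash
    # A transaction with no pending post_commit actions skips begin_commit.
    journal.append(("committed", tid))


def recover(journal, tid, post_commit_actions):
    """After a crash, finish a commit that wrote begin_commit but not committed."""
    types = [rtype for rtype, rtid in journal if rtid == tid]
    if "begin_commit" in types and "committed" not in types:
        for action in post_commit_actions:
            action()  # re-execute the post_commit actions
        journal.append(("committed", tid))
```

The point of the ordering is that a crash between the two marks
leaves enough information in the journal to complete the commit
rather than undo it.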

2.2 Logical Record Contents and Formats

All  before  journal  records  start with  a  header  that  has a
standard format regardless of the  record type; some record types
consist only  of the header;  some other record  types consist of
the  header  followed by  information with  a structure  which is
specific to the record type.

o  The logical record header

   The header has the following structure:

   dcl 1 bj_rec_hdr              aligned based,
       2 type                    char (4),
       2 tid                     bit (36),
       2 process_id              bit (36),
       2 prev_rec_id             bit (36),
       2 prev_rec_byte_size      fixed bin (24),
       2 tx_rec_no               fixed bin (35),
       2 n_txn                   fixed bin;


       where:

       "type" is the  type of the record, i.e.,  one of the types
            described above.

       "tid"  is  the transaction  identifier of  the transaction
            which produced the record.

       "process_id" is the process identifier of the process
            which produced the record.

       "prev_rec_id"  is  the record  identifier of  the previous
            record  produced  by the  same transaction  and still
            useful to the transaction.  This record identifier is
            used by the rollback  procedure to get before journal
            records  in reverse  chronological order  for a given
            transaction.   It  consists  of  a  [control interval
            number, slot number] pair.

       "prev_rec_byte_size"  is  the  number   of  bytes  of  the
            previous record produced by the same transaction.

       "tx_rec_no" is  the record number  within the transaction.
            This item  is redundant and  is used to  perform some
            consistency checking on the before journal.

       "n_txn"  is the  number of transactions  still in progress
            and  that  have stored  at least  one record  in this
            journal, at the time this particular record is stored
            in the journal.  This item  is used by recovery after
            crash   in  the   following  manner:    To  find  all
            unfinished transactions, recovery  after crash starts
            with  the  last record  written  in the  journal, and
            examines  all records  in the reverse  order of their
            production.  With the n_txn item in every record, the
            last  record  indicates  to   the  program  how  many
            unfinished  transactions there  are; as  soon as this
            number of unfinished transactions  has been found, it
            is  no longer  necessary to  examine the  rest of the
            journal.    Without   the    availability   of   this
            information, the journal would  have to be read until
            the origin is encountered.


o  The "begin" record

   The begin record consists only of  the header.  It has no type
   specific information.

o  The "committed" record

   The committed record  consists only of the header.   It has no
   type specific information.

o  The "aborted" record

   The aborted type  record consists of the header  only.  It has
   no type specific information.

o  The "checkpoint" record

   The checkpoint record consists of  the header followed by some
   information  about the  checkpoint.  This  information will be
   defined later, since checkpoints will  not be available in the
   first release.

o  The "rolled_back" record

   The rolled_back type record has the following structure:

   dcl 1 bj_rolled_back_rec      aligned based,
       2 header                  like bj_rec_hdr,
       2 checkpoint_no           fixed bin (35),
       2 last_written_rec_id     bit (36);

       where:

       "header" is the logical record header.

       "checkpoint_no" is  the number of the  checkpoint to which
            the transaction was rolled back.


       "last_written_rec_id" is the record  id of the last record
            produced  by  this  transaction.    It  is  kept  for
            debugging purposes.

o  The "before_image" record

   The before image record has the following structure:

   dcl 1 bj_before_image         aligned based,
       2 header                  like bj_rec_hdr,
       2 fm_uid                  bit (36),
       2 fm_oid                  bit (36),
       2 ci_no                   fixed bin (35),
       2 n_parts                 fixed bin (17),
       2 image_len               fixed bin (24),
       2 part                    dim (refer
                                 (bj_before_image.n_parts)),
         3 byte_offset           fixed bin (24),
         3 byte_length           fixed bin (24),
       2 image                   char (refer
                                 (bj_before_image.image_len));

       where:

       "header" is the logical record header.

       "fm_uid" is  the file unique identifier  of the file about
            to be modified.  This identifier  will be used by the
            rollback procedure  in order to identify  the file on
            which  the modification  has to be  undone when using
            this  before  image record.   Since the  Multics file
            system does  not provide a search  facility to find a
            file by its unique identifier, the recovery mechanism
            (and more specifically the  file manager) maintains a
            conversion table from file unique identifiers to file
            pathnames,  for  all  files  that may  be  needed for
            rollback.

       "fm_oid" is the file opening  identifier of the file about
            to be modified.  It is used by the rollback procedure
            when executing in the same process as the transaction
            to be rolled back.

       "ci_no"  is  the control  interval  number of  the control
            interval which is about to be modified.


       "n_parts" is the  number of parts about to  be modified in
            the control  interval.  In this context,  a part is a
            bit  string  defined by  its  byte offset  within the
            control interval and its length in number of bytes.

       "image_len" is the length of the image item below, in
            number of bytes.

       "part" is an array, with one  entry for each part about to
            be modified.

       "byte_offset" is the byte offset, in the control interval,
            of the beginning of the part.

       "byte_length" is the number of bytes of the part.

       "image"   is   the   concatenation  of   the   bit  string
            representation  of all  parts to be  modified, in the
            order in which they are described in the part array.

o  The "rollback_handler" record

   This record has the following structure:

   dcl 1 bj_rollback_handler_rec      aligned based,
       2 header        like bj_rec_hdr,
       2 name_len      fixed bin (24),
       2 info_len      fixed bin (24),
       2 proc_name     char (refer
                       (bj_rollback_handler_rec.name_len)),
       2 info_bits     bit  (refer
                       (bj_rollback_handler_rec.info_len));

       where:

       "header"  is  the standard  header  of any  before journal
            logical record.

       "name_len"  is the  number of characters  of the proc_name
            item found below.

       "info_len"  is the  number of  bits of  the info_bits item
            found below.

       "proc_name" is the name of the procedure to be invoked.


       "info_bits" is the bit  string representation of the input
            information that this procedure needs to do its job.

o The "fm_rollback_handler" record.

   This record has the following structure:

   dcl 1 bj_fm_handler_rec      aligned based,
       2 header                 like bj_rec_hdr,
       2 fm_uid                 bit (36),
       2 fm_oid                 bit (36),
       2 prev_fm_handler_rec_id bit (36),
       2 info_len               fixed bin,
       2 info_bytes             char (refer
                                (bj_fm_handler_rec.info_len));

       where:

       "header" is the standard record header.

       "fm_uid" is the  uid of the file for  which this record is
            relevant.

       "fm_oid" is the file opening id of that file.

       "prev_fm_handler_rec_id" is the record  id of the previous
            fm_handler_record in this transaction.

       "info_len"  is  the length,  in  number of  bytes,  of the
            info_bytes item below.

       "info_bytes" is the information needed by the file manager
            procedure to do its job.

o The "fm_post_commit_handler" record.

   This  record  has  the  same  structure  and  contents  as the
   fm_rollback_handler record described above.

o The "begin_commit" record.

   This  record  consists of  the  header only.   It has  no type
   specific information.
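
As an illustration of the "before_image" record described earlier
in this section, the following Python sketch shows how the part
array and the image item relate, and how a rollback could use them
to undo a modification.  It is a simplified model, not the actual
implementation:  the control interval is a bytearray, the record is
a dictionary, and the function names are hypothetical.

```python
def make_before_image(ci, parts):
    """Build the type-specific items of a before image record.

    `ci` is the control interval contents before modification;
    `parts` is a list of (byte_offset, byte_length) pairs.
    """
    # The image is the concatenation of all parts, in part-array order.
    image = b"".join(bytes(ci[off:off + length]) for off, length in parts)
    return {
        "n_parts": len(parts),
        "image_len": len(image),
        "part": list(parts),
        "image": image,
    }


def undo(ci, rec):
    """Restore in `ci` every part saved in the before image record `rec`."""
    pos = 0
    for off, length in rec["part"]:
        ci[off:off + length] = rec["image"][pos:pos + length]
        pos += length
```

For example, saving the parts (1, 2) and (6, 3) of a control
interval holding "abcdefghij" yields the 5-byte image "bcghi";
calling undo after those bytes have been overwritten restores the
original contents.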


2.3 Summary of the Logical Records Formats

The formats of the various before journal record types are given
in terms of PL/I declarations.

dcl  1 bj_rec_hdr                     aligned based,
       2 type                         char (4),
       2 tid                          bit (36),
       2 process_id                   bit (36),
       2 prev_rec_id                  bit (36),
       2 prev_rec_byte_size           fixed bin (24),
       2 tx_rec_no                    fixed bin (35),
       2 n_txn                        fixed bin;

dcl  1 bj_committed_rec               aligned like bj_rec_hdr based;

dcl  1 bj_begin_commit_rec            aligned like bj_rec_hdr based;

dcl  1 bj_aborted_rec                 aligned like bj_rec_hdr based;

dcl  1 bj_rolled_back_rec             aligned based,
       2 header                       like bj_rec_hdr,
       2 checkpoint_no                fixed bin (35),
       2 last_rolled_back_rec_id      bit (36);

dcl  1 bj_rollback_handler_rec        aligned based,
       2 header                       like bj_rec_hdr,
       2 name_len                     fixed bin (24),
       2 info_len                     fixed bin (24),
       2 proc_name                    char (refer
                                      (bj_rollback_handler_rec.name_len)),
       2 info_bits                    bit (refer
                                      (bj_rollback_handler_rec.info_len));


dcl  1 bj_before_image                aligned based,
       2 header                       like bj_rec_hdr,
       2 fm_uid                       bit (36),
       2 fm_oid                       bit (36),
       2 ci_no                        fixed bin (35),
       2 n_parts                      fixed bin (17),
       2 image_len                    fixed bin (24),
       2 part                         dim (refer
                                      (bj_before_image.n_parts)),
         3 byte_offset                fixed bin (24),
         3 byte_length                fixed bin (24),
       2 image                        char (refer
                                      (bj_before_image.image_len));

dcl  1 bj_fm_handler_rec              aligned based,
       2 header                       like bj_rec_hdr,
       2 fm_uid                       bit (36),
       2 fm_oid                       bit (36),
       2 prev_fm_handler_rec_id       bit (36),
       2 info_len                     fixed bin,
       2 info_bytes                   char (refer
                                      (bj_fm_handler_rec.info_len));


3 BEFORE JOURNAL FILE CHARACTERISTICS

     Logical records of  the before journal are stored  in a data
management file, accessed through the file manager.  The contents
of the file are to be manipulated by the before journal manager
only.  The type  of file used for the  before journal is referred
to as a "circular sequential" file.

     Logical  records  are  always  appended to  the  end  of the
journal.  They are never inserted in the middle, modified, moved,
grown,  shrunk  or deleted.   The basic  retrieval modes  are (a)
sequential in reverse order, used by recovery after crash to find
all  unfinished  transactions,  and  (b)  random  with  a  record
identifier, used by rollback.
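
The two retrieval modes can be sketched in Python as follows.  This
is a simplified model under stated assumptions, not the actual
access method:  the journal is a dictionary keyed by record
identifier and kept in the order the records were written, a null
prev_rec_id is represented by None, and n_txn in the last record is
assumed to count the transactions still unfinished once that record
has been written.  The function names are hypothetical.

```python
FINAL_TYPES = {"committed", "aborted"}


def find_unfinished(journal):
    """Mode (a): scan in reverse order to find all unfinished transactions.

    Returns (set of unfinished tids, number of records examined).
    """
    records = list(journal.values())
    if not records:
        return set(), 0
    target = records[-1]["n_txn"]  # how many unfinished transactions to find
    finished, unfinished = set(), set()
    examined = 0
    for rec in reversed(records):
        if len(unfinished) == target:
            break  # the rest of the journal need not be read
        examined += 1
        tid = rec["tid"]
        if tid in finished or tid in unfinished:
            continue
        # The latest record of a transaction tells whether it terminated.
        if rec["type"] in FINAL_TYPES:
            finished.add(tid)
        else:
            unfinished.add(tid)
    return unfinished, examined


def rollback_chain(journal, last_rec_id):
    """Mode (b): follow prev_rec_id to get one transaction's records,
    most recent first."""
    rec_id = last_rec_id
    while rec_id is not None:
        rec = journal[rec_id]
        yield rec_id, rec
        rec_id = rec["prev_rec_id"]


# Example journal; record ids are (control interval number, slot number) pairs.
journal = {
    (1, 1): dict(type="begin", tid="A", prev_rec_id=None, n_txn=1),
    (1, 2): dict(type="begin", tid="B", prev_rec_id=None, n_txn=2),
    (1, 3): dict(type="before_image", tid="A", prev_rec_id=(1, 1), n_txn=2),
    (2, 1): dict(type="committed", tid="B", prev_rec_id=(1, 2), n_txn=1),
    (2, 2): dict(type="before_image", tid="A", prev_rec_id=(1, 3), n_txn=1),
}
```

In this example the reverse scan stops after examining a single
record, because the last record announces one unfinished
transaction and that transaction is found immediately.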

     When  a transaction  terminates, all  before journal records
that it  produced become useless since  the transaction should no
longer be  rolled back.  As  a result, after a  certain time, the
first n records of the journal become useless.

     When  the  file is  full,  the first  n  consecutive useless
records  are "erased"  from the beginning  of the  file and their
space is made  available to continue to append  new records.  The
beginning  of the  file is  redefined to  be at  the first useful
record.  This is why the file used for a before journal is said to
be a circular file.
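
The reclamation of useless records can be illustrated with a small
Python model.  It is only a sketch under simplifying assumptions:
capacity is counted in records rather than in control intervals, a
record is considered useless as soon as its transaction has
terminated, and the class and method names are hypothetical.

```python
class CircularJournal:
    """Toy model of a circular sequential file of journal records."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of records (assumption)
        self.records = []          # oldest first; each entry is (tid, payload)
        self.active = set()        # tids of transactions still in progress

    def append(self, tid, payload):
        if len(self.records) >= self.capacity:
            self._reclaim()
        if len(self.records) >= self.capacity:
            raise RuntimeError("journal full: oldest record is still useful")
        self.active.add(tid)
        self.records.append((tid, payload))

    def terminate(self, tid):
        """The transaction has ended; its records are now useless."""
        self.active.discard(tid)

    def _reclaim(self):
        # Erase only the first n consecutive useless records; the
        # beginning of the file moves to the first useful record.
        n = 0
        while n < len(self.records) and self.records[n][0] not in self.active:
            n += 1
        del self.records[:n]
```

Note that a useless record located after a useful one is not
reclaimed until every record before it has become useless as well.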

     Because of the special characteristics of before journals, a
new file format and new access methods, tailored for circular
sequential files, have been defined, rather than using the more
general ones defined for the record and index managers (MTB 552).


4 BEFORE JOURNAL FILE FORMAT

     A before journal is stored  in a data management file, which
is made of Control Intervals (CI's).  It consists of a header for
the journal,  stored in CI  zero, followed by a  sequence of CI's
containing before journal logical records.  These CI's start with
a  CI header,  which is  very similar to  the header  used by the
record  and  index managers.   In  each CI  a variable  number of
variable length  elements can be  allocated, as for  CI's used by
the  record  and  index  managers;  but  the  structure  of these
elements has  been simplified for  the before journal  due to the
sequential nature of the journal.

4.1 The before journal header in CI zero

     The before journal  header is located in CI  zero, after the
file manager  standard ci_header.  Its structure  is identical to
the  "bj_pste" structure  used by  the before  journal manager and
described  in  section 7.2.2  of this  memo.  Bj_pste  stands for
"before  journal  per system  table  entry".  The  before journal
manager uses  a per system  table with one entry  for each before
journal currently used in the system.  The entry for a journal is
referred to as the bj_pste for that journal.

     Some items found in the header are of permanent nature, such
as the journal  identifier, the CI size, the  maximum size of the
journal;  they  are  used  by   the  before  journal  manager  to
initialize  the "bj_pste"  structure when  the before  journal is
made operational for journalization.

     Some  other  items are  very  dynamic in  nature  and change
frequently while  journalization is being  performed.  The values
of these items  are always up to date  in the "bj_pste" structure
and are updated  in the header only periodically.   They are used
after a system crash, when  the "bj_pste" is no longer available,
as a starting point to find out  where the end of the journal was
at the time of the crash.


4.2 The Control Interval Structure

     Like  all control  intervals, control intervals  of a before
journal  start  with  a  standard  file  manager  4-word  header,
referred to  as the "ci_header".   It is followed  by a structure
specific to before journal control intervals.

4.2.1 THE FILE MANAGER CI_HEADER STRUCTURE.

      The ci_header structure is  maintained by the file manager.
It is  described in detail in  MTB-554; it is given  here for the
reader's convenience.

   dcl 1 ci_header               aligned based,
       2 stamp,
         3 version               bit (9) unal,
         3 bj_idx                fixed bin (9) uns unal,
         3 time_modified         fixed bin (53) unal,
       2 id,
         3 uid                   bit (36),
         3 size_code             unal,
           4 exponent            fixed bin (6) uns,
           4 addon               fixed bin (3) uns,
         3 num                   fixed bin (27) uns unal;

       where:

       "ci_header.stamp.version"  is  a version  number  for this
            structure.

       "ci_header.stamp.bj_idx"  is  always zero  for unprotected
            files;  therefore it  is zero  for a  before journal.
            For  protected files,  it indicates the  index of the
            before journal  used to modify  the control interval.
            This  index  is  the  index  of  the  journal  in the
            per-system  table (See  section 7.2); it  is also the
            index of the entry, in the dm_journal hardcore table,
            which contains the time stamp used by page control to
            determine if  the control interval should  be held in
            main  memory or  if it  may be  written to  disk (See
            section 8).


       "ci_header.stamp.time_modified" is the clock value at the time the
            file manager did the last modification to the control
            interval.   Since  records are  only appended  to the
            journal,  control  intervals  in the  journal  have a
            monotonically  increasing time  value.  This property
            is  used by  recovery after crash  to determine where
            the end of the journal was at the time of the crash.

       "ci_header.id.uid"  is  the file  unique  identifier.  The
            file manager  automatically stores it  in all control
            intervals of the file.

       "ci_header.id.size_code" specifies the size of the control
            interval  in  number of  bytes, using  "exponent" and
            "addon", according to the following expression:  Size
            = (64 + 8 * addon) * 2 ** exponent.

       "ci_header.id.num" is  the control interval  number of the
            control  interval.  The  ci_header contains  the full
            identifier of the control  interval:  file uid and ci
            number.
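
The size_code expression can be checked with a short Python
function.  The function name is hypothetical and the sample values
are illustrative only; they are not claimed to be the codes actually
used by data management files.  The range checks follow from the
declared field widths (6 bits for exponent, 3 bits for addon).

```python
def ci_size_bytes(exponent, addon):
    """Size = (64 + 8 * addon) * 2 ** exponent, in bytes."""
    if not 0 <= exponent < 2 ** 6:
        raise ValueError("exponent must fit in 6 unsigned bits")
    if not 0 <= addon < 2 ** 3:
        raise ValueError("addon must fit in 3 unsigned bits")
    return (64 + 8 * addon) * 2 ** exponent


# Worked examples (illustrative values):
#   exponent = 6, addon = 0  ->  64 * 64 = 4096 bytes
#   exponent = 5, addon = 4  ->  96 * 32 = 3072 bytes
```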

4.2.2 THE BEFORE JOURNAL BJ_CI STRUCTURE.

   dcl 1 bj_ci                   based aligned,
       2 header1                 like ci_header,
       2 header2,
         3 layout_type           bit (36),
         3 first_rec_id          bit (36),
         3 n_slots               fixed bin (17) unal,
         3 first_is_contn        bit (1)        unal,
         3 last_is_contd         bit (1)        unal,
         3 pad                   bit (16)       unal,
         3 n_bi                  fixed bin (35),
         3 reserved              bit (36) dim (4),
       2 slot                    dim (1:1000),
         3 offset                fixed bin (18) uns unal,
         3 length                fixed bin (18) uns unal;

       where:


       "header1" is the file manager ci_header described above.

       "header2" is the before journal manager header.  Its items
            are described below.

       "layout_type"  is a  type indicating what  kind of control
            interval it  is.  The various  types used so  far are
            before journal type and  collection manager type.  By
            convention,  this  type  is  always  found  after the
            ci_header.

       "first_rec_id" is  the record id of  the logical record of
            which the  first element of this  control interval is
            the continuation.  If the first  element of the CI is
            not  the  continuation of  any  record, this  item is
            irrelevant and its value is undefined.

       "n_slots" is the number of  elements stored in the control
            interval.   It  is incremented  by  1 only  after the
            element has been stored.

       "first_is_contn" is a switch that, when ON, indicates that
            the  first element  is the  continuation of  a record
            that  starts  in a  previous control  interval.  This
            switch is  allocated in the same  word as the n_slots
            item and is carefully set in an atomic store operation
            which  brings into  existence the  first element, the
            switch indicating it is the continuation of a record,
            and the record  id of the record in  which this first
            element belongs.

       "last_is_contd" is a switch  that, when ON, indicates that
            the last element of the control interval is part of a
            record  which  is  continued   in  the  next  control
            interval.   Like  the  first_is_contn  switch,  it is
            allocated in the same word as the n_slots item and it
            is set  in an atomic  operation at the  same time the
            n_slots value is incremented.

       "n_bi"  is  the number  of  before images  that  have been
            stored in this control interval.   It is used to keep
            a count of how many pages  may be held in main memory
            until the control interval is known to be on disk.

       "slot" is the element position table,  which describes the
            position and length of all elements stored in the CI.
            The  protocol  to  add  an  element  into  a  control
            interval in  an atomic manner is  as follows:  First,
            write  the contents  of the  element in  the CI, then
            store  its offset  and length  in the  next available


MTB-560                                Multics Technical Bulletin
                                        DM: Before Journal Design

            slot entry; at this point  the element still does not
            exist.   If  the  element  is the  first  and  is the
            continuation of  the previous element,  set the value
            of  first_rec_id.  Finally,  prepare a  word with the
            incremented  value  of  n_slots  and  the  appropiate
            values  for  the   first_is_contn  and  last_is_contd
            switches  and  store  the  word  in  its  location in
            header2.

       "offset"  is  the  byte  offset of  the  beginning  of the
            element.  An element may start at any byte; it does
            not have to start at a word boundary.  This offset is
            relative to the "addressable  portion" of the control
            interval,  that  is,  relative  to  the  beginning of
            header2.

       "length" is the length of the element, in number of bytes.

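     The ordering of stores in the "slot" protocol above can be
sketched as follows.  This is only an illustration, written in
Python rather than PL/1.  The bit positions chosen for the word
shared by n_slots and the two switches, the class ControlInterval
and its method names are assumptions made for the example; they
are not taken from the actual header2 declaration, and the space
occupied by the slot table itself is ignored.

```python
# Hypothetical layout of the word shared by n_slots and the switches.
FIRST_IS_CONTN = 1 << 17
LAST_IS_CONTD = 1 << 16
N_SLOTS_MASK = (1 << 16) - 1


def pack_word(n_slots, first_is_contn, last_is_contd):
    """Build the single word that is stored last, in one operation."""
    word = n_slots & N_SLOTS_MASK
    if first_is_contn:
        word |= FIRST_IS_CONTN
    if last_is_contd:
        word |= LAST_IS_CONTD
    return word


class ControlInterval:
    def __init__(self, size):
        self.data = bytearray(size)  # addressable portion of the CI
        self.slots = []              # (offset, length) slot entries
        self.first_rec_id = None     # meaningful only if first_is_contn
        self.word = 0                # n_slots and the two switches

    @property
    def n_slots(self):
        return self.word & N_SLOTS_MASK

    def add_element(self, content, continues_rec_id=None, last_is_contd=False):
        n = self.n_slots
        # The new element starts right after the last existing element.
        offset = self.slots[n - 1][0] + self.slots[n - 1][1] if n else 0
        if offset + len(content) > len(self.data):
            raise ValueError("element does not fit in this CI")
        # 1. Write the contents of the element in the CI.
        self.data[offset:offset + len(content)] = content
        # 2. Fill the next slot entry; the element still does not exist.
        del self.slots[n:]  # drop an entry left by an interrupted add
        self.slots.append((offset, len(content)))
        # 3. For a first element that continues a record, set first_rec_id.
        is_contn = n == 0 and continues_rec_id is not None
        if is_contn:
            self.first_rec_id = continues_rec_id
        # 4. One store brings the element and the switches into existence.
        first = is_contn or bool(self.word & FIRST_IS_CONTN)
        self.word = pack_word(n + 1, first, last_is_contd)
        return n + 1  # index (ix) of the new element
```

     Until step 4 completes, n_slots keeps its old value, so a
reader of the control interval never sees a partially added
element.
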
4.3 Elements

     The element_position_table  describes all elements  that are
allocated in the CI.  An element in the file is identified by its
CI number (ci) and its index (ix) in the position table:

                    element_id = (ci, ix)

     The first element in a  control interval is the element with
ix equal  to 1.  The  last element in  a control interval  is the
element with the highest ix.

     The first element of the journal is the first element of the
first CI of the journal.  The last element of the journal is the
last element in the last modified CI of the journal.

     Two elements are consecutive if (a)  they are in the same CI
and their  indices differ by one  or (b) one is the  last in a CI
and the other is the first in the next CI.

     When a new  element is allocated, it is  always allocated in
such a way as to become  the next element relative to the element
that was the last element.


     An  element  is allocated  in  a CI  in  order to  contain a
logical record or part of it.   When a logical record needs to be
appended to  the journal, a  new element is allocated,  in such a
way that  it becomes the  last element of  the list and  the next
element to the one which was  last before the creation of the new
element.  Since an element is, by definition, entirely in one CI,
it is possible that the newly allocated element is not large
enough to contain the entire logical record.  In this case
the new  element is used to  store the first part  of the logical
record, and one or several new elements are allocated to hold the
rest  of the  logical record.   By convention,  if more  than one
element is needed  to store a logical record,  all these elements
are consecutive elements.

     As in control intervals used by the record and index
managers, whose storage is managed by the Collection Manager (MTB
552), elements come in four flavors, depending on whether or not
they  contain  a full  logical  record and  on  what part  of the
logical record they contain.  An element may contain:

  - The entire logical record or
  - Only the first part of a logical record or
  - Only the last part of a logical record or
  - Only (one of) the middle part(s) of a logical record.

     The simplicity of the operations performed on the journal
has led to the definition of an element structure simpler than
the standard  structure used by  the Record Manager.   An element
contains  no header,  trailer or  pointer to  another element; it
only contains (all or some) bytes of a logical record.
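
     As an illustration of the four flavors, the sketch below
cuts a logical record into the pieces that would be stored in
consecutive elements.  It is written in Python, the function name
split_record and its parameters are hypothetical, and the space
taken by slot entries is ignored; it is not the actual allocation
algorithm.

```python
def split_record(record, free_in_current_ci, ci_capacity):
    """Cut a logical record into the pieces stored in consecutive elements.

    Returns a list of (flavor, piece) pairs, where flavor is one of
    "entire", "first", "middle" or "last".
    """
    if not record or ci_capacity <= 0:
        raise ValueError("need a non-empty record and a positive capacity")
    pieces = []
    # If the current CI is full, the first piece goes in a fresh CI.
    room = free_in_current_ci if free_in_current_ci > 0 else ci_capacity
    pos = 0
    while pos < len(record):
        pieces.append(record[pos:pos + room])
        pos += room
        room = ci_capacity  # every later piece starts in an empty CI
    if len(pieces) == 1:
        return [("entire", pieces[0])]
    flavors = ["first"] + ["middle"] * (len(pieces) - 2) + ["last"]
    return list(zip(flavors, pieces))
```

     Since the pieces carry no header or trailer, concatenating
them in order gives back the original logical record.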

4.4 Logical Record Identifiers

     A logical record identifier is the element identifier of the
element in which the record is stored.  If the logical record is
stored in several elements, its identifier is the identifier of
the element in which its first part is stored.  The other parts
are stored in consecutive elements.


5 LOGICAL RECORDS STORAGE AND RETRIEVAL OPERATIONS

     The  Before  Journal  Manager  has  a  utility  module which
provides  a  set  of  operations to  store  and  retrieve logical
records  in  and from  a  Before Journal.   This module  does not
understand the contents  of a logical record and  views it merely
as a bit string.

     This  storage  module and  its  primitives are  described in
detail in MTB-567:  "DM:  Before journal storage operations".  It
provides the following services:

1. Append  a logical  record at the  end of a  Before Journal and
   return the record id assigned to the appended record.

2. Flush  the  journal  to disk  up  to, and  including,  a given
   logical record, and wait for I/O completion.

3. Get the logical record specified by its record identifier from
   a Before Journal.

4. Get  the  logical  record  that  precedes  the  logical record
   specified  by its  record id from  a Before  Journal, with its
   record identifier.

5. Get  the last  logical record  that was  appended to  a Before
   Journal, with its record identifier.

6. Recycle that portion of the journal which is no longer useful.


6 LOGICAL VS PHYSICAL JOURNALIZATION

     When  the  Before Journal  Manager is  called to  write some
information  in the  journal, say  it is  called to  journalize a
protected file  modification about to take  place, it appends the
appropriate  logical  record to  the journal  and returns  to its
caller with no guarantee that  the appended record is effectively
on disk.  Logical journalization is completed after the record is
appended to  the journal and  control is returned  to the caller.
Physical  journalization  is  completed when  the  logical record
physically appears on disk.

     The Transaction Manager must be aware of this distinction in
order  to correctly  implement the  commitment of  a transaction.
The  Before  Journal  Manager,  in  turn,  must  provide adequate
support  for  the  Transaction  Manager  to  deal  with  physical
journalization.

6.1 Logical Journalization

     When  the  Before Journal  Manager is  called to  store some
information  in the  Journal, it manufactures  the logical record
and  calls  the storage  utility primitive  bj_storage_append, to
append  it  to  the end  of  the journal.   The  append procedure
appends the record by storing it in a temporary buffer that looks
exactly like  the control interval  in which the record  is to be
stored.   When a  new record  needs to  be appended,  possibly by
another transaction, it is put in the temporary buffer, and so on
while the temporary buffer has enough room to accommodate the new
record.   For  all  these   records,  logical  journalization  is
considered completed,  even though they  have not been  stored in
the journal file yet.

     When  the  temporary buffer  is  full, the  append procedure
copies it in the journal file,  in the appropriate CI, by calling
the file manager at its "put" entry point.
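
     The buffering described in this section can be modeled as
follows.  This is a toy sketch in Python, not the code of
bj_storage_append:  the class JournalBuffer, the "put" callable
standing for the file manager entry point, and the choice of CI 1
as the first data CI are assumptions made for the example.

```python
class JournalBuffer:
    """Toy model of the temporary buffer holding the CI being filled."""

    def __init__(self, ci_capacity, put):
        self.ci_capacity = ci_capacity  # bytes available for elements
        self.put = put                  # stand-in for the file manager "put"
        self.ci_number = 1              # CI currently in the buffer (assumed start)
        self.elements = []              # elements stored in the buffer so far
        self.used = 0

    def append(self, record):
        """Append a record; return the id (ci, ix) of its first element."""
        if not record:
            raise ValueError("empty record")
        rec_id = None
        pos = 0
        while pos < len(record):
            room = self.ci_capacity - self.used
            piece = record[pos:pos + room]
            self.elements.append(piece)
            self.used += len(piece)
            pos += len(piece)
            if rec_id is None:
                rec_id = (self.ci_number, len(self.elements))
            if self.used == self.ci_capacity:
                self._put_and_advance()
        return rec_id

    def _put_and_advance(self):
        # The buffer is full: copy it to the journal file, then make the
        # buffer look like the next, empty CI.
        self.put(self.ci_number, list(self.elements))
        self.ci_number += 1
        self.elements = []
        self.used = 0
```

     In this model, append returns as soon as the record is in
the buffer; nothing in it implies that the record has reached the
journal file, let alone the disk.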


6.2 Physical Journalization

     The physical journalization of a logical record starts when
the write to  disk request has been queued;  it is completed when
the disk  I/O is completed.  The  Before Journal Manager provides
the  "flush" primitive  which allows  the transaction  manager to
request that physical journalization be performed for all logical
records appended to the journal by a given transaction:

  call before_journal_manager_$flush_transaction (txn_id,...)

When  issuing this  call, the Transaction  Manager requests that,
for  all   records  appended  to  the   journal  by  the  current
transaction, physical journalization be started if necessary, and
completed.  What happens when the  flush procedure is executed is
described below.

The      before_journal_manager_$flush_transaction      procedure
determines the record  id of the last logical  record appended by
the transaction to the journal.  It finds it in the "bj_txte" for
this  transaction,  which  contains  before  journal  information
related to that transaction (See section 7.3).  It translates the
request  into  a call  to  its storage  utility module,  to cause
physical journalization of all records  of the journal, up to and
including the last record appended by the transaction:

  call bj_storage_flush (bj_oid, last_record_id,...)

The bj_storage_flush procedure keeps track of the last record for
which physical  journalization is completed, the  last record for
which physical  journalization has been started,  the last record
which was  copied from the  temporary buffer to  the journal file
and  the  last  record  still   in  the  temporary  buffer.   The
sequential nature of the journal makes this housekeeping easy.

By  looking at  the last_record_id passed  by the  caller, it can
determine which ones of the following  steps have to be taken, if
any, in order to satisfy the caller's request:

- Copy the temporary buffer  into the journal file, if necessary.
- Flush the necessary CI's.
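
The decision can be sketched as follows.  This is an illustration
in Python under simplifying assumptions:  the function flush_steps
is hypothetical, the caller is assumed to have already translated
last_record_id into the number of the CI holding the end of that
record (target_ci), CI numbers are assumed to grow without
wrapping around, and I/O already requested but not yet completed
is simply requested again.

```python
def flush_steps(target_ci, last_ci_put, last_ci_on_disk):
    """Return the steps needed so that target_ci is physically on disk.

    last_ci_put     -- last CI copied from the buffer to the journal file
    last_ci_on_disk -- last CI known to be on disk, with all previous CI's
    """
    steps = []
    if target_ci <= last_ci_on_disk:
        return steps  # physical journalization is already completed
    if target_ci > last_ci_put:
        # The record is still in the temporary buffer: copy it first.
        steps.append(("put_buffer", target_ci))
    first_ci = last_ci_on_disk + 1
    n_ci = target_ci - first_ci + 1
    steps.append(("flush_consecutive_ci", first_ci, n_ci))
    return steps
```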


In  order to  flush the  journal, the  bj_storage_flush procedure
calls upon  a file manager  primitive provided for  that purpose,
which  causes physical  journalization of  a specified  number of
consecutive control  intervals starting from  a specified control
interval of a given file.

  call file_manager$flush_consecutive_ci (pf_oid, first_ci, n_ci)

The  file  manager,  in  the   first  implementation  on  top  of
multi-segment files, will call upon  page control to really issue
the  necessary  I/O's  and  wait for  them.   When "non-segmented
files"  are implemented,  the file  manager will  probably be the
"non-segmented  file"  manager  and  will be  able  to  issue I/O
requests and wait for them without going through page control.

6.3 File manager services

     The  before journal  manager expects  that the  file manager
will provide the following services:

1.  file_manager$flush_consecutive_ci (pf_oid, first_ci, n_ci)

     This procedure flushes a number of consecutive CI's
specified by  "n_ci", starting at the  CI specified by "first_ci"
in the page file specified by  its opening id "pf_oid", and waits
until all CI's specified by  the caller are physically written to
disk.

2.  file_manager$open_by_uid (file_uid, file_opening_id)

     This  function  is  needed  by the  rollback  facility, when
executed in a process other than the process which started the
transaction.  When a before image is recorded in the journal, the
file is identified by its  unique id; storing the entire pathname
of the  file in the  before image would be  too inefficient.  So,
the file manager is responsible for keeping a uid_pathname table
with one entry for each file that may be rolled back.

     The   open_by_uid   procedure   starts   by   searching  the
uid_pathname table for the  pathname associated with the file_uid
passed by the caller; then it uses the standard open mechanism.
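
     A minimal model of this lookup is given below, in Python.
The class FileManagerSketch, its register method and the use of a
dictionary for the uid_pathname table are assumptions made for
the example; the pathname used in the test is invented.

```python
class FileManagerSketch:
    def __init__(self):
        self.uid_pathname = {}  # file uid -> pathname
        self.openings = {}      # opening id -> pathname
        self._next_oid = 1

    def register(self, file_uid, pathname):
        """Record a file that may later have to be rolled back."""
        self.uid_pathname[file_uid] = pathname

    def open(self, pathname):
        """Stand-in for the standard open mechanism."""
        oid = self._next_oid
        self._next_oid += 1
        self.openings[oid] = pathname
        return oid

    def open_by_uid(self, file_uid):
        # Search the uid_pathname table, then use the standard open.
        pathname = self.uid_pathname.get(file_uid)
        if pathname is None:
            raise KeyError("no pathname recorded for this file uid")
        return self.open(pathname)
```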


6.4 Page Control Services

     In order to be able to roll back after a system crash with
ESD failure, it is necessary to enforce a protocol referred to as
the  "Write  Ahead  Log"  (WAL) protocol,  which  requires always
writing a  before image to  disk before the  associated data base
modification  is   itself  written  to  disk.    In  the  current
implementation,  all control  intervals are pages  written out by
page  control.  It  is therefore  necessary to  have page control
cooperate with the data management system in order to enforce the
WAL protocol.  This is described  in detail in MTB-564:  "Phasing
Page Control  and Before Journal".   A short description  is also
given in section 8.


7 TABLES USED BY THE BEFORE JOURNAL MANAGER

      During  journalization,  the  before  journal  manager uses
several tables  in addition to the  journal itself.  These tables
contain before journal information of 3 categories:

      - Per process information
      - Per system information
      - Per transaction information

7.1 Per Process Information

     The  before journal  manager uses  a per  process table, the
bj_ppt,  which  consists  of a  header  followed by  an  array of
entries, one for  each journal open in this  process.  This table
is created and initialized at dm_per_process initialization.

     The bj_ppt table  is read and modified by  one process only;
no  lock is  necessary.  Modifications to  the table  are made by
carefully  written programs  in such a  way as  to appear atomic,
even if a process fails in the middle of an update operation.

     The  PL/1 declarations  for the  bj_ppt and  the bj_ppte are
given  below,  with  a short  description  for each  item  in the
structures.

7.1.1 THE BJ_PPT STRUCTURE.

   dcl 1 bj_ppt                       based aligned,
       2 version                      fixed bin,
       2 max_n_entries                fixed bin,
       2 n_entries_used               fixed bin,
       2 highest_ix_used              fixed bin,
       2 default_bj,
         3 user_set_oid               bit (36),
         3 last_opened_oid            bit (36),
       2 process_id                   bit (36),
       2 process_ix                   fixed bin,
       2 mod_list_area                (100) fixed bin (35),
       2 e                            dim (dm_system_data_$
                                      bj_max_n_journals refer
                                      (bj_ppt.max_n_entries))
                                      like bj_ppte;


    where:

    "version" is the version of the bj_ppt structure.

    "max_n_entries" is the maximum number  of "e" entries in this
         table.    It   is   initialized   with   the   value  of
         dm_system_data_$bj_max_n_journals, at dm_per_process
         initialization, when the bj_ppt table is created.

    "n_entries_used" is the number of  entries used, that is, the
         number of journals opened by this process.

    "highest_ix_used" is the highest index of all used entries.

    "default_bj.user_set_oid" is the bj opening id of the journal
         established as the default journal in this process by an
         explicit call  to the entry point  set_default_bj in the
         before journal manager.

    "default_bj.last_opened_oid"  is the  opening id  of the last
         journal opened in this process, and still open.

    "process_id" is the process id of the current process.  It is
         set at dm_per_process initialization, and is used in any
         fast  paths where  it is  desirable to  save an external
         call to get_processid_ ( ).

    "process_ix" is the index assigned to this process in a
         system table of the before journal manager, the
         bj_check_in_table.  This table has 2 parts:  The first
         part consists of an array of process_ids, with one used
         entry for every process registered in the before journal
         manager; process_ix is the index assigned to the
         process in this array.  The second part of the
         bj_check_in_table  consists  of a  2  dimensional matrix
         where each element (p, j) is a one bit element, equal to
         1 if the journal with index  j is opened by process with
         index p (See section 7.2.4).
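
     The two parts of the bj_check_in_table can be pictured with
the following sketch.  It is written in Python for illustration;
the class CheckInTable and its method names are hypothetical, and
indices start at 0 here, which need not match the real table.

```python
class CheckInTable:
    def __init__(self, max_processes, max_journals):
        # Part 1: one used entry per registered process (None = free).
        self.process_ids = [None] * max_processes
        # Part 2: bit (p, j) is 1 if process p has journal j open.
        self.bits = [[0] * max_journals for _ in range(max_processes)]

    def check_in(self, process_id):
        """Register a process and return its index in the array."""
        if process_id in self.process_ids:
            return self.process_ids.index(process_id)
        p = self.process_ids.index(None)  # raises ValueError if full
        self.process_ids[p] = process_id
        return p

    def set_open(self, p, j, is_open=True):
        self.bits[p][j] = 1 if is_open else 0

    def processes_using(self, j):
        """Process ids of all processes that have journal j open."""
        return [pid for p, pid in enumerate(self.process_ids)
                if pid is not None and self.bits[p][j]]

    def check_out(self, p):
        """Remove a process and clear its row of the matrix."""
        self.bits[p] = [0] * len(self.bits[p])
        self.process_ids[p] = None
```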


7.1.2 THE BJ_PPTE STRUCTURE.

   dcl 1 bj_ppte                      based aligned,

       2 version                      fixed bin,
       2 bj_uid                       bit (36),
       2 pf_oid                       bit (36),
       2 n_opening                    fixed bin,
       2 bj_pste_ptr                  ptr,
       2 open_time                    fixed bin (71);

    where:

    "version" is the version number of this structure.

    "bj_uid"  is  the  unique  identifier  of  the  journal.   By
         convention, it is the same as the file uid.

    "pf_oid" is the opening id assigned by the file manager to
         the data management file used to implement the journal.
         This pf_oid is how the file must be referred to by the
         before journal manager when  calling the file manager to
         perform a get, put or flush operation on the file.

    "n_opening"  is the  number of  times this  process requested
         this before journal to be  opened, by calling the "open"
         entry  point  of  the  before  journal  manager.   It is
         incremented by 1  each time before_journal_manager_$open
         is   called,   and   decremented    by   1   each   time
         before_journal_manager_$close   is   called.    When  it
         reaches  0,  the process  is  "checked out"  of  the per
         system  bj_check_in_table,  that is,  its  process_id is
         removed   from   the   table  of   process_ids   in  the
         bj_check_in_table,  and  the number  of  processes using
         this  journal is  decremented in the  bj_pste entry (See
         section 7.2.2).

    "bj_pste_ptr" is a pointer to the per system entry describing
         this journal, in the per system table bj_pst.

    "open_time" is the time this journal was opened in this
         process.   It  is  used  to determine  what  the default
         before journal should be, when no default is in effect.
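
     The counting done with n_opening can be sketched as
follows, in Python.  The class and function names are
hypothetical.  The text above only states what happens when the
count reaches 0; incrementing n_processes on the first open is an
assumption made here by symmetry.

```python
class PerSystemEntry:
    """Stands for a bj_pste; only n_processes is modeled."""

    def __init__(self):
        self.n_processes = 0


class PerProcessEntry:
    """Stands for a bj_ppte; pste plays the role of bj_pste_ptr."""

    def __init__(self, pste):
        self.pste = pste
        self.n_opening = 0


def open_journal(ppte):
    ppte.n_opening += 1
    if ppte.n_opening == 1:
        # Assumed "check in" on the first open by this process.
        ppte.pste.n_processes += 1


def close_journal(ppte):
    if ppte.n_opening == 0:
        raise ValueError("journal is not open in this process")
    ppte.n_opening -= 1
    if ppte.n_opening == 0:
        # "Checked out": this process no longer uses the journal.
        ppte.pste.n_processes -= 1
```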


7.2 Per System Information

     The  before journal  manager uses  a per  system table which
consists  of  a header  and  an array  of  entries, one  for each
journal opened by at least one process in the system.  This table
is created and initialized  at dm_per_system initialization.  The
table is referred to as the bj_pst,  and an entry in the table as
a  bj_pste.    Every  bj_pste  in  use   contains  system  shared
information describing a particular before journal currently in
use  by at  least one  process.  If  a given  journal is  open in
several  processes,  each process  has its  own bj_ppte,  but all
these bj_ppte  point to a  unique bj_pste for  this journal.  Two
bj_pste's never describe the same journal.

     The  bj_pst  and  bj_pste's  are  shared  by  all  processes
executing  in the  before journal manager.   Concurrent access to
the same information is prevented  by the use of exclusive locks.
These locks must be fast to  acquire and to release and they must
not be held until the end of the transaction.  The before journal
manager must be able to set  them and release them as required by
its  logic.   Fast  locks  will be  available  for  this purpose,
modeled  after the  locks provided by  "set_lock" but implemented
with actual notification rather than with the calendar clock.

     Two classes of  operations need synchronization:  Operations
on the  header of the  table, to allocate  or free an  entry; and
operations on  a specific entry,  to perform journalization  on a
given journal.  A lock will be  associated with the header, and a
lock will be associated with each entry.
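
     The two levels of locking can be illustrated by the sketch
below, written in Python.  threading.Lock merely stands in for
the fast locks, which in the real system are shared between
processes; the class names, the method names and the list of
records kept in each entry are assumptions made for the example.

```python
import threading


class Entry:
    def __init__(self):
        self.lock = threading.Lock()  # protects work on this journal
        self.bj_uid = None            # None means the entry is free
        self.records = []


class PerSystemTable:
    def __init__(self, max_n_entries):
        self.lock = threading.Lock()  # protects allocation and freeing
        self.e = [Entry() for _ in range(max_n_entries)]

    def allocate(self, bj_uid):
        """Return the index of the entry for bj_uid, allocating if needed."""
        with self.lock:
            for ix, entry in enumerate(self.e):
                if entry.bj_uid == bj_uid:
                    return ix  # two entries never describe the same journal
            for ix, entry in enumerate(self.e):
                if entry.bj_uid is None:
                    entry.bj_uid = bj_uid
                    return ix
            raise RuntimeError("no free entry in the table")

    def free(self, ix):
        with self.lock:
            self.e[ix].bj_uid = None
            self.e[ix].records = []

    def append(self, ix, record):
        """Journalize on one journal; only that entry's lock is held."""
        entry = self.e[ix]
        with entry.lock:
            entry.records.append(record)
            return len(entry.records)
```

     In this model an append on one journal does not wait for an
allocation in the header, nor for an append on another journal.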

     Modifications  to the  header or  to individual  entries are
made  by carefully  written programs in  such a way  as to appear
atomic,  even  if a  process  fails in  the  middle of  an update
operation.

     The PL/1 declarations for  the bj_pst and bj_pste structures
are given  below, with a  short description for each  item in the
structure.


7.2.1 THE BJ_PST STRUCTURE.

   dcl 1 bj_pst                       based aligned,
       2 version                      fixed bin,
       2 pad1                         bit (36),
       2 lock,
         3 pid                        bit (36),
         3 event                      bit (36),
       2 time_of_bootload             fixed bin (71),
       2 max_n_entries                fixed bin,
       2 n_entries_used               fixed bin,
       2 highest_ix_used              fixed bin,
       2 pn_table_offset              fixed bin (18) uns,
       2 check_in_table_offset        fixed bin (18) uns,
       2 buffer_table_offset          fixed bin (18) uns,
       2 max_n_buffers                fixed bin,
       2 pad2                         bit (36),
       2 meters,
         3 n_calls_begin_txn          fixed bin (71),
         3 n_calls_before_image       fixed bin (71),
         3 n_calls_abort              fixed bin (71),
         3 n_calls_commit             fixed bin (71),
         3 n_calls_rb_mark            fixed bin (71),
         3 n_calls_fm_pc_mark         fixed bin (71),
         3 n_calls_fm_rbh             fixed bin (71),
         3 n_calls_rollback           fixed bin (71),
         3 meter dim (9:50)           fixed bin (71),
       2 mod_list_area                (100) fixed bin (35),

       2 e                            dim (dm_system_data_$
                                      bj_max_n_journals refer
                                      (bj_pst.max_n_entries))
                                      like bj_pste;

    where:

    "version" is the version number of the structure.

    "lock"  is a  2-word lock in  the format assumed  by the fast
         lock  facility  of  the  lock manager.   The  first word
         contains  the  process id  of the  process that  has the
         lock.   The  before journal  manager  uses this  lock to
         perform open and close operations.  Any process doing an
         open or a  close of the before journal  must acquire the
         lock, and keep it for the entire operation.


    "time_of_bootload"  is the  time the Multics  system was last
         initialized.  It is  set at dm_per_system initialization
         using the value of system_control_$time_of_bootload.

    "max_n_entries" is the maximum number of before journals that
         can be "active"  at the same time.  This  item is set at
         dm_per_system_init   time   to   the   value   found  in
         dm_system_data_$bj_max_n_journals.

    "n_entries_used"  is  the  number  of entries  in  this table
         actually used.

    "highest_ix_used" is  the highest index in  this table of all
         used entries.

    "pn_table_offset"  is  the offset  of the  uid_pathname table
         maintained  by  the  before   journal  manager  for  all
         journals that  are active, i.e.,  that have an  entry in
         the   bj_pst  segment.    This  uid_pathname   table  is
         allocated in the same segment as the bj_pst.  It is used
         by  recovery  after  crash   to  determine  what  before
         journals must be recovered.

    "check_in_table_offset"     is    the     offset    of    the
         bj_check_in_table.  This table is  allocated in the same
         segment as the  bj_pst table; it is used  to record what
         process has what journal open.

    "buffer_table_offset" is  the offset to  the bj_buffer_table.
         This table is allocated in the same segment as the
         bj_pst table.  Each entry in this table is one page long
         and starts  at a page  boundary.  Entry i  is associated
         with the  journal described in bj_pst.e(i),  and is used
         as the buffer for that journal.

    "max_n_buffers" is no longer used.

    "meters"  are  various  meters maintained  by  before journal
         manager procedures.

    "mod_list_area" is not used currently.

    "e" is an array of entries  like bj_pste's.  Each entry, when
         used, describes an active before journal.  The number of
         entries    of    this     array    is    specified    by
         dm_system_data_$bj_max_n_journals.


7.2.2 THE BJ_PSTE STRUCTURE.

   dcl 1 bj_pste                      based aligned,
       2 version                      fixed bin,
       2 bj_ix                        fixed bin,

       2 lock                         aligned,
         3 pid                        bit (36),
         3 event                      bit (36),

       2 bj_uid                       bit (36),
       2 ci_size                      fixed bin,
       2 max_size                     fixed bin,
       2 active                       bit (1) aligned,
       2 time_header_updated          fixed bin (71),
       2 earliest_meaningful_time     fixed bin (71),
       2 update_frequency             fixed bin,
       2 last_rec_id                  bit (36),

       2 n_processes                  fixed bin,
       2 n_txn                        fixed bin,

       2 last_ci_info                 aligned,
         3 last_ci_buffered           fixed bin (24) uns,
         3 last_ci_put                fixed bin (24) uns,
         3 last_ci_flushed            fixed bin (24) uns,
         3 last_ci_on_disk            fixed bin (24) uns,

         3 stamp_for_last_ci_put      fixed bin (71),
         3 stamp_for_last_ci_on_disk  fixed bin (71),

       2 n_bi_still_unsafe            fixed bin,
       2 n_bi_being_saved             fixed bin,

       2 buffer_offset                fixed bin (18) uns,
       2 pad1                         fixed bin,

       2 cl                           aligned,
         3 origin_ci                  fixed bin (24) uns,
         3 lowest_ci                  fixed bin (24) uns,
         3 highest_ci                 fixed bin (24) uns,
         3 number_ci                  fixed bin (24) uns,

       2 append_state                 aligned,
         3 current_operation          char (4),
         3 pending_n_txn              fixed bin,
         3 pending_last_rec_id        bit (36),
         3 pending_last_element_id    bit (36),
         3 txte_rec_id_relp           bit (18),


       2 pad_to_even_word1            bit (36) aligned,

       2 meters                       aligned,
         3 n_bi_written               fixed bin (71),
         3 n_bi_bytes_written         fixed bin (71),
         3 n_journal_full             fixed bin (71),
         3 n_successful_recycles      fixed bin (71),
         3 n_ci_recycled              fixed bin (71),
         3 n_txn_started              fixed bin (71),
         3 n_non_null_txn             fixed bin (71),
         3 meter (8:10)               fixed bin (71),

       2 pad_to_64_words (6)          bit (36);

    where:

    "version" is the version number of the bj_pste structure.  By
         convention,  if  version =  0,  the entry  is  not used.
         While  an  entry  is  being  initialized  to  describe a
         journal being activated, the value  of version is 0.  It
         is set  to the non null  value only as the  last step of
         the journal activation.

    "bj_ix" is the index of this entry in the array of bj_pste's.
         It is used for consistency  checking and also to get the
         index  of  the  entry  when one  has  a  pointer  to the
         bj_pste.

    "lock"  is a  2-word lock in  the format assumed  by the fast
         lock facility of the lock manager.  It is used always as
         an  exclusive   lock.   All  storage   operations,  i.e.
         append, get and flush must acquire this lock.  A process
         can be working on only 1  bj_pste at a time, so there is
         no possibility of deadlock due  to locks on 2 bj_pste's.
         The bj_pst.lock is set by a process that does an open or
         close operation;  this does not  prevent another process
         from acquiring the lock on  a particular bj_pste for the
         purpose of doing an append,  get or flush operation on a
         journal already  active and not  involved in an  open or
         close operation.

    "bj_uid" is the unique id  of the before journal described by
         this entry.  It is copied from the before journal header
         kept in CI zero, when the journal is activated.

    "ci_size"  is  the control  interval  size for  this journal,
         expressed in number of bytes.


    "max_size" is the maximum size of the journal file, expressed
         in the number of control intervals.

    "active" is a switch showing that the journal is active, i.e.
         that  it  is being  used  by at  least 1  process.  This
         switch is always ON in  the bj_pste.  The before journal
         header, bj_header in CI zero,  has the same structure as
         the bj_pste.   When this switch  is ON in  bj_header, it
         indicates that the journal  is currently active, or that
         it  was  active in  a  previous invocation  of  the data
         management system, when the system crashed, and may
         contain information about unfinished transactions.

    "time_header_updated" is the calendar clock time at which the
         before journal  header was last updated.   The header is
         updated periodically to help find the end of the
         journal after a system crash.

    "earliest_meaningful_time" is a calendar clock time also kept
         in the header to help find the end of the journal
         after  a system  crash.  Any  CI whose  time modified is
         smaller than this "earliest_meaningful_time" contains no
         useful  information.  This  time is  set to  the current
         time when the journal is activated.

    "update_frequency"  is  a  number  indicating  how  often the
         header is  to be updated.  Updating  the header consists
         of  writing the  bj_pste information  into the bj_header
         structure in  CI zero.  An  update frequency of  N means
         that  the  header  should  be  updated  after  N control
         intervals have been written.

    "last_rec_id" is the record identifier of the last record
         successfully appended to the journal.

    "n_processes"  is the  number of  processes that  have opened
         this journal.  The journal cannot be "deactivated" while
         n_processes is greater than zero.  The list of processes
         using the journal is  recorded in the bj_check_in_table.
         Each  process  that  has  the  journal  opened  has  its
         process_id  registered  in   the  check_in_table.   Dead
         processes  have   their  process_id  removed   from  the
         check_in_table   by   a    "garbage   collector"   which
         automatically adjusts bj_pste.n_processes  to the number
         of alive processes.

    "n_txn"  is the  number of unfinished  transactions that have
         stored at  least 1 record  in the journal.   The journal
         cannot  be deactivated  while this item  is greater than
         zero, even if  n_processes = 0.  The reason  is that all
         processes that were using the journal may have died, but
         some unfinished transactions are still to be rolled back
         by the Data Management Daemon, using this journal.

    "last_ci_buffered" is the CI  number currently in the buffer.
         While a CI is in the buffer, its contents are not yet in
         the file.  When the buffer is full, it is written into
         the file by a "put" request to the file manager.  When
         the put is completed, the buffer is initialized as the
         next CI of the journal, containing no elements.

    "last_ci_put" is the CI number of the last CI copied from the
         buffer to the file.  It is updated only after the buffer
         is completely "put" in the  file.  For a short time, the
         last_ci_put is equal to the last_ci_buffered.  This is a
         legitimate situation, indicating that the buffer must be
         initialized with the next CI.

    "last_ci_flushed" is the CI number of the last CI for which a
         flush was requested.  The I/O may still be in progress.

    "last_ci_on_disk" is the CI number of the last CI on disk and
         such that all previous CI's are also on disk.
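
         One way to read this definition is sketched below, in
         Python rather than PL/1.  The function name and the set
         "completed" are hypothetical, and the sketch ignores the
         wrap-around of the circular list: it only shows that the
         value advances over an unbroken run of CI's and stops at
         the first gap.

```python
def advance_last_ci_on_disk(last_ci_on_disk, completed):
    """Return the new last_ci_on_disk.

    'completed' is a hypothetical set of CI numbers whose disk write
    is known to have finished.  The value only advances over CIs that
    form an unbroken run, so a CI still in transit stops it.
    """
    ci = last_ci_on_disk
    while ci + 1 in completed:
        ci += 1
    return ci
```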

    "stamp_for_last_ci_put" is the time stamp associated with the
         last CI put.  It is needed by the flush function.

    "stamp_for_last_ci_on_disk" is the time stamp associated with
         the last  CI on disk.   This stamp is the  stamp that is
         stored  in the  ring zero  dm_journal_seg, in  the entry
         associated  with this  journal (See  section 8).   It is
         used by the flush function.

    "n_bi_still_unsafe"  represents the  number of  before images
         that  have not  yet been  secured to  disk.  This number
         indicates how many data base  CI's may be pinned in main
         memory.  (It actually is an upper bound.)  It is used by
         the "append"  procedure to decide if  the journal should
         be flushed because  too many data base CI's  are held in
         main memory, in order to allow them to go to disk.

    "n_bi_being_saved"  indicates  how  many  before  images  are
         actually being written to disk.  It is used by the flush
         function.

    "buffer_offset" is the offset of  the beginning of the buffer
         associated with this journal.  The size of the buffer is
         the same as the size of the CI for this journal.  In the
         current design, all buffers are  one page long and start
         at a page boundary.  They  are all allocated in the same
         segment as the bj_pst.

    "cl.origin_ci" is the CI number  of the CI which is currently
         the origin of the circular list.  Each time the end of
         the journal is reached, the origin is "advanced" as much
         as possible, over all CI's that no longer contain useful
         information.

    "cl.lowest_ci" is the lowest CI  number in the circular list.
         In  the  current design  it  is always  1, and  does not
         change when the origin is advanced.

    "cl.highest_ci"  is  the highest  CI  number in  the circular
         list.  In the current design, it is always equal to the
         last CI number of the file, and does not change when the
         origin is advanced.

    "cl.number_ci" is  the number of  CI's in the  circular list.
         In the current design, it is always equal to the maximum
         size of the  file minus 1 (control interval  zero is not
         part of the circular list), and does not change when the
         origin is advanced.
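
         The relation between the "cl" items in the current design
         can be sketched as follows, in Python rather than PL/1.
         The function names are hypothetical, and the sketch
         assumes that the CI's of a file are numbered from zero, so
         that a file of N control intervals has N-1 as its last CI
         number.

```python
def circular_list_bounds(max_file_size_in_ci):
    """Return (lowest_ci, highest_ci, number_ci) for a journal file.

    CI zero holds the before journal header and is not in the list.
    """
    lowest_ci = 1
    highest_ci = max_file_size_in_ci - 1   # last CI number of the file
    number_ci = max_file_size_in_ci - 1    # every CI except CI zero
    return lowest_ci, highest_ci, number_ci


def next_ci(ci, lowest_ci, highest_ci):
    """Return the CI that follows 'ci', wrapping at the end of the list."""
    return lowest_ci if ci == highest_ci else ci + 1
```

         For a file of 8 control intervals the list covers CI's 1
         through 7, and the CI following 7 is 1.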

    "append_state"   contains   information   about   the  append
         operation,  while  a  record  is being  appended  to the
         journal.   The information  is used  to make  the append
         operation "atomic".   Regardless of where  a process may
         stop (or die) while in  the before journal manager, if a
         record  has been  completely stored in  the journal, all
         before journal  manager control structures  (bj_pste and
         bj_txte)  are  automatically   updated  to  reflect  the
         existence of this record.

    "current_operation" is a character string which is null while
         no  record  is being  appended.  It  is equal  to "appe"
         while a record  is being appended.  It is  restored to a
         null  character  string after  the record  is completely
         stored  in  the  journal,  and  the  various  structures
         (bj_pste and bj_txte) have been updated accordingly.

    "pending_n_txn" is  the value that  bj_pste.n_txn should have
         after the record is  successfully appended.  This allows
         for  potentially   redoing  any  number   of  times  the
         adjustment of  an interrupted append  operation, even if
         the process crashes during the adjustment.

    "pending_last_rec_id"  is the  value that bj_pste.last_rec_id
         should have after the record is successfully appended.

    "pending_last_element_id"  is the  element id  of the element
         about to  be stored in the  journal.  Storing an element
         is atomic:   storing the last element  of a record makes
         the  entire record  appear in  the journal  in an atomic
         manner.   The  protocol  is  as  follows:   All  storage
         operations  are  done  under  the bj_pste  lock.   If an
         append  operation  is  interrupted  in  the  middle, the
         process  executes  a  cleanup  handler  which  finds the
         bj_pste  lock set  to the process;  if it  finds that an
         append operation  was in progress,  it finds out  if the
         last  element  of  the  record was  stored.   If  it was
         stored,  the relevant  items in the  bj_pste are updated
         and  the record  id of the  new record is  stored in the
         bj_txte, causing  a more complete update  of the bj_txte
         items later.  Then the bj_pste lock is released.  If the
         cleanup handler  is not given  a chance to  be executed,
         the process  must execute a crawl  out procedure to exit
         the  data   management  inner  ring.    This  crawl  out
         procedure executes  what the cleanup  handler would have
         executed.   In  the  worst  case  where  the  crawl  out
         procedure is not given a chance to run, the process must
         soon  be terminated.   At a later  time, another process
         will try  to lock the  bj_pste and will find  that it is
         locked  by  a dead  process; then  it executes  what the
         cleanup handler would have executed.
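
         The role of the "pending" items can be illustrated by the
         following sketch, in Python rather than PL/1.  It is a
         simplified model, not the actual implementation: the
         class Pste, the functions begin_append and adjust, and
         the predicate element_is_stored are hypothetical, and
         locking and the update of the bj_txte are left out.  The
         point it shows is that the adjustment only assigns values
         computed beforehand, so it can be repeated any number of
         times with the same result.

```python
class Pste:
    """Holds only the bj_pste items involved in an append."""

    def __init__(self):
        self.n_txn = 0
        self.last_rec_id = None
        self.current_operation = ""      # "" while no append is in progress
        self.pending_n_txn = 0
        self.pending_last_rec_id = None
        self.pending_last_element_id = None


def begin_append(pste, new_n_txn, rec_id, last_element_id):
    """Record the intended final values before touching the journal."""
    pste.pending_n_txn = new_n_txn
    pste.pending_last_rec_id = rec_id
    pste.pending_last_element_id = last_element_id
    pste.current_operation = "appe"


def adjust(pste, element_is_stored):
    """Complete or discard an interrupted append; safe to repeat.

    'element_is_stored' is a hypothetical predicate telling whether
    the element with the given id is present in the journal.
    """
    if pste.current_operation != "appe":
        return
    if element_is_stored(pste.pending_last_element_id):
        # Plain assignments of precomputed values: redoing them after
        # a second interruption changes nothing.
        pste.n_txn = pste.pending_n_txn
        pste.last_rec_id = pste.pending_last_rec_id
    pste.current_operation = ""
```

         The same adjust routine can stand for the cleanup
         handler, the crawl out procedure, or the process that
         later finds the lock held by a dead process.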

    "txte_rec_id_relp" determines  the location, in  the bj_txte,
         which  must  be filled  with  the record  id of  the new
         appended record,  as explained above.  It  is a relative
         pointer in the segment containing the bj_txte.

    "meters" are various meters.


7.2.3 THE BJ_PN_TABLE STRUCTURE

     The bj_pn_table is a table  maintained by the before journal
manager to  associate the pathname  of a journal  with its unique
id.  For each journal which has a bj_pste, that is, which is open
in at  least one process,  there is an entry  in the bj_pn_table,
with  the  pathname and  the  uid of  the  journal.  For  a given
journal, the index in the array of bj_pste's and the index in the
bj_pn_table array are the same.

     The bj_pn_table is necessary to perform recovery after a
crash.  It is the starting point, showing which before journals
were open at the time of the crash.  Therefore, this table must
be modified using careful coding, so that it is never
inconsistent; in addition, the table must be flushed to disk
after every modification.  Together with the uid-pathname table
maintained by the file manager, it is the only table needed by
recovery after a crash.

     The bj_pn_table has the following structure:

   dcl 1 bj_pn_table             based  aligned,

       2 max_n_entries           fixed bin,
       2 bj_path_to_uid_relation dim
                                 (dm_system_data_$bj_max_n_journals
                                 refer (bj_pn_table.max_n_entries)),
         3 dir                   char (168),
         3 entry                 char (32),
         3 bj_uid                bit (36);

    where:

    "max_n_entries"  is  the  maximum  number of  entries  in the
         table.  It is needed by recovery after crash.  It is set
         at  dm_per_system initialization.   Its value  is copied
         from dm_system_data_$bj_max_n_journals, and  is equal to
         the maximum number of bj_pste entries.

    "bj_path_to_uid_relation" is an array with one entry for each
         journal.  Entry  i contains the uid  and pathname of the
         journal described by entry i of the array of bj_pste's.

    "dir" is the directory name of the journal.

    "entry" is the entry name of the journal.

    "bj_uid" is the unique id of the journal.


7.2.4 THE BJ_CHECK_IN_TABLE STRUCTURE

     The before journal manager maintains a table showing, for
each journal that has a bj_pste entry, which processes have the
journal open.  The table is allocated in the same segment as the
bj_pst.  It is not needed for recovery after a crash, so it does
not have to be flushed to disk after every modification.  It is
initialized at dm_per_system initialization, with the rest of the
bj_pst segment.

     The bj_check_in_table has the following structure:

   dcl 1 bj_check_in_table  based aligned,

       2 max_n_processes    fixed bin,
       2 max_n_journals     fixed bin,

       2 process_id         dim
                            (dm_system_data_$bj_max_n_processes refer
                            (bj_check_in_table.max_n_processes))
                            bit (36),

       2 cross_proc_bj      dim
                            (dm_system_data_$bj_max_n_processes refer
                            (bj_check_in_table.max_n_processes),
                            dm_system_data_$bj_max_n_journals refer
                            (bj_check_in_table.max_n_journals))
                            bit (1) unaligned;

    where:

    "max_n_processes" is the maximum  number of processes allowed
         to do  data management at  the same time.   This item is
         initialized with the value found in
         dm_system_data_$bj_max_n_processes.

    "max_n_journals"  is  the maximum  number of  before journals
         that can be described in the before journal tables, that
         is the maximum number of  bj_pste entries.  This item is
         initialized with the value found in
         dm_system_data_$bj_max_n_journals.

    "process_id"  is  an array  of  process ids.   The  number of
         entries in this array is determined by
         dm_system_data_$bj_max_n_processes.  Each entry is
         either zero or contains the value of a process id.  The
         first time a process uses the before journal manager,
         its process_id is entered in this array.  The index of
         the entry assigned to this process is used to identify
         the process in the cross_proc_bj table (see below).

    "cross_proc_bj"  is  a  2 dimentional  matrix.   Each element
         [p,j]  has  a one  bit  value; it  is  equal to  "1"b if
         process  with index  p in the  table of  process ids has
         open the journal with index j in the array of bj_pste's.
         One  might think  that a  count of  processes associated
         with each journal would have been sufficient.  In theory,
         this  is true;  but in practice,  it is  not possible to
         know, from a count, which  processes are still alive and
         which  processes  are  dead.   This is  the  reason why,
         instead of a count, the list  of process ids is kept for
         each journal.
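
     The use of the two arrays can be illustrated by the following
sketch, in Python rather than PL/1.  It is a model under stated
assumptions, not the actual implementation: the class and method
names and the predicate is_alive are hypothetical, indices start
at zero rather than one, and no locking is shown.  It shows why a
per-process bit allows dead processes to be discounted, which a
plain count would not.

```python
class CheckInTable:
    """Model of process_id and cross_proc_bj in the bj_check_in_table."""

    def __init__(self, max_n_processes, max_n_journals):
        self.process_id = [0] * max_n_processes          # 0 means free
        self.cross_proc_bj = [[False] * max_n_journals
                              for _ in range(max_n_processes)]

    def check_in(self, pid, j):
        """Register that process 'pid' has opened journal index j."""
        if pid in self.process_id:
            p = self.process_id.index(pid)
        else:
            p = self.process_id.index(0)   # first free slot; ValueError if full
            self.process_id[p] = pid
        self.cross_proc_bj[p][j] = True

    def n_processes(self, j):
        """Number of registered processes that have journal j open."""
        return sum(1 for row in self.cross_proc_bj if row[j])

    def collect_garbage(self, is_alive):
        """Drop dead processes; 'is_alive' is a hypothetical predicate."""
        for p, pid in enumerate(self.process_id):
            if pid != 0 and not is_alive(pid):
                self.process_id[p] = 0
                self.cross_proc_bj[p] = [False] * len(self.cross_proc_bj[p])
```

     After collect_garbage, n_processes(j) gives the value to
which bj_pste.n_processes would be adjusted for journal j.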


7.3 Per Transaction Information

     For each transaction in progress, the before journal manager
maintains  some information  relevant to the  transaction and the
Before Journal associated with it.  This information is kept in a
system shared  table which consists  of a header,  referred to as
the  bj_txt,  followed by  an  array of  entries, referred  to as
bj_txte's,  with one  entry for  each transaction.   The array of
bj_txte's  is  an  array  parallel  to  the  array  used  by  the
transaction  manager  to describe  transactions.  In  fact, entry
number  (i) in  the array  of bj_txte's  must be  regarded as the
extension of  entry number (i)  in the array of  tm_tdte's in the
transaction  definition  table used  by the  transaction manager.
The   bj_txt   is  created   and  initialized   at  dm_per_system
initialization.

     Modifications to the structure are made by carefully written
programs, in  such a way as  to appear atomic, even  if a process
fails in the middle of an update operation.

     The PL/1 declarations for the  bj_txt and bj_txte are given
below, with a short description of each item in these structures.

7.3.1 THE BJ_TXT STRUCTURE

   dcl 1 bj_txt                       based aligned,

       2 version                      fixed bin,
       2 max_n_entries                fixed bin,
       2 n_entries_used               fixed bin,
       2 pad_header_to_32_words       bit (36) dim (29),
       2 entry                        dim (dm_system_data_$
                                      max_n_transactions refer
                                      (bj_txt.max_n_entries))
                                      like bj_txte;

    where:

    "version" is a version number associated with this stucture.

    "max_n_entries"  is  the maximum  number  of entries  in this
         table.    It   is   initialized    to   the   value   of
         dm_system_data_$max_n_transactions, when the table is
         created, at Data Management system initialization.

    "n_entries_used" is the number of entries currently used.

    "entry" is an  array each element of which  has the structure
         of bj_txte described below.

7.3.2 THE BJ_TXTE STRUCTURE

   dcl 1 bj_txte                             based aligned,

       2 tid                                 bit (36),
       2 bj_uid                              bit (36),
       2 entry_state aligned,
         3 last_completed_operation          char (4),
         3 ok_to_write                       bit (1),
       2 owner_info aligned,
         3 process_id                        bit (36),
       2 operator_info aligned,
         3 process_id                        bit (36),
         3 ppte_ptr                          ptr,
         3 bj_oid                            bit (36),
       2 records_info aligned,
         3 curr_checkpoint_rec_id            bit (36),
         3 first_bj_rec_id                   bit (36),
         3 last_bj_rec_id                    bit (36),
         3 n_rec_written                     fixed bin (35),
         3 n_bytes_written                   fixed bin (35),
         3 last_fm_postcommit_handler_rec_id bit (36),
       2 append_state aligned,
         3 current_operation                 char (4),
         3 pending_bj_rec_id                 bit (36),
         3 pending_n_rec_written             fixed bin (35),
         3 pending_n_bytes_written           fixed bin (35),
       2 pad_entry_to_32_words               bit (36) dim (13);

    where:

    "tid" is the transaction id  of the transaction that has been
         assigned this  entry.  It is  initialized at the  time a
         transaction  begins.  The  entry number  as well  as the
         transaction id  are assigned by  the transaction manager
         and  passed  as  input  argument to  the  before journal
         manager.

    "bj_uid"  is  the  uid of  the  before journal  used  by this
         transaction.

    "last_completed_operation" is  the last operation  the before
         journal manager  performed for this  transaction.  It is
         used for consistency checking.

    "ok_to_write" is a switch which is  set ot 1 at the beginning
         of the  transaction and turned off  after a committed or
         aborted mark has been written.

    "owner_info.process_id"  is  the  process id  of  the process
         which  started  the  transaction.   It  is  kept  in the
         bj_txte even though the process may be dead.

    "operator_info.process_id" is  the process id  of the process
         operating on this transaction.   In general this process
         id is the  same as the owner's process  id.  However, if
         the  process  that  started   the  transaction  dies  or
         abandons the transaction while  the transaction is still
         in  progress,   it  is  the   Data  Management  Daemon's
         responsibility   to   abort   that   transaction.   When
         performing  its duty  on the transaction,  the Daemon is
         said to be the "operator" of the transaction, and before
         being  able  to  do  any before  journal  operation, the
         bj_txte.operator_info items have  to be initialized with
         information relevant to the Daemon process.

    "operator_info.ppte_ptr"   is  a   pointer  to   the  bj_ppte
         structure  associated  with the  before journal  used by
         this  transaction,  in  the process  which  is currently
         operating the transaction.

    "record_info.current_checkpoint_rec_id"  is the  record id of
         the  current  checkpoint  record.    At  the  end  of  a
         checkpoint, a checkpoint record is written in the before
         journal.

    "records_info.first_bj_rec_id"  is  the   record  id  of  the
         earliest record written by this transaction and which is
         still  of  interest  to  the  transaction.   The  before
         journal manager uses the following optimization:  when a
         transaction begins,  the bj_txte is  initialized, but no
         "begin" record is actually  written in the journal.  The
         first_bj_rec_id is null.  If  the transaction commits or
         aborts  without  having  written   in  the  journal,  no
         commit/abort  mark is  actually written  in the journal.
         The first time a transaction writes a record (other than
         commit  or  abort),  first_bj_rec_id  is set  to  be the
         record id  of the record.  When  a transaction is rolled
         back, a "rolled_back" record  is written in the journal,
         and  its record  id becomes  the first_bj_rec_id, unless
         the  transaction  was rolled  back  to a  checkpoint, in
         which case first_bj_rec_id retains its value.
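
         The optimization can be sketched as follows, in Python
         rather than PL/1.  The class Txte, the functions
         write_record and commit, the list used as a journal and
         the use of a list position as record id are all
         hypothetical; the handling of "rolled_back" records and
         checkpoints is left out.

```python
class Txte:
    """Holds only the records_info items used by this sketch."""

    def __init__(self):
        self.first_bj_rec_id = None   # null until a record is written
        self.last_bj_rec_id = None
        self.n_rec_written = 0


def write_record(txte, journal, record):
    """Append 'record' to the journal and update the bj_txte items."""
    journal.append(record)
    rec_id = len(journal) - 1         # hypothetical record id
    if txte.first_bj_rec_id is None:
        txte.first_bj_rec_id = rec_id
    txte.last_bj_rec_id = rec_id
    txte.n_rec_written += 1
    return rec_id


def commit(txte, journal):
    """Write a commit mark only if the transaction wrote something."""
    if txte.first_bj_rec_id is None:
        return None                   # nothing in the journal to mark
    return write_record(txte, journal, "committed")
```

         A transaction that never wrote a before image therefore
         leaves no trace in the journal, neither a "begin" record
         nor a commit mark.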

    "records_info.last_bj_rec_id"  is the  record id  of the last
         record written  in the journal by  this transaction.  If
         no record was written yet, this item is null.

    "records_info.n_rec_written" is the number of records written
         by  this transaction.   It is  used for  consistency and
         metering purposes.

    "records_info.n_bytes_written"  is the  number of  bytes this
         transaction  has written  in the before  journal.  It is
         used for consistency and metering purposes.

    "records_info.last_fm_postcommit_handler_rec_id"    is    the
         record id  of the last file  manager post commit handler
         record  written  by  this transaction.   For  economy of
         mechanism, the information needed at post commit time is
         stored  in  the  before  journal  as post_commit_handler
         records.   The  before   journal  manager  threads  them
         together and keeps the end of the thread in this item of
         the bj_txte.

    "append_state"  contains  some information  to  implement the
         atomicity of  the append operation, with  respect to the
         bj_txte items.  When an append operation is interrupted,
         either the record  is in the journal or  it is not.  The
         storage  manager part  of the before  journal manager is
         capable of  finding out; if  the record was  written, it
         updates    the    bj_pste   items    accordingly.    The
         bj_txte items relevant to the appended record are also
         updated using the same method as for the bj_pste.

    "append_state.current_operation" is a  character string which
         is null when  no append is in progress.   It is non-null
         when an append is in progress.

    "append_state.pending_bj_rec_id"  is  the  record  id  of the
         record  which has  just been appended.   It appears here
         just after the storage manager has successfully appended
         the  last  element of  a new  record, indicating  to the
         bj_txte manager  that the record  is in the  journal and
         that the  bj_txte items relevant to  this record must be
         updated.  If the append  operation is interrupted in the
         little  window after  the record is  appended and before
         its record id  appears in this item of  the bj_txte, the
         mechanism described for  bj_pste.append_state causes the
         bj_txte to be adjusted.

    "append_state.pending_n_rec_written"   is   the   value  that
         bj_txte.records_info.n_rec_written should have after
         the  record  is  successfully  appended.   This  item is
         initialized before calling the storage manager to append
         the  record.   If  the append  operation  is interrupted
         after the  record appears in  the journal, this  item is
         copied  into  bj_txte.records_info.n_rec_written  by the
         bj_txte manager adjustment mechanism.

    "append_state.pending_n_bytes_written"  is   the  value  that
         bj_txte.records_info.n_bytes_written  should  have after
         the record is successfully appended.  It is used in the
         same way as the previous item.


8 TABLE USED BY THE HARDCORE

     The dm_journal table is a  ring 0 system table maintained by
the hardcore  support programs for data  management.  Its purpose
is  to  provide page  control  with some  of  the data  needed to
implement the "Write Ahead Log" (WAL) protocol.

It consists of three parts:

o A header with various constants and counters for metering.

o An  array  with one  entry  per before  journal; the  number of
  entries in this array is equal  to the maximum number of before
  journals that can be open at the same time in the system.

o An array with one entry for each  page that can be held in main
  memory to honor  the "WAL" protocol.  The number  of entries in
  this array  is equal to  the maximum number of  pages that page
  control  is  willing  to hold  at  the  same time.   It  is the
  responsibility  of the  before journal manager  to regulate the
  number  of  these temporarily  wired  pages by  flushing before
  journals as required.

The  dm_journal  table is  in  a hardcore  segment whose  name is
dm_journal_seg  and  whose  ring  brackets  are  0,  2,  2.   The
declaration  for  the  entire  table   is  given  below,  with  a
description of only  those items that are relevant  to the before
journal manager.

   dcl 1 dm_journal           aligned based,

       2 lock                 bit (36) aligned,
       2 wait_event           bit (36) aligned,
       2 notify_sw            bit (1) aligned,
       2 n_journals           fixed bin,
       2 n_journals_inuse     fixed bin,
       2 max_held_pages_mem   fixed bin,
       2 n_held_pages_mem     fixed bin,
       2 max_held_per_journal fixed bin,
       2 per_aste_pool (0:3)  aligned,
         3 threshold          fixed bin,
         3 n_active           fixed bin,
       2 free_list_relp       bit (18) aligned,
       2 synch_write_calls    fixed bin (35),
       2 synch_write_holds    fixed bin (35),
       2 synch_write_invalid  fixed bin (35),
       2 synch_write_tosses   fixed bin (35),
       2 unlink_calls         fixed bin (35),
       2 unlink_steps         fixed bin (35),
       2 activate_calls       fixed bin (35),
       2 deactivate_calls     fixed bin (35),
       2 activate_denied      fixed bin (35),
       2 set_stamp_calls      fixed bin (35),
       2 allocate_calls       fixed bin (35),
       2 free_calls           fixed bin (35),

       2 per_journal          (n_dm_journals refer
                              (dm_journal.n_journals))
                              aligned
                              like dm_per_journal,

       2 page_entry           (max_dm_pages refer
                              (dm_journal.max_held_pages_mem))
                              aligned
                              like dm_page_entry;

   dcl 1 dm_per_journal       aligned based,
       2 time_stamp           fixed bin (71),
       2 n_held               fixed bin,
       2 uid                  bit (36) aligned,
       2 access_class         bit (72) aligned,
       2 entry_relp           bit (18) aligned,
       2 pad                  bit (36) aligned;

   dcl 1 dm_page_entry        aligned based,
       2 fp                   bit (18) unal,
       2 bp                   bit (18) unal,
       2 cme_relp             bit (18) unal,
       2 journal_relp         bit (18) unal;

    where:

    "dm_journal.n_journals"  is  the  maximum  number  of  before
         journals  that may  be "active",  i.e.  that  may have a
         bj_pste entry, at the same time in the system.

    "dm_journal.n_journal_inuse" is  the actual number  of before
         journals currently active in the system.

    "dm_journal.max_held_pages_mem" is the maximum number of held
         pages that  page control is  willing to honor.   It is a
         constant defined at bootload time.

    "dm_journal.n_held_pages_mem"   is   the   number   of  pages
         currently  in  main  memory   that  page  control  would
         normally have written out but actually did not, in order
         to honor the "WAL" protocol.

    "dm_journal.max_held_per_journal"   is  the   result  of  the
         division     of     dm_journal.max_held_pages_mem     by
         dm_journal.n_journals_inuse.   The   hardcore  maintains
         this  item  and  updates  it  each  time  the  number of
         journals in use changes.

         The before journal manager  can read the dm_journal_seg_
         from ring  2.  It uses  this item to make  sure that the
         total number  of held pages never  exceeds the threshold
         allowed by  page control.  For each  journal, it keeps a
         count  of  the number  of  pages currently  held  by the
         journal;   when   this   count   becomes   greater  than
         max_held_per_journal, it flushes the journal.
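
         The arithmetic can be sketched as follows, in Python
         rather than PL/1.  The function names are hypothetical.
         The sketch assumes an integer division, and it assumes
         that the limit equals max_held_pages_mem when no journal
         is in use; neither point is stated above.

```python
def max_held_per_journal(max_held_pages_mem, n_journals_inuse):
    """Share of the held-page budget available to each journal in use."""
    if n_journals_inuse == 0:
        return max_held_pages_mem      # assumption: no journal to share with
    return max_held_pages_mem // n_journals_inuse


def should_flush(n_pages_held_by_journal, limit):
    """True when the journal holds more pages than its share allows."""
    return n_pages_held_by_journal > limit
```

         With 100 held pages allowed and 3 journals in use, each
         journal is flushed once it holds more than 33 pages, so
         the total can be expected to stay near or below 100.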

    "dm_journal.per_journal"  is  an  array  with  one  entry per
         journal.  By convention, for  a given journal, the index
         in this  array is the  same as the index  of the bj_pste
         entry.   Each  entry  has  a  structure  defined  by the
         "dm_per_journal" declaration.

    "dm_per_journal.time_stamp" is a  clock value indicating that
         all before  images produced in  this journal up  to this
         time are known to be  on disk; any before image produced
         after  this  time is  not  on disk  yet.  Each  time the
         before journal  manager flushes a journal,  it knows the
         value of this time stamp;  it calls upon the hardcore to
         enter the time stamp in the entry with the same index as
         the  bj_pste  of  the  journal.  When  a  file's control
         interval is modified, a before image is first taken, say
         at time T.  This time T and the index of the journal are
         then  stored in  the header  of the  CI to  be modified.
         Before  writing out  a page,  page control  compares the
         time  stored  in  the  CI and  the  time  stored  in the
         corresponding dm_per_journal entry.  If  the time in the
         CI is  greater, it means  that the journal  has not been
         flushed  far enough  yet, and  the page  is to  be held.
         Otherwise, the page can be written out.
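
         The comparison made by page control can be sketched as
         follows, in Python rather than PL/1.  The function name
         and its parameters are hypothetical: ci_time and
         ci_journal_index stand for the values stored in the CI
         header, and per_journal_time_stamp is a list standing in
         for the time_stamp items of the dm_per_journal entries.

```python
def page_may_be_written(ci_time, ci_journal_index, per_journal_time_stamp):
    """Return True if the page can be written out, False if it is held."""
    flushed_up_to = per_journal_time_stamp[ci_journal_index]
    # A later time in the CI means that its before image may not be on
    # disk yet, so writing the page would violate the WAL protocol.
    return ci_time <= flushed_up_to
```

         A page modified at time 200 is held while the time stamp
         of its journal is 100, and can be written out once a
         flush has advanced that time stamp to 200 or beyond.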

    "dm_per_journal.n_held" is  the number of  pages currently in
         main  memory  that  page  control  would  normally  have
         written  out but  actually did  not because  of the time
         stamp in this entry.

    "dm_per_journal.uid" is  the unique id of  the before journal
         described by this entry.  By  convention, if it is zero,
         this  entry is  free.  When  the before  journal manager
         "activates" a new journal, it  calls ring zero, with the
         uid  of the  journal, to  allocate an  entry.  Ring zero
         finds  a free  entry in  the dm_journal,  initializes it
         with the  uid of the  journal, and returns  the index to
         the before  journal manager, which uses  it as a bj_pste
         index.

    "dm_per_journal.access_class" is the AIM  access class of the
         process  which  activated the  journal.  In  the current
         implementation, there  is only one  dm_journal table for
         the entire Multics system.  It  is not possible to use 2
         data bases of different AIM classes in the same process.
         It is  possible to have data  bases of different classes
         in  the system;  however, this requires  having one data
         management  system  for  each access  class.   Each data
         management system  would have its own  set of bj_pst and
         bj_txt tables.  Since the dm_journal table is unique for
         all AIM classes, it is necessary  for ring 2 to ask ring
         0  to allocate  an entry and  use the  index assigned by
         ring 0.

    "dm_per_journal.entry_relp" is a relative pointer to the list
         of pages held for this journal.

    "page_entry" is  an array with  one entry for  each page that
         can be held.   It is not relevant to  the before journal
         manager design.