MULTICS TECHNICAL BULLETIN                                MTB-625

To:       MTB Distribution

From:     Rich Coppola

Date:     05/27/83

Subject:  The SSF and Multics

                           - ABSTRACT -

The contents of this MTB outline the capabilities of the DPS88
System Support Facility (SSF) and the perceived security,
integrity and reliability issues that arise using the DPS88
hardware platform.  It is presumed that the reader has some
knowledge of the DPS88 hardware architecture.

          Comments should be sent to the author:

          via Multics Mail:
             To:  Coppola.SysMaint On System M.

          via Mail:

             R. L. Coppola
             Honeywell Information Systems
             Large Information Systems Division
             P.O. Box 8000, Mail Station Z-30
             Phoenix, AZ 85066

________________________________________

Multics  Project  internal  working  documentation.   Not  to  be
reproduced or distributed outside the Multics Project.



INTRODUCTION

The intent of this MTB is to address the perceived security,
integrity and reliability issues raised by the System Support
Facility (SSF) when Multics uses the DPS88 hardware platform.
This perception of the SSF, its capabilities and design, is based
on the SSF EPS and Technical Design Memos (TDMs).  It does not
address the many fine features of the SSF as they are considered
non-issues.  The intent of this paper is to state the
capabilities of the SSF and possible negative impact on Multics
security and integrity.  The impact may be due to a failure in
the communications protocol between the SSF and Multics or a
successful breach of SSF, and therefore Multics, security by a
hostile or benign maintenance user.  In no way is this paper
meant to impugn the intentions or capabilities of the SSF
developers.

The reader should be cognizant of the basic premise of a secure
computing environment.  Simply stated, everything outside of the
secure Kernel cannot be trusted.  Studies have proven that
software alone cannot provide adequate levels of protection in a
secure environment.  The software must be supported by a proven
hardware design.  Without proper hardware support it is
impossible for any software development house, including Multics,
to provide a secure computing environment.

OVERVIEW

The System Support Facility (SSF) is a free-standing computer
system that is used to perform all hardware diagnosis on the
DPS88 hardware complex.  The EPS claims that the SSF is a support
facility for the entire DPS88 system.  It, however, is really a
hardware support facility.  It provides minimal facilities for
the support of the other critical elements of the system (e.g.,
software and network).

Architecturally the SSF consists of a standard Honeywell Level-6
with one special adapter board for interfacing to the DPS88's
Intercomputer Controller (ICC).  The software structure consists
of a modified MOD-400 base which hosts a special executive called
System Maintainability and Availability Software (SMAS).  All
maintenance functions, both non-functional (off-line/dead-system)
and functional (on-line/operational Multics) are performed
through the auspices of SMAS.  The SSF therefore requires, and
the DPS88 has been designed to provide, direct access to all
resources in the system, including memory and peripheral devices.
In essence the SSF is more privileged than the RING-0 Kernel.

The SSF initializes the DPS88 hardware, and initiates the 'boot'
process for the operating system.  Once Multics is operational a
multi-layer HDSA Common Exchange Interface (CXI) dialogue is
established between the OS and the SSF using standard HDSA
session control over a direct channel (DI) on the IOX.  This
interface is used by the SSF to request host system resources for
T&D requests, inform the OS that a central system component has
failed, pass central system error records from the SSF's files to
the syserr_log, etc.  A 'deadman' hyper-connect protocol is used
to keep Multics and the SSF informed of the operational status of
each system.  These dialogues are the only means the SSF has to
determine the operational status of Multics.
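
The 'deadman' idea can be pictured with a minimal sketch in
Python.  It is illustrative only:  the class name, the timeout
value and the use of explicit timestamps are assumptions, not
details taken from the SSF EPS.  Each side records when it last
heard from its peer and presumes the peer to be down once that
interval exceeds a limit.

```python
class DeadmanMonitor:
    """Tracks when a peer was last heard from (hypothetical model)."""

    def __init__(self, timeout_seconds):
        # Assumed limit; the EPS value, if any, is not given in this MTB.
        self.timeout_seconds = timeout_seconds
        self.last_heard = None

    def record_connect(self, now):
        """Note that the peer signalled at time `now` (in seconds)."""
        self.last_heard = now

    def peer_presumed_down(self, now):
        """True if no signal has arrived within the timeout."""
        if self.last_heard is None:
            return True
        return (now - self.last_heard) > self.timeout_seconds


monitor = DeadmanMonitor(timeout_seconds=5.0)
monitor.record_connect(now=100.0)
print(monitor.peer_presumed_down(now=103.0))  # False
print(monitor.peer_presumed_down(now=106.0))  # True
```

Note that a monitor of this kind cannot tell a silent peer from a
failed link; both simply look like a missed signal.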

All maintenance software (e.g., T&D, NFTs) is resident on the SSF
system.  All requests to run tests on a mainframe or peripheral
device are routed to SMAS whether they originate from a terminal
logged in to the SSF or Multics.  In addition to alarm processing
there are two different aspects of testing from the SSF:
peripheral testing and mainframe testing.

PERIPHERAL TESTING
For peripheral, memory or FNP testing SMAS negotiates with
Multics to acquire the target resource, and 80KW of memory.  Once
Multics has deconfigured the target resource and granted the T&D
request the SSF loads the Functional Test System (FTS), which
runs in the BASIC (GCOS3) decor, into the memory and initiates
shared-processor mode or hyper-switching on the DPS88 CPU.  FTS
may have to be modified to run in the Multics decor, as BASIC
decor functionality may have to be removed to support Multics.
Although it is possible to dedicate a CPU to FTS in a multi-CPU
configuration, it is doubtful that any of our customers would
accept the dedication of such a valuable resource for peripheral
testing.  FTS issues all of its own I/O and processes all
resultant interrupts.  There is no mechanism available to allow
Multics to validate or issue the I/O on behalf of FTS.

Memory testing is also performed by FTS under the MOLTS
subsystem.  The memory frame to be tested is acquired from
Multics using standard protocol.  Once SMAS has been told by
Multics to proceed with testing, the hyper-page table(s) for the
memory frame(s) are assigned to FTS and testing proceeds in the
same manner as does peripheral testing.

Front-End Processor testing follows the same protocol sequence
described above.  FTS is assigned one of the logical channels on
the DI interface to the FNP, in much the same manner as it would
be if it were testing a peripheral on an MPC.

MAINFRAME TESTING
If the target of a test request is a mainframe component (CPU,
IOX or CIU; not a peripheral, FNP or block of memory) SMAS
negotiates with Multics to acquire the target resource.  Since
all test programs and media reside in the SSF it does not require
additional DPS88 resources to support the test activity.  Once
Multics grants the test request and deletes the resource, the SSF
may begin testing.  Any isolation of the target resource must be
performed by the SSF since Multics does not have access to the
hardware 'switches' to perform the isolation itself.

In all cases, once testing is initiated Multics does not regain
control of the resource until SMAS informs it whether or not the
resource is being returned.

ALARM PROCESSING
Alarm processing is the result of a central system component
sending a signal to the SSF signifying that the component has
detected an error requiring direct intervention by the SSF.  The
contract of the SSF is to attempt to help the hardware recover
from the failure to keep Multics operational.  There are
essentially two types of alarms.  The first is the result of
certain internal errors detected in a unit that allow it to come
to an orderly halt.  These may be retryable.  The second type,
which consists primarily of failures in the memory hierarchy
logic, causes all clocks to be stopped immediately.  The majority
of failures in this area are non-recoverable and will usually
result in a system crash.
For both types of alarms the SSF determines what action is
required (e.g., run NFTs, retry the operation) and informs
Multics of the result by signalling a fault.  The SSF also places
the Machine Conditions in reserved memory so that Multics may
retry the operation on another CPU, if retry on the CPU that
originally signalled the alarm does not succeed.
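
The alarm handling described above can be summarized in a short
Python sketch.  The names, the returned action strings and the
policy itself are assumptions made for illustration; in
particular, treating every clocks-stopped alarm as fatal is a
simplification of "usually result in a system crash".

```python
from enum import Enum


class AlarmType(Enum):
    ORDERLY_HALT = "orderly halt"      # unit stopped cleanly; may be retryable
    CLOCKS_STOPPED = "clocks stopped"  # memory hierarchy logic; usually fatal


def choose_action(alarm_type, retryable):
    """Pick an SSF response (hypothetical policy, simplified from the text)."""
    if alarm_type is AlarmType.CLOCKS_STOPPED:
        # Assumption: treated as non-recoverable, as most such failures are.
        return "report fatal failure"
    if retryable:
        return "retry operation"
    return "run NFTs"


def retry_order(alarming_cpu, other_cpus):
    """CPUs on which retry is attempted: the alarming CPU first, then others."""
    return [alarming_cpu] + [c for c in other_cpus if c != alarming_cpu]


print(choose_action(AlarmType.ORDERLY_HALT, retryable=True))  # retry operation
print(retry_order("cpu_a", ["cpu_a", "cpu_b"]))  # ['cpu_a', 'cpu_b']
```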

If the SSF 'thinks' that Multics is inoperable due to a failure
in the hyper-connect protocol or possibly in the CXI dialogue, it
will send a message to the SSF console informing the operator
that it thinks the host is down and to initiate re-boot
activities.  Although there are plans to support NFTs and dynamic
failure analysis/recovery for the CIU and IOX, only the CPU will
have alarm recovery support at FCS.  When the entire central
system will be fully supported is unknown at this time.

Since the IOX does not have a console channel, the system console
is a part of the SSF hardware complement (the SSF console may
also be the Initializer's console).  The console, a standard
VIP7802, is connected to the L6 hardware BUS by a standard L6
MLCP.  Multics can communicate with the console using standard
HDSA session control or the ICC console emulator.  If the
customer only purchases one 'console' the SSF must multiplex the
operator and maintenance functions/users on that one console.

To summarize, the relationship between the SSF and the rest of
the DPS88 system may be conceptualized as closely-coupled with
respect to the hardware complex and loosely-coupled with respect
to operating system software (in this case Multics).  Although
the SSF and Multics are loosely coupled, the SSF always maintains
absolute control of the hardware complex.



To understand the problems presented to Multics by this design a
brief overview of how a maintenance process is controlled in the
Multics environment is necessary.  The standard I/O interfacer
(ioi_) and isolts_ are used as examples.



CURRENT_ISOLATION_MECHANISMS:

isolts_

The ISOLATED Online Test Subsystem (ISOLTS) provides a means of
testing Multics processors online, in an ISOLATED environment
using the TOLTS executive program.  The target CPU must have been
released, by Multics, from the service system prior to the test
request.  The isolation portion of the isolts_ mechanism, which
resides in the inner-ring, then ensures that the CPU is ISOLATED
from the system by performing the following tasks:

ISOLATION_STEP_1
The operator is told to reconfigure the target CPU such that it
has access to one and only one SCU, with the base 64KW of memory
on that SCU dedicated to the isolts_ process.  The Multics
Development Center would prefer that the
entire reconfiguration process be performed by the software but
the current CPUs do not provide software control of the
configuration panel switches.  However, the isolation process has
been designed to intercept manual reconfiguration errors and
inform the operator of the error before any testing is initiated.

ISOLATION_STEP_2
All memory in the dedicated SCU is then removed from the system
so that any reconfiguration test failures do not result in the
corruption of data or a catastrophic system failure.

ISOLATION_STEP_3
The inner-ring isolation/reconfiguration logic then ensures that
the target CPU is configured properly, and is indeed bound by the
configuration switch settings.

ISOLATION_STEP_4
If all of these checks are successful, the rest of the memory on
the SCU is returned to the system, isolts_ is allowed to proceed,
and the standard 'off-line' CPU T&Ds are executed on the target
CPU.

In order to run isolts_ the maintenance process must also have
access to several privileged hardcore gates.
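
The isolation steps above can be modelled with a minimal Python
sketch.  It is not the isolts_ implementation:  the function
name, its arguments and the dictionary of in-service memory are
hypothetical, and the step 3 check is reduced to comparing the
set of SCUs the CPU can reach against the dedicated SCU.

```python
BASE_KW = 64  # base memory dedicated to the isolts_ process, per the text


def isolate_for_test(target_scu, cpu_visible_scus, in_service_kw):
    """Model steps 2-4; returns the KW reserved for the test process.

    target_scu       -- name of the dedicated SCU (hypothetical naming)
    cpu_visible_scus -- SCUs the target CPU can reach after step 1
    in_service_kw    -- dict: SCU name -> KW in use by the service system
    """
    full_size = in_service_kw[target_scu]
    if full_size < BASE_KW:
        raise ValueError("dedicated SCU holds less than the base 64KW")

    # Step 2: remove all memory on the dedicated SCU from the system.
    in_service_kw[target_scu] = 0

    # Step 3: verify the CPU is confined to that one SCU (simplified check).
    if set(cpu_visible_scus) != {target_scu}:
        in_service_kw[target_scu] = full_size  # undo step 2 before reporting
        raise ValueError("target CPU is not confined to the dedicated SCU")

    # Step 4: return everything except the base 64KW to the system.
    in_service_kw[target_scu] = full_size - BASE_KW
    return BASE_KW


memory = {"scu_a": 256, "scu_b": 256}
isolate_for_test("scu_a", {"scu_a"}, memory)
print(memory)  # {'scu_a': 192, 'scu_b': 256}
```

The point of the ordering is that a configuration error found in
step 3 is reported while no service memory is reachable from the
CPU under test.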



The I/O Interface

A Multics ring-4 process cannot perform I/O directly, as the CIOC
instruction is classed as a privileged instruction.  Therefore
the maintenance process, whether it wants to use TOLTS (T&D) or a
Multics tool such as load_mpc, must accomplish all I/O operations
by interfacing with the standard I/O interfacer (ioi_).  Unlike
GCOS, Multics does not allow the Initializer's (system) console
to be used to run T&Ds.

The protocol followed by all Multics maintenance tools, including
T&D, is as follows:

   The target device is attached using the Resource Control
   Package (RCP) rcp_ or rcp_$priv_attach subroutines (the
   process must have access to the rcp_sys_ gate to use the
   $priv_attach entry).  RCP determines whether or not the
   requesting process may attach the resource by checking its
   access on the access control segment for that resource.
   Assuming that the process has the proper access the operator
   is asked to mount the requested media, if appropriate.  If the
   target is assigned to the system or another process, RCP will
   deny the request.

   The channel program is set up (IDCW/DCW list) and a call to
   ioi_$connect is made to have ioi_ perform the I/O operation.
   If the I/O operation is directed towards an MPC and may affect
   its operation (e.g., running ITRs), a call to
   ioi_$suspend_devices is required to allow all current I/O to
   be drained, and to prevent further I/O until ioi_ is notified
   that all other I/O may be resumed (this is accomplished via a
   call to ioi_$release_devices).  The ioi_ interfacer validates
   the first IDCW of the channel program and, if valid, performs
   the I/O on behalf of the maintenance process.

   Since the hardware will not allow a channel program to change
   the target of the I/O operation, this validation is only
   performed on the first IDCW of the channel program.

   The Multics T&D interface, tolts_, follows the protocol
   described above.  The only difference is that it translates
   specific MMEs in the T&D test-pages into calls to the
   appropriate ioi_ entrypoints.

   In all cases the maintenance process is able to attach a
   specific system resource only if the system operator and
   system administrator allow it.  The operator controls access
   to a resource by authenticating or denying the request from
   the maintenance process, the system administrator by providing
   access to the required gates.  Additionally, the maintenance
   process is bound by the fact that it resides in ring-4 and is
   completely controlled by the Multics kernel and the hardware
   ring mechanism.

   In addition to all of the above, the sharing of the target
   resource between Multics and the maintenance process is
   prohibited.  It is not possible to run T&D on a resource that
   is currently 'owned' by Multics or some other process; system
   storage devices (disks) are a good example.  The reasoning
   behind this restriction is twofold.  First, sharing would
   present a security breach.  Second, it is not logical to trust
   a device which is believed to be broken to follow protocol and
   write only on the T&D cylinder, or not write anywhere at all.

   These examples should make it obvious that a maintenance
   process, just like any other Multics process, is governed by
   an established set of hardware enforced accesses and gates
   which may be dynamically controlled by the system
   administrator depending on the immediate needs of the system.
   The critical areas of resource ISOLATION, channel program
   validation, etc., are jointly controlled by the hardware and
   the inner ring (most trusted) software.  This design ensures
   the security and integrity of customer and system data while
   allowing concurrent maintenance on ISOLATED system resources.
   The importance of these abilities (validation of channel
   programs, hardware enforcement of access and rings, and
   ISOLATION of the unit under test) cannot be overemphasized.
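
The attach-then-validate protocol described in this section can
be sketched in Python.  This is a toy model, not the Multics
interface:  rcp_attach and ioi_connect here are stand-ins named
after the real entries, and their arguments, the dictionary form
of the access control data and the representation of a channel
program as a list of dictionaries are all assumptions.

```python
class AttachDenied(Exception):
    """Raised when an attach or connect request is refused."""


def rcp_attach(process, device, acl, owners):
    """Attach `device` to `process` if access and ownership allow it."""
    if process not in acl.get(device, set()):
        raise AttachDenied("no access to " + device)
    if device in owners:
        # Assigned to the system or to another process: never shared.
        raise AttachDenied(device + " is assigned elsewhere")
    owners[device] = process


def ioi_connect(process, device, channel_program, owners):
    """Validate the first IDCW, then stand in for performing the I/O."""
    if owners.get(device) != process:
        raise AttachDenied(device + " is not attached to " + process)
    if not channel_program:
        raise ValueError("empty channel program")
    first = channel_program[0]
    # Only the first IDCW is checked; per the text, the hardware does not
    # let a channel program change its target afterwards.
    if first.get("kind") != "IDCW" or first.get("device") != device:
        raise ValueError("first IDCW does not address the attached device")
    return len(channel_program)  # stand-in: number of words accepted


acl = {"tape_01": {"Maint"}}
owners = {}
rcp_attach("Maint", "tape_01", acl, owners)
program = [{"kind": "IDCW", "device": "tape_01"}, {"kind": "DCW"}]
print(ioi_connect("Maint", "tape_01", program, owners))  # 2
```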



The_SSF:

The SSF provides all facilities that are necessary to perform
maintenance on the DPS88 hardware complex and the SSF itself.
Some of these facilities include the ability to dump SSF memory,
update SMAS, read/write memory and cache as well as hardware
registers, patch SMAS bound units and MPC firmware, write to the
CIU calendar clock, and disable/clear the SMAS journal.

The SSF is also perfectly capable of performing I/O to any device
on the IOX, including the servicing of any resultant interrupts,
or writing to the memory hierarchy without the consultation or
concurrence of Multics.  It is not suggested that SMAS will
intentionally circumvent established protocol.  However, it is
possible that a user of SMAS may do so intentionally or, more
likely, that SMAS may not 'think' that Multics is operational due
to a failure in the inter-system interface and may therefore allow
direct memory hierarchy or peripheral device operations.

Access to the SSF is controlled by the SMAS log-in facility.
With the exception of the system console, Multics has no
knowledge of any SSF log-on activity.  SSF facilities (e.g.,
maintenance activities, operator functions, SMAS
updates/patching) are controlled by SMAS with a set of access
control lists that appear to be very similar to those used in
Multics.  The important difference is that there is no hardware
implementation in the L6 to enforce the access control list.  All
access control is enforced by SMAS, which is a newly written,
untested, assembly language operating system.  This limited
access control is diminished even further by the fact that the
SSF and its users share the same primitive, single-layer file
system (there is no hierarchical structure).  The SSF provides
maintenance and operational facilities to users, concurrently,
with different levels of access and privileges (multi-level
access control processing/multiplexing).  Since the SSF has a
minimal access control mechanism and file system, it will be
extremely difficult, perhaps impossible, for it to adequately
control and prevent accidental or intentional modification and/or
destruction of the file system or its own object code.

Maintenance on the SSF itself is accomplished utilizing the
standard CSD maintenance strategies for the L6.  A connection is
made to the L6 System Control Facility (SCF) providing absolute
control of the L6 to the maintainer who may or may not be
connected over a dial-up line.  The SCF connects the dial-up line
to the maintenance console.  It makes no differentiation between
the local console or dial-up line.  The only means of discerning
where input came from is by perusal of the hard-copy ROSY output
for a leading space.  Since the ROSY printer is an option, this
hard-copy audit may not be available.  When the SCF is enabled
for L6 maintenance the ICC connection to the DPS88 mainframe
complex is disabled.  This is to prevent any disruptions to the
DPS88 system that may result from maintenance activities on the
L6 itself.  If an alarm is signalled by the DPS88 mainframe
complex during this period it cannot be processed until the SSF
is returned to service.  A reload of MOD 400 and the SMAS
software is forced when the ICC interface is re-enabled after
maintenance activity on the L6.  However, there are no
requirements or forced hardware functions which would cause the
SSF disk(s) to be dismounted when the SCF is enabled.  The
assumption, based on SSF access control, is that the
disk-resident software will not/cannot be modified when the SCF
is enabled.

ERROR_RECOVERY

When a central system hardware component detects an error it
signals an alarm to the SSF.  Once the alarm is signalled, the
unit is usually inoperable until the SSF services the alarm and
returns the unit to the host if the error was recoverable.  The
SSF interface to the DPS88 system is a bit-serial shift path
controlled by the L6.  Since all maintenance functions are
performed over this interface, it is not inconceivable that most
error retry/recovery attempts will take up to 20 seconds.  The
EPS states that successful retries are invisible to the process
in control of the CPU at the time of the error but does not
specifically state that Multics will be informed.  Multics must
be informed of this activity so that it can adjust accounting and
any other pertinent data.  DPS88 HW systems engineering states
that Multics will be informed and the SSF will place recovery
information in the Multics address space.

According to DPS88 HW systems engineering, only specific internal
CPU parity errors will be retried.  All CIU and memory hierarchy
failures are currently treated as fatal and will result in a
system crash.  IOX failures are, from all available data, logged
only.  The SSF EPS states otherwise.

INSTRUCTION_RETRY

The EPS is not at all clear on this subject.  It does not address
which instructions are retryable, how retry is performed, how the
memory hierarchy or CPU registers are reverted to their previous
contents before the instruction is retried or how/when Multics is
informed of the recovery attempt.  The potential for corrupting
shared data bases, especially locked system data bases, is
significant and requires a very detailed description and review
of this operation for the Multics environment.



THE_HYPERVISOR

The hypervisor consists of software that runs in the DPS88
hardware at the most privileged level possible.  SMAS may,
through the auspices of the hypervisor, dispatch from one OS to
another (Multics to FTS) without any negotiation whatsoever.  The
SSF may instantly reassign memory that is assigned to the second
OS by reloading the hyper-page table.  Memory may, in fact,
overlap or be taken from one OS and be made available to another
by manipulating the hyper-page table.

This is not meant to imply that the hypervisor software or SMAS
will intentionally disregard protocol, or intentionally make
Multics memory available to FTS.  As stated earlier the
inter-system communications interface may fail, causing SMAS to
make a wrong decision.  The intent is to make the reader aware of
the capabilities of the SSF.
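
The overlap risk can be made concrete with a small Python sketch.
The hyper-page table format is not described in the material
available to the author, so each table is modelled here simply as
a set of physical frame numbers per OS; the function name and the
data are hypothetical.

```python
def overlapping_frames(hyper_page_tables):
    """Return {frame: [OS names]} for frames mapped by more than one OS.

    hyper_page_tables -- dict: OS name -> set of physical frame numbers
    """
    owners = {}
    for os_name, frames in hyper_page_tables.items():
        for frame in frames:
            owners.setdefault(frame, set()).add(os_name)
    return {frame: sorted(names)
            for frame, names in sorted(owners.items())
            if len(names) > 1}


tables = {"Multics": {0, 1, 2, 3}, "FTS": {3, 4}}
print(overlapping_frames(tables))  # {3: ['FTS', 'Multics']}
```

A check of this kind is only useful to Multics if it can be run
by a party the hypervisor cannot bypass, which is the concern
raised above.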

It does not appear that the problems of hyper-switching and
"multi-OS" or "multi-computer" have been adequately resolved.
The SSF EPS does not address this subject adequately for a
technical evaluation.  Resource sharing (e.g., CPUs, channels,
memory, etc.) and attendant accounting problems, time 'skips' and
event notification are not addressed adequately at all.



SUMMARY:

To understand the Multics Development Center's concerns with the
SSF it is necessary that those unfamiliar with Multics have an
understanding of the security it offers and what is expected by
the majority of the Multics PARC.  That reader should also be
aware that Multics has been certified by the Department of
Defense for multi-level processing.  It is the only commercial
system to achieve that rating.  The Air Force, although it does
not assign ratings, has declared that Multics is the most secure
commercial system available.  The Multics Development Center does
not want to jeopardize or lose that status, a very important
selling point to security conscious commercial sites as well as
the federal marketplace.  The inability to control the facilities
provided by the SSF with certified hardware and software presents
definite security and integrity problems for the Multics PARC,
especially the FSD and National Defense Agency segments.  The SSF
abrogates the architectural design of Multics and is, without a
doubt, the least secure element in the DPS88 system.  Since the
security of a system is measured by the least secure element, HW
or SW, in that system, Multics will most likely acquire the
classification of a Level 6.

We agree with the DPS88 developers that once access to the
physical machine room is granted to someone, that person must be
considered to be 'secure' by the site.  However, there are levels
of security within the machine room.  The operator has specific
duties and restrictions.  The operator is normally logged into
the system as a standard user, and is governed by the same access
controls as all other users are.  Although the operator does have
unlimited access to the Initializer's console, the functions
provided are restricted to what is necessary to operate the
system.  The operator's access to highly privileged system
functions is restricted by the fact that he/she must know the
password to enter 'admin mode'.

A CSR in the machine room has the same, if not more, restrictions
as the operator.  The CSR's process is governed by standard access
controls, usually including a process overseer, established by
the site, as Multics does not allow T&Ds to be run from the
Initializer's (system) console.

Although anyone in a current product line machine room may push
the wrong button and crash the system it is extremely difficult,
if not impossible, for them to breach security or compromise
sensitive data by manipulating maintenance panels without being
noticed by site personnel or the system itself.  The current
product line also has the capability to lock-out maintenance
panel functions with the test/normal switch.  It is also possible
to ISOLATE hardware resources to allow concurrent maintenance
without affecting the security and integrity of the system.

The SSF provides a greater potential for a breach of security
than the current product line because it has the capacity to access
any hardware resource in the hardware complex without the
knowledge of Multics.

Our primary concerns with the SSF are with its unrestricted
access to the entire hardware complex and the reliability and
integrity of the SSF hardware and software.  MDC is also
concerned with the ability of the SSF to multiplex operator and
maintenance functions properly.  The SSF and SMAS software must
be capable of performing multi-level access control to keep the
two functions/users isolated from each other.  The most current
TDMs and the SSF EPS indicate that there are problems in this
area, especially when one terminal is used as an operator and
maintenance console concurrently.

Another major concern is that the functions provided by the SSF
far exceed those required of a maintenance or support facility,
and are in fact much more complicated than necessary.  These
concerns are caused by the perceived lack of a total systems
integration of the SSF, DPS88 hardware and Multics.

To provide adequate control of the SSF and its maintenance
processes, Multics must be able to validate all channel programs
and I/O accesses and be able to ISOLATE or verify the ISOLATION
of any and all resources being tested by the SSF.  Multics should
also have a means of ISOLATING all of its hardware resources from
the SSF.  The SSF should be a passive facility, performing
activities dictated by the operating system or alarm processing.



RECOMMENDATIONS:

The following recommendations address the major areas of concern.
If accepted, the gaps in security and integrity will be minimized
to the point that the Multics Development Center would no longer
consider the SSF as a crucial issue in developing software for
the DPS88 hardware platform.

1. Eliminate peripheral and FNP testing from FTS while the host
   is operational.  This function belongs in the domain of the
   host.

2. Develop a mechanism, in the hardware, that will allow the OS
   to lock the SSF's access to the hardware complex.  This
   hardware lock would be reset by the hardware when an alarm is
   signalled to the SSF.  It could be re-locked, by the OS, when
   the SSF notifies it that the SSF is returning the resource.
   To perform the necessary isolation Multics must be able to
   lock access to the hardware complex on a component basis.

3. The SSF must not directly write into or read from the Multics
   address space while Multics is operational.  All retry
   information and any other data must be placed into the memory
   that is used for the CXI interface.

4. Reduce the HDSA CXI session control interface to a single
   layer low-level interface.  There are several reasons for
   this.  The most important is the reduction of complexity.
   This extremely important inter-system connection must be as
   simple as possible to reduce the likelihood of failure.

5. Develop a mechanism, in the hardware, that will force a
   write-protect or a cycle-down of the SSF disk when the SCF is
   enabled.  This will force physical operator intervention to
   allow all modifications to the SSF disk.

6. Until the L6 firmware for the SCF is changed to differentiate
   between local and remote users the SCF must be disabled.  The
   SCF should be enabled only when maintenance on the L6 is being
   performed.  This could be accomplished by using the
   maintenance enable switch.

7. While Multics is operational direct log-ins to the SSF cannot
   be allowed.  All maintenance activity should be routed through
   Multics when it is operational.  The maintainer would log into
   Multics and be serviced by a process overseer, much like he is
   today.  This mechanism would allow Multics to validate the
   user's access to commands and functions before passing them to
   the SSF for execution.  If Multics is down the maintainer
   would be able to log into the SSF directly.

8. To allow secure remote maintenance capabilities the CSD TAC
   should install a Multics support system.  This would address
   several security problems raised by TDM-RAS-125 and
   TDM-RAS-132.  The Multics support system would obviate the
   need for TAC personnel to know customer phone numbers or
   passwords.  It would also be able to enforce the various
   levels of access for each skill level.

9. The 'deadman' protocol used to determine the health of the SSF
   and Multics must be defined so that it is as fail-safe as
   possible.

10.  All data transmissions to or from the SSF should utilize
   standard communications methods to validate that the data was
   transmitted correctly.  This is important not only for
   file-to-file transfers, but also for normal interactive
   transmissions.
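
The intended behaviour of the hardware lock in recommendation 2
can be illustrated with a minimal Python model.  It describes
only the proposed state transitions; the class, method and
component names are hypothetical, and the assumption that
components start unlocked (so that the SSF can initialize the
system) is the author's.

```python
class ComponentLocks:
    """Per-component lock on SSF access (hypothetical model)."""

    def __init__(self, components):
        # Assumption: components start unlocked while the SSF boots the host.
        self._locked = {name: False for name in components}

    def os_lock(self, component):
        """The OS locks the SSF out of one component."""
        self._locked[component] = True

    def alarm(self, component):
        """An alarm from the component resets its lock in hardware."""
        self._locked[component] = False

    def ssf_may_access(self, component):
        """True if the SSF is currently permitted to touch the component."""
        return not self._locked[component]


locks = ComponentLocks(["cpu_a", "iox_0", "ciu_0"])
locks.os_lock("cpu_a")
locks.os_lock("iox_0")
locks.alarm("cpu_a")                   # cpu_a signals an alarm to the SSF
print(locks.ssf_may_access("cpu_a"))   # True: SSF may service the alarm
print(locks.ssf_may_access("iox_0"))   # False: still locked by the OS
locks.os_lock("cpu_a")                 # SSF has returned the resource
print(locks.ssf_may_access("cpu_a"))   # False
```

Locking on a component basis, as in this model, is what allows an
alarm on one unit to open only that unit to the SSF.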