Multics Technical Bulletin                                MTB-673
Disk DIM

To:       Distribution

From:     Tom Oke

Date:     08/16/84

Subject:  Adaptive Disk Optimization

Comments should be sent to the author:

via Multics Mail:
   Oke.Multics on System M or CISL.

via Mail:

   Tom Oke
   Advanced Computing Technology Centre
   Foothills Professional Building
   #301, 1620, 29th Street, N.W.
   Calgary, Alberta, Canada
   T2N 4L7
   (403) 270-5414

1 ABSTRACT

     This MTB outlines modifications to the existing disk control
software to improve overall system performance.  These
modifications produce a disk driver (disk DIM) which adaptively
optimizes disk operations to attain the best possible system
responsiveness and system throughput, as opposed to the best
possible disk throughput.

     The resulting disk DIM  permits better resource utilization,
is site tunable and provides additional metering support over the
existing software.

________________________________________

Multics  project  internal  working  documentation.   Not  to  be
reproduced or distributed outside the Multics project.


2 INTRODUCTION

     The  modifications  described  in  this  MTB  are  aimed  at
producing a  MULTICS disk driver  (DISK DIM) which  adapts to the
changing   paging  needs   of  the  system   to  optimize  system
responsiveness and performance.

     Traditional  disk  optimization schemes  have been  aimed at
optimizing the throughput of the disk system itself, and have not
accounted for the management characteristics of secondary storage
inherent in  a Virtual Memory  system, particularly one  in which
all  storage   access  is  essentially  done   through  a  paging
mechanism, rather than distinct file IO.

     Such  approaches have  meant that  system responsiveness and
system performance were adversely  affected by the delays imposed
to  service  disk  IO.   In a  traditional  IO  system,  in which
processes  explicitly  wait  (block) until  IO  completion, these
delays may be necessary.  In a system which only blocks processes
reading  information  from secondary  storage, and  which buffers
data destined for secondary storage, such service characteristics
are detrimental to system operation by increasing average process
blocking delays and reducing the level of multi-programming.

     MULTICS has attempted  to solve some of these  problems by a
prioritization and  buffering of IO, such  that blocking types of
IO's, page reads and VTOC  operations, are high priority and will
be  done  before non-blocking  operations,  such as  page writes.
This noticeably increases system throughput and responsiveness.
It is  still too static  an optimization technique  to serve well
through a wide range of system loading.

     The new  DIM described in  this MTB is an  attempt to answer
the  dynamic  requirements of  optimization of  a system  such as
MULTICS.  It  provides acceptable system performance  over a wide
range of load levels, and permits  site tuning of the disk system
to  individual  site  needs.   Tuning is  done  at  the  level of
individual IO  types, for each  drive, providing a  great deal of
flexibility.   Ballpark  system  defaults  attempt  to  provide a
system which does  not require a great deal  of tuning to provide
acceptable results.  Additional metering information gives a site
as much information as possible to determine hardware and tuning
needs and effectiveness.

     The resultant DIM provides a wide range of tuning
parameters, and responds well to good ballpark tuning values.  It
has  a  fairly  broad  tuning   range  and  does  not  appear  to
demonstrate tuning instabilities.  Thus it should be difficult to
provide  BAD  tuning  parameters  when  following  the guidelines
mentioned in this MTB.

3 MULTICS DISK MANAGEMENT

     The following sections  outline the physical characteristics
of the MULTICS disk system, and the rationale behind its current
management.   MULTICS  disk  management   is  done  through  disk
sub-systems,   composed   of   channels,   Micro-Programmed  Disk
Controllers (Disk MPC's) and disk drives.

3.1 Disk Drives

      Each disk drive can be  dual-ported, meaning that it can be
controlled through  either of two sets  of cables, each typically
connected to  a different MPC.  This  provides redundant physical
paths to each disk drive,  and permits continued drive service if
one physical path is down.

3.2 MPC's

     Each MPC has  physical connections to one or  more IOM's and
is accessed through the use of Logical Channels.  A 451 MPC (used
for  MSU451 disk  drives) can have  two LA's  (Link Adaptors) and
thus can be connected to two different IOM's, providing redundant
paths to the MPC.  Since  MSU500/501 disk drives require a faster
transfer, each 607 MPC can have only  a single LA.  A 612 MPC can
handle both  451 and 501  drives, and has a  single LA connection
per  MPC (two  separate MPCs  exist in  the same  cabinet).  Path
redundancy  for MSU500/501  is provided  through redundant MPC's,
each typically cabled to a different IOM.

3.3 Sub-systems

     A   disk  sub-system   is  a  reflection   of  the  physical
connectability characteristics  of the MULTICS  disk hardware.  A
disk sub-system is  granted a string of disk drives  and a set of
Logical Channels.   Each Logical Channel  must be able  to access
all  drives of  the string,  thus all  disk drives  have the same
physical connectability and the logical  paths from all drives of
a sub-system are identical.

     The disk software uses any of  the channels granted to it to
access any  of the drives  granted to it.  If  the channel cannot
access the drive an error situation results, and the channel will
be removed from active service.


4 MR10.2 SYSTEM LIMITS

     The  following  section details  the characteristics  of the
disk control system, as seen in MR10.2 and earlier releases.

4.1 Channels

     As  of  MR10.2  a  disk sub-system  has  a  maximum software
configuration limit of 8 channels.  The channels specified to the
sub-system are  presumed to be  specified in terms of  an LA base
channel and the number of  Logical Channels configured on the LA.
This  presumes specification  in terms  of physical  paths.  This
specification  is used  to order channel  use to  evenly load all
available physical paths to the disks of the sub-system.

     The following  example defines two 451  MPC's, each of which
has two LA's.  Each MPC is  connected to two IOM's.  The PRPH and
CHNL cards supply information to  the system about the sub-system
definition for  DSKA and describe the  channel connections to it.
It would be specified in the config_file as:

     PRPH  DSKA  A  8.  4  451.  6
     CHNL  DSKA  A  12.  4  B  8.  4  B  12.  4
     MPC  MSPA  451.  A  8.  4  B  8.  4
     MPC  MSPB  451.  A  12.  4  B  12.  4

4.2 Channel Use Ordering

     The  algorithm used  to allocate channels  to the sub-system
permutes  channel numbers  to produce an  ordering which utilizes
the MPC's LA  base channels first, thus distributing  the IO load
across the  maximum number of  separate physical paths.   It does
this by  taking the index  to an array of  channels and reversing
the bits of this index to  alter the selection order.  This looks
like:

  Array Index:      000 001 010 011 100 101 110 111
  Permutes to:      000 100 010 110 001 101 011 111
                     0   4   2   6   1   5   3   7

     For  this permutation  algorithm to  function correctly, the
number of channels  allocated to a sub-system must  be a power of
two, and the number of channels on each base channel must also be
a power of 2.


Thus for the following prph and channel examples you get:

   prph xxx a. 8. 2
   chnl xxx b. 8. 2 a 12. 2 b 12. 2

                     0   1   2   3   4   5   6   7
   original:        A08 A09 B08 B09 A12 A13 B12 B13
   permuted:        A08 A12 B08 B12 A09 A13 B09 B13

     If channel specification  is not done in powers  of two then
incorrect  initialization occurs,  which may  not crash  a system
until  a  heavy  disk  load   requires  use  of  the  incorrectly
initialized channel entries.

   prph xxx a. 8. 3
   chnl xxx b. 8. 3

                     0   1   2   3   4   5   6   7
   original:        A08 A09 A10 B08 B09 B10 *** ***
   permuted:        A08 B09 A10 *** A09 B10 B08 ***

     In this example, if we try to run 4 drives simultaneously we
will use an uninitialized channel entry and the system will
crash.  This same situation will probably recur if an ESD is
attempted, since we heavily load the IO system at this time too.

     This algorithm is  supposed to permute channel use  so as to
split the loading  of a sub-system between 2  or 4 Link Adaptors.
If  three  link adaptors  are  specified, or  a non-power  of two
channel  count, the  algorithm malfunctions as  above by creating
uninitialized channel table entries,  or by incorrectly assigning
load across its Link Adaptors.
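
     The following sketch (an illustration only, not the actual
dctl code; the procedure name and output are invented for the
example) shows the bit-reversal permutation described above
applied to a three-bit channel index:

   permute_demo: proc;

   dcl (idx, rev, bit_no, temp) fixed bin;

   do idx = 0 to 7;
      temp = idx;
      rev = 0;
      do bit_no = 1 to 3;                 /* three bits for 8 channels   */
         rev = rev * 2 + mod (temp, 2);   /* append low bit to reversal  */
         temp = divide (temp, 2, 17, 0);  /* drop the low bit            */
      end;
      put skip list (idx, rev);     /* 0 0, 1 4, 2 2, 3 6, 4 1, 5 5, 6 3, 7 7 */
   end;

   end permute_demo;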

4.3 Seek Overlap

     Multics uses Logical Channels to hold complete disk requests
(both  seek and  IO).  If  a drive  is requested  to perform some
action, the channel upon which this request is made is busy until
the  request  is  completed.   This  gives  a  direct correlation
between  the  number  of  Logical Channels  available  to  a disk
sub-system and the number of drives which can simultaneously be
doing IO.  (In  fact it is only the SEEK  portion of the IO which
can  be overlapped,  but this  is a  major portion  of the delays
involved.)

     A disk sub-system with fewer channels than drives can only
simultaneously move heads on as many drives as there are
channels.  In effect a disk sub-system with more drives than
channels looks, for the purposes of head movement and seek delays,
as if it had only as many  drives as there are channels, with the
drives  being proportionally  bigger.  This  reduces total system
throughput.

     This makes the limited number of channels which can be
allocated to a disk sub-system, due to software implementation
limits, a significant point of inefficient service.  It tends
to  limit  the size  of a  disk  string which  is allocated  to a
sub-system  and may  needlessly cause  additional hardware  to be
required to support seek overlap.

     In  addition,  one  desires  more  Logical  Channels  to  be
allocated to a sub-system than  there are drives to reduce system
degradation in situations where an MPC or a Link Adaptor is lost.

4.4 Queue Limits

     Disk sub-systems  are currently limited by  an allocation of
64  queue  slots which  are accessible  only to  that sub-system.
This limits each sub-system to no more than a total of 64
requests queued for all its drives at any point in time.  Due to
the page write burst  characteristics of MULTICS page management,
it  is rather  easy to load  individual drives to  more than this
level,  and quite  trivial to load  an entire  sub-system to more
than this level.   When a sub-system runs out  of queue elements,
page  control  must  wait  in  a  spin-lock  situation  termed an
ALLOCATION LOCK.

     MULTICS IO  burst characteristics tend  to poorly distribute
the instantaneous load across the disk sub-systems, thus poorly
utilizing the queue resources allocated to each sub-system.

     Optimization of head motion is done using a
nearest-seek-first algorithm.  The efficiency of this algorithm
is rather dependent upon the number of requests available to be
optimized; more requests will produce a shorter average seek
length.  If a shallow queue depth is available, then optimization
is not  as efficient and  produces longer seek  lengths and lower
throughput  rates.  If  a deep  queue is  available, then shorter
seeks will occur.

4.5 Per Sub-System Work Queue Management

     The  MR10.2 implementation  allocates work  queues on  a per
sub-system  basis.  Each  sub-system has  two separate  queues, a
high  priority  queue  and  a  low  priority  queue.  On-cylinder
requests are not optimized between the two queues.


     Having  combined  queues  for  all  drives  on  a sub-system
produces queues with more requests  in them, and higher overheads
for   the   nearest-seek   algorithm,   but   does   not  improve
optimization, since no more requests  per drive are queued.  Thus
there is no optimization improvement from combined queues.

     Having complete separation between the high and low priority
queues prevents any possible  optimizations between these request
types and tends to create  de-optimization of all requests due to
the pre-emptive nature of its operation.

4.6 Request Service Selection

     One  of  the  aspects  of  optimization  is  how  drives are
selected  for service,  in situations  where more  than one drive
could become active.

     In current operation, request selection is made to keep
channels busy by examining the sub-system work queue to pick the
drive to service next.  The work queue of each priority holds
all the requests for that priority for all drives of the
sub-system.  The oldest requests are at the head of the queue,
the youngest at the tail.  Selecting drives for service from the
oldest unserviced requests tends to place emphasis on the busiest
drives.

     It  does  not  prevent  stagnation  of  requests,  since the
nearest-seek algorithm will pick the  closest seek of the current
priority to the current cylinder  position.  It does however tend
to provide unequal service according to the current queue loading
at the head of the queue.  It is dependent for its operation upon
the availability of a sub-system wide queue to determine ordering
of requests for all drives of the sub-system.

5 OPTIMIZATION TECHNIQUES

     Two common  disk optimizing strategies  are nearest-seek and
combing.

  Nearest-Seek
      This  technique  attempts to  produce the  highest possible
      disk throughput rate by presuming  that the best request to
      service  next  is  the  one  nearest  to  the  current head
      position.  This algorithm attempts to produce the
      smallest average seek length, but suffers from request
      stagnation problems if the arrival rate of requests within
      a certain band of the disk surface is higher than the rate
      at which they can be processed.  At this arrival rate
      requests outside the
      band stagnate since there is  always a closer candidate for
      processing.

  Combing
      This technique  is named for  the head motion  it produces.
      It  attempts  to  guarantee  request  service  and  totally
      eliminates request stagnation.  It  starts the heads at one
      edge of the disk surface and sweeps them to the other edge,
      servicing all  requests in cylinder  order.  Requests which
      are behind the current head  position, and arrive after the
      head has passed their location, wait only until the reverse
      sweep.

     The  optimization algorithm  used in the  MR10.2 software is
the  nearest-seek  algorithm, and  it  does in  fact  suffer from
request stagnation at extremely high IO rates.  It is possible to
hang normal system operation through this stagnation.
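
     As a small illustration of the stagnation risk (the cylinder
numbers and the procedure name below are invented for the
example), the following sketch picks the nearest pending cylinder
to the current head position; as long as new requests keep
arriving near cylinder 100, the request at cylinder 400 is never
selected:

   nearest_demo: proc;

   dcl pending (5) fixed bin init (12, 400, 97, 101, 350);
                                          /* queued cylinder numbers */
   dcl (cur_cyl, best, i) fixed bin;

   cur_cyl = 100;                         /* current head position */
   best = 1;
   do i = 2 to hbound (pending, 1);
      if abs (pending (i) - cur_cyl) < abs (pending (best) - cur_cyl)
      then best = i;                      /* a closer request was found */
   end;

   put skip list (pending (best));        /* prints 101 */

   end nearest_demo;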

6 ADAPTIVE OPTIMIZER

     The following section details the modifications which make
up the Adaptive Optimizer, alleviating the restrictions of the
MR10.2 disk control software and providing adaptive optimization
of disk requests to better service the MULTICS system.

6.1 New Channel Allocation

     The new channel allocation algorithm presumes that the
channel specifications supplied to it follow the premises of a
sub-system: each specifies a real Link Adaptor base channel and
the number of Logical Channels physically allocated on it.

     It  considers   these  channels  as   unique  resources  and
allocates  these  resources  to  the  sub-system  until  they are
exhausted.  For example:

   prph xxx a. 8. 3
   chnl xxx b. 8. 2 a 12. 4 b 12. 1

   Has the following channel resources:

   A08 A09 A10                          On one link adaptor LA0
   B08 B09                              On one link adaptor LA1
   A12 A13 A14 A15                      On one link adaptor LA2
   B12                                  On one link adaptor LA3


     Resource  allocation takes  the first channel  from all link
adaptors, then the next and so on, and produces an allocation of:

   A08 B08 A12 B12 A09 B09 A13 A10 A14 A15
   LA0 LA1 LA2 LA3 LA0 LA1 LA2 LA0 LA2 LA2

     This   produces   the   best   possible   loading  situation
irrespective  of  the number  or grouping  of channels.   It also
permits  intuitive understanding  of what order  channels will be
used and the loading on the Link Adaptors.
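
     The following is a minimal sketch of this allocation order,
using the channel groups from the example above (the procedure
name and the printed form are invented; this is not the
disk_init code):

   alloc_demo: proc;

   dcl la_size (4) fixed bin init (3, 2, 4, 1);
                                          /* channels per LA in the example above */
   dcl (pass, la) fixed bin;

   do pass = 1 to 8;                      /* 8 = hardware maximum channels per LA */
      do la = 1 to 4;
         if pass <= la_size (la)          /* this LA still has a channel to give  */
         then put skip list (la, pass);   /* take channel "pass" of adaptor "la"  */
      end;
   end;

   end alloc_demo;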

     The new  resource allocator will allocate  up to 32 channels
to a sub-system,  the current hardware maximum of  8 channels for
each of  four possible LA's  (only with 451  MPC's).  If desired,
this channel limit can be  trivially increased as future hardware
capacity is increased.  This greatly increases the previous limit
of 8 channels per sub-system.

     With  these  two  modifications the  channel  bottleneck and
channel  ordering  limitations are  alleviated.  In  addition the
channel  ordering  has  been   rationalized  to  permit  an  easy
understanding of  what will actually occur  and how channels will
be used.

6.2 New Free Queue Allocation

     The FREE QUEUE is treated  as a system-wide resource and its
members  may  be  committed  to  any  disk  drive  regardless  of
sub-system.   The free  queue modifications also  permit the FREE
QUEUE size to  be specified in the config_file  on the PARM card.
For example:

   PARM xxx xxxx xxxx DSKQ n

     Where n  can be either  octal or decimal  (terminated with a
period) and indicates  the number of elements in  the FREE QUEUE.
Limits are enforced to average no  fewer than 5 nor more than 200
elements per drive.   Thus a 10-drive system may  have between 50
and  2000  queue  elements.   If no  parameter  is  provided, the
default is 20 elements per drive.

     Observation  of running  systems indicates that  most of the
excessive  overhead  spent in  paging is  due to  ALLOCATION LOCK
waiting, which is a SPIN LOCK until a free queue element is freed
by the completion of a physical IO.  On a system overloaded with
disk requests, either  by bursts longer than the  freeq depth per
sub-system, or average request load  greater than the freeq depth
available, this paging overhead can mount to the region of 25% or
more of the  total system CPU time.  This  SPIN-LOCK is done with
the PAGE TABLE  LOCK held, which prevents all  other paging until
the completion of an IO  in the sub-system causing the ALLOCATION
LOCK.

     Testing   with   a  very   large   freeq,  with   the  FREEQ
modifications, has indicated that  the deeper queues available do
in fact produce noticeably better optimization and system
throughput and  can easily eliminate ALLOCATION  LOCKS.  In tests
of a highly loaded system, much more loaded than is possible with
the existing freeq depth available, it has been shown that paging
overheads stay in the region of  6% or less on a single processor
system doing  about 75 IO/sec  on three 451 disk  drives.  If the
system is provided a normal size freeq (64 elements), it uses
about 35-45% paging due to ALLOCATION LOCKS and has a
significantly reduced disk throughput.

6.3 Per Drive Work Queue Management

     The new work queue management  has a single queue per drive,
which holds only the requests for  that drive.  This will tend to
reduce optimization overheads and  speed servicing time.  It will
also  make  full  optimization  of requests  possible,  since all
requests  will be  considered in the  optimization algorithm, and
on-cylinder optimization occurs.

     Queue elements have been extended from 4 words to 6 words to
permit the addition of forward and backward linking in the queues
and for expanded fields for  sector, cylinder and queue_time.  In
addition, the  pvtx (physical volume table  index) has been added
to the queue entry to permit easier debugging.

     Since  there  is  a single  work  queue per  drive,  and all
additions  are made  to the  tail of  the work  queue, the oldest
requests are at  the head of the work queue.   This makes it easy
to determine the oldest request to prevent request stagnation.
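
     The following sketch shows the tail insertion described
above on a doubly linked per-drive queue.  The element layout is
hypothetical and greatly simplified; it is NOT the declaration in
dskdcl.incl.pl1:

   queue_demo: proc;

   dcl null builtin;
   dcl (head, tail, qep) ptr;
   dcl i fixed bin;

   dcl 1 qe aligned based (qep),          /* hypothetical element, not dskdcl */
         2 next ptr,                      /* forward link  */
         2 prev ptr,                      /* backward link */
         2 cylinder fixed bin (35);

   head, tail = null ();

   do i = 1 to 3;                         /* queue three requests, oldest first */
      allocate qe set (qep);
      qe.cylinder = 100 * i;
      qe.next = null ();
      qe.prev = tail;                     /* new element goes on the tail ...    */
      if tail = null ()
      then head = qep;                    /* ... so the oldest stays at the head */
      else tail -> qe.next = qep;
      tail = qep;
   end;

   do qep = head repeat (qep -> qe.next) while (qep ^= null ());
      put skip list (qep -> qe.cylinder); /* prints 100, 200, 300, oldest first */
   end;

   end queue_demo;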

6.4 Request Service Selection

     As previously described, the MR10.2 system selects drives
for service by viewing the collection of requests for all drives,
represented in the two work queues, as an indication of drive
loading, and uses this collection to determine the best drive to
service.

     The new optimizer cannot use this selection mechanism, since
work queues  are now on  a per-drive basis, and  do not represent
sub-system  loading.   Instead  it will  round-robin  through the
drives to select the next requests  to keep a channel busy.  This
will provide continuous and equal service to drives.

     A point to remember is that one should be striving for more
channels than drives.  In such a situation there is no advantage
to either method, since there will always be a free channel and
all drives will be kept maximally busy.  The only time that
selection order will make an efficiency difference is in busy
sub-systems which have insufficient channel allocation, or which
are suffering a hardware channel degradation.

6.5 Optimization - Concepts

     The disk  system is an  extension of the real  memory of the
MULTICS system to provide its VIRTUAL MEMORY characteristics.  As
such we are not, per se, looking for an optimization which provides
the maximum throughput for a disk system; instead we are looking
for VIRTUAL MEMORY service  characteristics which elicit the best
response and throughput from the entire system.

     Nearest-seek and  disk combing techniques  will provide high
drive  throughput, with  disk combing  preventing stagnation, and
nearest-seek  perhaps having  the advantage  in higher  IO rates.
But good system performance, as viewed by the users, means that
they get good response and that their individual process
throughput is acceptably high.

     Two major types of IO occur within the MULTICS system:

  Blocking IO
      IO which causes a process  to wait for IO completion.  This
      IO prevents  a process from executing  (blocks the process)
      and reduces  the number of  currently executable processes.
      An example is  a page read sponsored by  a page fault.  The
      process must  stop execution until the  data it requests is
      available to permit it to continue.

  Non-blocking IO
      IO which  is essentially a background  task, which does not
      cause the system to wait for its completion.  An example is
      a page write caused by the need to evict a page from memory
      to permit  the page frame  to be used for  another page.  A
      user process does not need  to be suspended until the write
      is complete, in fact the  user process may well have logged
      out by this time.


     There are  many shades of  these IO forms.   VTOCE reads are
blocking IO's,  since a page  cannot be read until  the VTOCE has
been  read to  fill in  the ASTE  information.  A  VTOCE write is
non-blocking, since it does not  cause an executing process to be
suspended.  But since there are a limited number of VTOCE buffers
available, a large number of  non-blocking VTOCE writes can cause
other  processes to  be suspended  until sufficient  VTOCE buffer
space has been recovered to read a blocked VTOCE in.  In the same
way, the disk  queue can be fully allocated  to requests to write
pages, and the core map could be saturated by pages waiting to be
written.   Either situation  will cause processes  to block until
queue or core map space can be recovered by the completion of the
non-blocking IO.

     In this way even non-blocking IO can cause blocking to occur
when the applicable resource is totally consumed (over-loading of
the resource).

6.6 New Adaptive Optimization

     This paper introduces a new optimization strategy which aims
at idealized  service to the  current demands of  disk operation.
The  simplistic desire  of this system  is to  provide the lowest
possible  service  times  for  blocking  IO,  and  to  ignore the
existence of non-blocking IO.  However it also recognizes the
loading of various queue resources and takes this into account in
determining optimization.

     This optimization identifies a number of different IO types,
such as VTOCE READ, VTOCE WRITE, PAGE READ, PAGE WRITE, TEST READ
and TEST WRITE.  Each IO type is individually optimized according
to  its  current disk  loading, with  higher optimization  for IO
types which are more heavily  loaded, on a per-drive basis.  Thus
response is degraded only on  drives needing higher throughput of
lower priority IO  and only for the type of  IO which is reaching
load saturation.   It does this through  an optimization strategy
specified for each IO type for each drive.

     Optimization   is  done   utilizing  a  Nearest-Logical-Seek
algorithm,  rather  than  a  Nearest-Physical-Seek  algorithm.  A
Logical Seek  is determined by a  combination of the optimization
priority of the  request's IO type, and the  Physical Seek length
of a request.

     The  optimizing  algorithm uses  a  MULTIPLIER to  convert a
Physical Seek Length into a Logical Seek Length.  This multiplier
is  determined by  the queue  management routines  each time they
queue or de-queue a request.  Each  drive has a table of IO types
which   holds   optimizing  information,   used  to   derive  the
multiplier, and holds the multiplier itself.

     Optimization  is  determined according  to a  site specified
optimization policy, which states two  points on a straight line,
the Response Point and the Load Point.

  Response Point
      The Response  Point is the  point at which  the system will
      optimize for  best system response,  and is the  point, for
      each IO type,  when there is only a  single request of that
      type in the queue.  The  value stated for priority is given
      in tracks and is essentially  the multiplier to be used for
      an  IO  type for  this  single request  case, to  convert a
      Physical Seek into a Logical  Seek.  For example, to mirror
      the current  High/Low queue priority  separation, one would
      state the Response  Point value for High Priority  IO as 1,
      for Physical = Logical, and  would state the Response Point
      value for  Low Priority IO  as the number of  tracks on the
      disk.  Thus a single track  move for a Low Priority request
      looks  longer  than the  longest possible  move for  a High
       Priority request, and nearest-seek optimization produces
       complete separation of all but on-cylinder requests.

  Load Point
      The Load  Point is the  point in queue loading  for each IO
      type  at  which  it  should  become  fully  optimized.  For
      Blocking  IO,  this  would  typically state  a  point which
      preserves  sufficient  multiprogramming.   For Non-Blocking
      IO,  this  would typically  state  a point  before resource
      saturation of  that IO type  would occur, and  cause the IO
      type to become Blocking.  This  point is stated in terms of
      queue elements.

     Between these  two straight line  end-points, the optimizing
algorithm  derives  an  optimizing  multiplier  which  will  be a
combination  of the  current load of  the IO type  and the stated
policy  for the  IO type.  There  will be a  continual merging of
Logical Seek lengths  such that there will no  longer be complete
separation, and  short seeks of low  Response priority will start
to  be  taken  in  preference  to  long  seeks  of  high Response
priority.    Thus   the  simple   policy  statement   will  cause
interaction in a complex manner, dependent upon the relative
loadings of the IO types at any particular point in time.

     Beyond the  Load Point limit, the  multiplier will remain at
1, to continue optimization at maximum throughput.


     The multiplier is basically  determined by the straight line
formula:

       multiplier = max (1.0, intercept - slope * load_level)

     Given that:
       At Response Point: load_level = 1, multiplier = Response
       At Load Point    : load_level = Load, multiplier = 1

     One can trivially derive from the two equations:
       Response = intercept - slope * 1
       1        = intercept - slope * Load

     That:
       Response - 1 = -slope * (1 - Load)

     and therefore:
       slope     = -(Response - 1)/(1 - Load)
       intercept = Response + slope

     This results in the restriction:
       Load Point > 1

     The  straight-line formula  uses a subtraction  of the slope
product to permit  a positive slope value to  be used, since this
will be printed in the meter.

     The optimization algorithm uses  a nearest-seek algorithm on
Logical Seek Lengths, derived from the formula:

       Logical Seek Length = Physical Seek Length * multiplier

     This calculation is performed within the innermost loop of
the nearest-seek algorithm.  Thus the use of a simple calculation
is highly desirable.
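
     The following sketch applies these formulas, using the PAGE
READ figures from the example policy in the next section
(Response Point 200, Load Point 5); the procedure name and the 10
cylinder physical seek are invented for the example:

   multiplier_demo: proc;

   dcl (response, load_pt, slope, intercept, mult) float bin;
   dcl (load_level, phys_seek) fixed bin;

   response = 200e0;                      /* Response Point */
   load_pt = 5e0;                         /* Load Point     */

   slope = -(response - 1e0) / (1e0 - load_pt);   /* =  49.75 */
   intercept = response + slope;                  /* = 249.75 */

   phys_seek = 10;                        /* a 10 cylinder physical seek */
   do load_level = 1 to 6;
      mult = max (1e0, intercept - slope * load_level);
      put skip list (load_level, mult, phys_seek * mult);
   end;                             /* mult: 200, 150.25, 100.5, 50.75, 1, 1 */

   end multiplier_demo;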

6.7 Optimization Policies

     A typical policy statement might be:

                    Response Point       Load point
     VTOC READ:          100                 3
     VTOC WRITE:         150                 3
     PAGE READ:          200                 5
     PAGE WRITE:       80000               100


     The multiplier figures are chosen with some head room on the
highest  priority IO  type, to  let other  IO's become  even more
optimized than the stated highest priority IO type as they become
saturated.  Essentially Page Write is multiplied such that at the
low load point even a single track move looks longer than the
longest possible VTOC track move, or even the longest possible Page
Read  head move.   Thus there is  no interaction  between the two
until Page Writes start to become heavily loaded.

     This  policy indicates  that VTOCE Read  is quite important,
and will typically be done with highest priority.  It is possible
(since the  multiplier is 100)  that other IO could  be even more
highly optimized at some point in time.  Note that VTOC Read will
be  very highly  optimized if 3  are queued on  this drive, since
this may be blocking a number of eligibility slots.

     VTOCE Write is less important, since it is non-blocking, but
becomes  important  if  3  become queued,  since  this  starts to
bottleneck VTOCE operations.

     Page Read is quite important,  but less than VTOCE Read.  If
we have 5 Page Reads outstanding we will be losing
multi-programming and thus wish to increase the
priority of Page Reads.

     Page Write is least important, and since we typically have a
large core map and reasonable size disk queues, the
optimization of Page Write does not need to occur until the queue
fills significantly.  Thus we only reach full optimization at 100
outstanding IO's per drive.

     Thus, as long as the disk system adequately keeps up with
the IO load placed upon it, the system will be as responsive as
possible.  As loading starts to approach the stated thresholds
the system moves towards throughput optimization, as needed to
prevent blocking system operation through loss of queue resources.

6.8 Systemic Optimization Criteria

     The optimizations  as stated above  are done on  a per-drive
basis,  but simply  specifying a  number of  per-drive Load Point
limits   will   probably   result  in   system-wide   FREE  QUEUE
over-commitment  and not  produce the desired  effect of reducing
queue saturation.

     Additional system-wide optimization  factors are maintained,
one for each IO type, which track the queue commitment to each IO
type,  and  a specified  Max  Depth tuning  factor.   The current
system-wide commitment  and the Max Depth  factor are combined to
produce a system-wide Load Correction Factor to be applied to the
multiplier derived from the drive loading.

     The system-wide Load Correction Factor is a fraction <= 1.0.
It is determined by the formula:

       Factor = (Max_depth - Depth)/Max_depth

     This  factor  is  calculated  for  each  request  queued  or
de-queued  for each  IO type.   It is  applied to  the Multiplier
determined from the straight-line optimizing formula as follows:

       Final_Multiplier = max (1.0, Factor * Multiplier)
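
     A minimal sketch of this correction (the Max Depth, current
depth and multiplier values are invented for the example):

   factor_demo: proc;

   dcl (max_depth, depth, factor, mult, final_mult) float bin;

   max_depth = 400e0;                     /* Max Depth tuning factor          */
   depth = 100e0;                         /* current system-wide commitment   */
   mult = 150e0;                          /* multiplier from the line formula */

   factor = (max_depth - depth) / max_depth;     /* = 0.75                 */
   final_mult = max (1e0, factor * mult);        /* = 112.5, never below 1 */

   put skip list (factor, final_mult);

   end factor_demo;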

     A  system load  optimizer depth  counter may  be mapped into
another  system  load  optimizer   depth  counter,  to  let  them
accumulate together.   Thus the system  wide loading will  be the
sum of the  two (or more) loads, while  the individual drive's IO
type optimization factor will be calculated from the loading only
of the one IO type on that drive.

     The effect, and flexibility, of the system load optimization
factor, as applied to the individual IO type multipliers, further
enhances the load recovery characteristics of the Adaptive
Optimizer, without significantly obscuring its purpose and
methods.

     Since system load depth counters can be remapped, the depth
is calculated such  that it can never decrement  below 0, and may
be reset  to 0, to  be corrected by  the normal operation  of the
system.  Thus re-mapping of depth counters does not significantly
disrupt normal system operations and can be done on the fly.

6.9 Stagnation Management

     The Nearest-Seek algorithm suffers from the flaw that
request arrival rates higher than the maximum drive service rate
can result in request stagnation, which can deadlock the system.
Throughput testing on a stock MR10.2 system has already produced
this deadlock several times, requiring an Execute Fault to
recover the system.  Since the Adaptive Optimizer utilizes an
Adaptive Nearest Seek Algorithm it could also suffer from this
problem.

     The  problem  of stagnation  is dealt  with by  altering the
optimization strategy  utilized, and switching to  a Disk Combing
algorithm when  stagnation situations are  detected.  The Combing
algorithm is stagnation free, but does produce a degraded request
response time,  since no optimization differentiation  is made to
speed Blocking IO service.  The  stagnation time period is a site
tunable parameter.

     The  system manages  stagnation situations  by examining the
first  (oldest) element  of the work  queue for a  device it will
operate  upon.  If  the first request  has been in  the queue for
longer  than a  stagnation time  period (defaults  to 5 seconds),
then the  device will be  managed with a  disk-combing technique,
rather  than the  ADAPTIVE OPTIMIZED  NEAREST-SEEK.  Thus  when a
drive  becomes so  clogged with  requests that  it cannot service
some  of  them  within  an  acceptable  time  period,  we  switch
optimization  strategies  to  correctly  manage  this  overloaded
situation in the most efficient manner possible.  This provides a
guarantee  of  service and  continued system  operation.  Combing
continues until no requests in the  work queue are older than the
stagnation time period.
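
     The decision itself is simple; the following sketch uses
invented clock readings, in microseconds:

   stagnation_demo: proc;

   dcl (now, oldest_queue_time, stagnate_time) fixed bin (71);

   stagnate_time = 5000000;               /* 5 second default, in microseconds */
   now = 12000000;                        /* pretend calendar clock reading    */
   oldest_queue_time = 6000000;           /* head (oldest) of the drive queue  */

   if now - oldest_queue_time > stagnate_time
   then put skip list ('comb this drive');        /* guarantee service           */
   else put skip list ('nearest logical seek');   /* optimize as described above */

   end stagnation_demo;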

     Load testing to levels  which would previously have rendered
the system essentially catatonic merely produces very slow
service, but system operation continues.

     If extreme IO  rates are maintained over a  long time period
another  syndrome  occurs, even  with stagnation  management.  In
this case IO rates are such that extremely long disk queues occur
and service times become longer than  memory lap time.  In such a
situation  a secondary  stagnation can  occur in  which processes
thrash,  even though  they are guaranteed  to get  the pages they
request.  By the time the page being referenced has been moved to
memory,  the  page holding  the instruction  which took  the page
fault has been written to disk or discarded.  During this testing
a situation occurred with the Initializer such that a console
command  to kill  the loading jobs  could not  be completed.  The
system  was recovered  by spinning  down a  drive holding process
directories,  causing  the test  processes  to be  suspended long
enough for  the Initializer to  get its command  working set into
memory.  It is not expected that this level of system loading can
be approached, or maintained, in normal service.

7 METERING DATA

     A performance dependent system such as paging IO requires
good metering to permit good tuning.  The Adaptive Optimizing
system   will  create   additional  queue   management  metering,
providing information  on average queue  depth, allocation counts
and high water mark of queue  depth.  This will be maintained for
the system wide FREEQ and  for the individual drive QUEUES.  This
will  permit  sufficient queue  depth  statistics to  be  kept to
provide necessary tuning information for freeq allocation.


     Additional  metering  information is  maintained  on channel
use,  indicating  system  utilization  of  channels  and  channel
service information.

     The  metering  system  also  provides  a  number  of service
statistics  for each  IO type,  for each  drive, indicating total
seek length, seek count, total queue wait time, queue
allocation count, total channel wait time, channel use count,
and comb count.  The metering tools utilize this information to
provide   extensive  meter   output  from   which  a   number  of
observations  can  be  directly  made,  and  further  operational
characteristics inferred.

8 SEEK LENGTH AVERAGING

     The MR10.2 SEEK LENGTH meter is wildly inaccurate, due to
its decaying average algorithm.  The current algorithm is:

     ave = ave + divide (seek - ave, 256, 35, 18);

     This will produce a long term average which will approach
the correct value, but takes hundreds of iterations to do so.
Individual long seeks have little effect on it, and the result is
an average which is always too high or too low, and usually quite
a bit off.

     Adaptive Optimizer  seek metering is done  per device per IO
type  by  dividing the  integer  summation of  the  physical seek
lengths performed by the count of the number of seeks done.
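
     The following sketch contrasts the two calculations; the
declared precisions are assumptions for the example, not the
dskdcl declarations:

   seek_avg_demo: proc;

   dcl ave fixed bin (35, 18);            /* decaying average needs fraction bits */
   dcl (seek_sum, seek_count) fixed bin (35);
   dcl (i, seek) fixed bin;

   ave = 0;
   seek_sum, seek_count = 0;

   do i = 1 to 100;                  /* one hundred identical 60 cylinder seeks */
      seek = 60;
      ave = ave + divide (seek - ave, 256, 35, 18);  /* MR10.2 decaying average  */
      seek_sum = seek_sum + seek;                    /* new scheme: keep the sum */
      seek_count = seek_count + 1;                   /* and the count            */
   end;

   put skip list (ave);                              /* roughly 19, far from 60  */
   put skip list (divide (seek_sum, seek_count, 35, 0));      /* exactly 60      */

   end seek_avg_demo;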

     The accumulator for total seek  length per IO type per drive
is  only a  fixed bin  (35) variable,  and can  potentially cause
overflow faults  which would result  in system crashes.   Using a
maximum average seek length of  800 cylinders, a seek and service
time  of 0.050  sec and the  accumulator size results  in a worst
case overflow time of:

     25 days = ((2**35/800 tracks) * .050 sec)/(3600 * 24)

     If the service time were taken to be 1 millisecond, the
overflow time would be about half a day.

9 AFFECTED PROGRAMS

     These  modifications affect  a fairly small  set of sources,
essentially  those  routines which  reference  dskdcl.incl.pl1 or
dskdcl.incl.alm:


References to dskdcl.incl.pl1:  (09/07/82 1300.0 mdt Tue)
   azm_display_fdump_events.pl1, device_meters.pl1,
   disk_control.pl1, disk_init.pl1, disk_meters.pl1,
   disk_queue.pl1, get_io_segs.pl1, ioi_init.pl1,
   process_dump_segments.pl1, spg_fs_info_.pl1,
   structure_library_2_.cds

References to dskdcl.incl.alm:  (09/07/82 1300.0 mdt Tue)
   dctl.alm

10 DYNAMIC TABLE ALLOCATION

     Storage allocation is done through the routine get_io_segs,
which looks  for a DSKQ  entry on the  PARM card.  If  one is not
found, then an  allocation of 20 entries per  drive is done.  The
FREEQ is allocated with an even number of slots, to ensure double
word alignment for structures which follow it.
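
     A sketch of these sizing rules (this is not the get_io_segs
code; the drive count and parameter value are invented):

   freeq_size_demo: proc;

   dcl (n_drives, dskq_parm, n_slots) fixed bin;

   n_drives = 10;                         /* drives configured                */
   dskq_parm = 0;                         /* 0 means no DSKQ on the PARM card */

   if dskq_parm = 0
   then n_slots = 20 * n_drives;          /* default of 20 elements per drive */
   else n_slots = min (200 * n_drives, max (5 * n_drives, dskq_parm));
                                          /* enforce 5 to 200 per drive       */
   if mod (n_slots, 2) = 1
   then n_slots = n_slots + 1;            /* an even count keeps the structures
                                             which follow double word aligned */
   put skip list (n_slots);               /* prints 200 */

   end freeq_size_demo;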

     The size of  the FREEQ is passed in  the disk_data structure
by get_io_segs and is used  by disk_init to create and initialize
the free queue.

     Channel  count  is also  done by  get_io_segs, and  space is
allocated   for   channel   entries.   These   entries   must  be
individually  double word  aligned, and double  word multiples in
length.

11 FREEQ ISSUES

     Several issues occur with a system wide free queue.

1. Locking
    Since the  freeq becomes a  system wide resource,  and exists
    outside  the current  sub-system lock,  it must  have its own
    lock to protect against simultaneous access in a multi-processor
    environment.  This  lock will be  held for a  very short time
    period, since  it only protects  in the order of  20 lines of
    straight-line  code.   Thus  this   is  a  spin-lock  and  is
    un-metered.

2. Run Polling
    During times that  the DIM has been called  from Page Control
    and must  check for the  completion of IO's, it  calls a Disk
    RUN routine, which scans the channels to determine completion
    of  IO's.  This  is necessary  since Page  Control runs  in a
    masked environment, and cannot receive interrupts.


    With the MR10.2 software, Run Polling was performed only upon
    the channels of the current  sub-system and could only detect
    IO completion for requests on that sub-system.  With the FREE
    QUEUE  modifications, Run  Polling has been  extended to scan
    all  channels of  the system, rather  than just  those of the
    sub-system.  This is necessary to handle ALLOCATION LOCK
    situations,  in  which the  completion of  the IO  which will
    unlock the Allocation Lock occurs  on a sub-system other than
    that which sponsored the lock.  If only the single sub-system
    were tested for IO completion, the Allocation Lock would be a
    deadly embrace.

3. Debugging
    Since the  individual free queue  elements are not  tied to a
    particular  sub-system,  the  ol_dump and  AZM  event display
    functions  cannot  find the  drive reference  left in  a free
    queue entry and turn it back to a pvtx reference to determine
    drive name.   Thus a pvtx  field has been added  to the freeq
    entry.   In addition  the time field  has been  expanded to a
    full  71  bits  from  36-bits so  no  modulus  operations are
    performed  and  it  is   much  easier  to  determine  request
    stagnation.

12 METERING TOOLS

         The  new   modifications  have  required   the  complete
    re-write of the disk_meters command, and the obsolescence of
    the device_meters  command.  The re-written  disk_meters will
    provide  a  great deal  more  metering information,  and will
    default to an output almost identical to the old disk_meters.
    However  control  options  will  permit  a  greatly  extended
    metering  output.   A  synopsis  of this  command  is  in the
    disk_meters.info appendix.

12.1 Meter Interpretation

         Interpretation   of   this   metering   output  provides
    considerable  information  on the  internal operation  of the
    disk system,  though much of  it is determined  by inference.
    Information  is  presented for  system-wide  data, sub-system
    data  and individual  drive data.   If the  long form  of the
    command is selected, detailed information is provided down to
    the level of individual  IO types, drive load characteristics
    and channel specific information.


12.2 Queue Information

         For  all  queues  the  average  depth  is  determined by
    summing  the  depth  of a  queue  prior to  insertion  of the
    current  request, and  counting the number  of requests which
    needed to be queued.  Since much  of the time drives are idle
    and do not  have a queue it is typical  to see a much smaller
    drive queue  allocation count than  the number of  IO's which
    the  drive   has  actually  done.   The   ratio  of  IO's  to
    allocations in comparison to the average IO rate of the drive
    can give an indication whether a drive is being supplied with
    infrequent large bursts  of IO, or is running  in a situation
    of steady requests.  If infrequent large bursts are typical a
    drive  will have  a relatively  high allocation  count, but a
    relatively  low IO  rate, and may  have a  high maximum queue
    depth.

         The maximum queue depth of  any queue is the peak number
    of requests which were outstanding  in that queue, or for the
    system  FREEQ,  the peak  number of  requests which  had been
    allocated at one time.
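
         A sketch of how these figures are accumulated (the names
    are assumed; this is not the actual metering code):

     depth_meter_demo: proc;

     dcl (depth_sum, alloc_count, cur_depth, max_depth) fixed bin (35);
     dcl i fixed bin;

     depth_sum, alloc_count, cur_depth, max_depth = 0;

     do i = 1 to 4;                  /* four requests arrive while drive is busy */
        depth_sum = depth_sum + cur_depth;   /* depth seen BEFORE this insertion */
        alloc_count = alloc_count + 1;
        cur_depth = cur_depth + 1;
        max_depth = max (max_depth, cur_depth);
     end;

     put skip list (alloc_count, max_depth);             /* 4 and 4           */
     put skip list (divide (depth_sum, alloc_count, 35, 2));
                                                         /* average depth 1.5 */
     end depth_meter_demo;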

12.3 Stagnation

         The   system   information  area   also   indicates  the
    stagnation time limit.  This is a factor governing how long a
    request  may  remain in  a  drive's work  queue before  it is
    considered to  have stagnated.  Stagnation of  a disk request
    can result  in situations where there  is enough disk traffic
    in a small region of the disk to prevent requests on outlying
    areas from ever being serviced.   This can result in a locked
    up system and very poor user response.  The optimizing driver
    uses the stagnation time period  it is supplied with to alter
    its optimization strategy from one  in which fast service for
    the  majority  of requests  is  maintained, to  one  in which
    guaranteed  service  for  all  requests  is  maintained.  The
    nearest seek algorithm will sustain higher peak IO rates, but
    disk  combing will  guarantee service  to all  disk requests,
    thus assuring continued system operation.

         Stagnation  is  handled  per  disk drive  by  the simple
    expedient  of  comparing  the  oldest  disk  request  in each
    drive's  queue with  the current  calendar/clock reading.  If
    the request is more than the stagnation time old, then a disk
    combing   algorithm   is   used,   otherwise   an   optimized
    nearest-logical seek  algorithm is used.  When  the queue has
    been reduced such that the  oldest request is no longer older
    than the  stagnation time, the drive  returns to an optimized
    nearest-logical seek.   Each request processed  while combing
    is counted for that drive, producing the COMB count figure in
    the detailed drive statistics.

12.4 System and Drive Load Optimization

         Two  forms of  IO optimization  are in  effect.  Each IO
    type has a per-drive optimization algorithm and a system-wide
    optimization algorithm.   The system information  area of the
    meter output indicates the  current status of the system-wide
    optimization.   For each  IO type  it indicates  a maximum IO
    depth  value,  a  current  depth and  depth  mapping  and the
    optimization fraction.

         The  maximum  depth  value  is  a  tuning  factor  which
    indicates  the   maximum  system-wide  allocation   of  queue
    elements  to each  IO type  before all  drives are  forced to
    maximum throughput  optimization.  It is  used in conjunction
    with  the  current depth,  which  is the  current sum  of all
    requests of that IO type,  to produce a fractional multiplier
    by  which  to  reduce  the  ADAPTIVE  OPTIMIZER's  conversion
    multiplier.  The smaller the system-wide fraction, the higher
    the throughput optimization afforded the IO type system-wide.

         The  algorithm  permits  a  tuner  to  merge system-wide
    counters to produce different  optimization effects.  This is
    done  by   MAPPING  a  number  of   counters  into  a  single
    accumulator.

         Initially   each  IO   type  is  given   its  own  depth
    accumulator,  but  each type  is  also permitted  to indicate
    which  of  the  system-wide  accumulators is  to  be  used to
    accumulate current depth information.   This lets a number of
    IO  types  share  a  single  counter,  each  with  their  own
    max_depth values.   The result is  to artificially accentuate
    loading  for  an IO  type  which is  a prime  system response
    factor by  the loading of  a different IO  type.  For example
    one could map VTOC WRITES and PAGE READS into the same system
    wide counter, and supply a high response multiplier point for
    the  VTOC WRITES,  with a small  throughput optimization load
    point.  Then VTOC WRITES will be optimized if either they are
    loaded, or there is a high combined total of outstanding PAGE
    READS or VTOC WRITES.
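
         A sketch of this mapping (the array names and the type
    numbering are invented for the example):

     depth_map_demo: proc;

     dcl (depth (4), depth_map (4)) fixed bin;
                                    /* 1 = VTOC READ,  2 = VTOC WRITE,
                                       3 = PAGE READ,  4 = PAGE WRITE  */
     dcl io_type fixed bin;

     depth = 0;
     do io_type = 1 to 4;
        depth_map (io_type) = io_type;   /* default: each type has its own counter */
     end;
     depth_map (2) = 3;               /* remap VTOC WRITE into the PAGE READ counter */

     depth (depth_map (2)) = depth (depth_map (2)) + 1;    /* queue a VTOC WRITE */
     depth (depth_map (3)) = depth (depth_map (3)) + 1;    /* queue a PAGE READ  */

     put skip list (depth (3));          /* the shared counter now reads 2 */

     end depth_map_demo;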

12.5 Bailouts

         The system information area of the meter also has a
    figure termed 'bailouts'.   This is a count of  the number of
    times that the fast ALM disk interrupt path called the slower
    PL1  disk interrupt  path to  process an  interrupt requiring
    some  complex  processing.   The  fast path  only  deals with
    completely straight-forward IO completions.

12.6 Run Polling

         Two  forms of  IO completion  can occur  within the DIM,
    which will result in meter output indicating:  Run Polls, Get
    IO w/o Term,  etc.  If the system receives  an IO completion,
    it will post an interrupt, and  the interrupt path of the DIM
    will  be called  to process  the completed  IO.  This  is the
    typical  path  for  IO  completion and  handles  most  of the
    traffic.

         However,  if  the DIM  has  already been  called, either
    through the CALL side, or if  a DISK RUN has been called, the
    processor  will  be running  masked,  and cannot  receive the
    interrupt indicating  completion of an  IO.  So the  RUN code
    polls  all channels  of all  subsystems to  determine if IO's
    have completed,  and to post  them.  It will  also initiate a
    new IO on the channel.

         The number  of times that  an IO completion  is seen and
    processed by RUN  is indicated in the "run  polls" figure for
    each channel.   However, if an IO  completion is processed by
    RUN, there  may still be a  resultant interrupt posted, which
    will probably show in the value "get_io w/o term", indicating
    that in  processing an interrupt, there  was no terminate bit
    on for the channel, and the interrupt was ignored.  Two other
    terminate stealing situations may occur but usually require a
    multi-processor  system  to   produce  the  appropriate  race
    conditions.


13 TEST PLAN

         Testing will be a multi-stage affair.  There are several
    distinct areas of test:

Test Free Queue modifications.
    This means storage allocation,  queue linking and addressing,
    esd reset of queues, locking and statistics.  This also means
    modifications to necessary meters and dump analysis tools.

Test Channel modifications.
    This  means  all  forms   of  channel  addressing  and  error
    recovery, allocation of weird channel groups and numbers,
    cross-sub-system channel use  (since addressing has changed),
    esd  resetting  and channel  breaking and  recovery (requires
    cede  and recover  of channel  to IOI  for polts  testing and
    firmware load).   This also means  modifications to necessary
    meters and dump analysis tools.

Test queue field extensions.
    Test  modification to  queue field  definitions for increased
    addressing  ranges  for  FIPS  drives  and  modifications for
    queue-per-drive and associated definition changes.

Test adaptive optimization modifications and associated tools.

Perform  loading  tests  to  determine  response  and  throughput
    measures.

Test  disk  formatting,  firmware  loading,  error  recovery  for
    channels and drives, ESD shutdowns.

Test locking in a multi-processor environment.
    This is necessary since the new disk_data areas require
    separate lock management from that of the sub-system.

Test channel recovery and ESD.

14 TESTING ACCOMPLISHED

1.  Have  run highly  loaded tests, including  crash recovery and
    ESD in highly loaded situations with no errors.

2.  Have run disk formatting on a drive while dropping one of two
    MPC's on  the sub-system.  Dropped channels  (on crashed MPC)
    broke correctly, no IO was lost and formatting continued with
    no problems.  This was against  a background of drive loading
    jobs.  When formatting finished, and channel activity dropped
    to normal, we were able to reload firmware in the crashed MPC
    and recover the channels with no problems.

3.  Have run basic throughput tests with FREEQ modifications.
    These clearly demonstrate increased performance, and show
    that no ALLOCATION LOCKS occur when adequate queue space is
    allocated.  In any situation of high IO but no allocation
    locks, paging overhead was quite low.  In any situation with
    even a low percentage of allocation locks, paging overhead
    was high.  Thus we can assure sites which are experiencing
    loaded systems with allocation lock problems of a greatly
    lowered paging overhead as a result of these modifications.

4.  Have run ADAPTIVE OPTIMIZATION with complete success, both
    single and multi-processor.  The system loads well and does
    function even in extremely high loading situations.  We have
    seen one effect of stagnation where the system could lock up.
    This is due to page replacement occurring on all process
    pages before a faulted page could be brought in.  In this
    case we recovered the Initializer by spinning down some PDIR
    drives and permitting IO to complete for the Initializer in
    his PDIR's.  The problem here was that we were paging out the
    Initializer's pages by the time we could bring in a faulted
    page (drive service time in combing was longer than memory
    lap time), such that we reached a situation of deadly
    embrace.  Such a situation is not amenable to a disk control
    solution; only more memory, to increase the lap time, could
    solve it.


                             GLOSSARY

IOM       Input-Output  Multiplexor.   It  is  a semi-intelligent
          processor  connected to  ALL the  SCU's of  the MULTICS
          system.   It  uses  an   assigned  area  of  memory  to
          communicate  commands and  status information  with the
          MULTICS system, and an area  of memory through which to
          vector faults.  The fault and command mailbox addresses
          are defined on the IOM panel by switch settings; each
          IOM on a system is assigned a different area for faults
          and mailboxes.

          An IOM takes an SCU port; thus the mix of processors
          and IOM's on a system cannot total more than 8 (4MW
          SCU).

LA        Link Adaptor.  It is the physical connection between an
          IOM and an  MPC and consists of a set  of boards in the
          MPC cabinet.  The IOM has  another set of boards (PSIA)
          in its  cabinet to connect  the other end  of the cable
          joining the LA and the IOM.

Logical Channel
          Unlike    a    Physical    Channel,    which    is    a
          physical/electrical object, a Logical  Channel is not a
          physical  entity.  It  is a  logical sub-division  of a
          Physical  Channel which  can hold  a request.   The IOM
          micro-program scans all its  mailboxes for requests for
          Logical  Channels  and  posts these  down  the Physical
          Channel.  Replies from the MPC will be tagged with a
          Logical Channel  number and the results  posted back to
          the IOM.   Thus a Logical Channel  holds requests to be
          performed by an  MPC and acts like a  buffer area for a
          Physical Channel.

MPC       Micro-Programmed Controller.  It  is a semi-intelligent
          processor  connected  to  an  IOM  which  controls  the
          operation  of  the disk  drives cabled  to it.   It can
          accept multiple requests, one per Logical Channel, and
          perform full seek overlap on all its connected disk
          drives.  A 451 MPC can have two LAs; a 607 MPC (used
          for MSU500/501 disk drives) has a single LA.

Physical Channel
          A Physical  Channel is the  physical connection between
          an MPC  or FNP and an  IOM.  It is a  set of two cables
          terminated on one  end in the MPC and  the other end in
          the IOM.  It carries all electrical signals between the
          two devices.


PSIA      Peripheral Serial Interface Adaptor.   This is the name
          for a high speed connection  in the IOM, typically used
          for disk or tape MPC's.

          Each PSIA transforms the physical channel it terminates
          into  a set  of Logical Channels.   The channel numbers
          and  the number  of channels addressable  on the single
          physical  channel  is determined  by resistor  packs or
          switches  on the  PSIA boards.   These define  the Base
          Channel  number  and  indicate  the  number  of Logical
          Channels which follow it.  Base Channel numbers must be
          aligned to the number of Logical Channels defined on the
          Base Channel, rounded up to a power of two.  Thus if a
          physical channel has one logical channel, it can occur
          on any boundary; a physical with two logicals must occur
          on an even channel number; a physical with 3 or 4
          channels must be on a multiple of four; and a physical
          with 5, 6, 7 or 8 channels must be on a multiple of
          eight.
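
          As an illustration only, this alignment rule can be
          sketched in Python (not part of the Multics software;
          the function names are hypothetical):

             import math

             def base_channel_alignment(n_channels):
                 # Round the Logical Channel count up to a power
                 # of two; the Base Channel number must be a
                 # multiple of this value.
                 return 1 << math.ceil(math.log2(n_channels))

             def base_channel_valid(base, n_channels):
                 return base % base_channel_alignment(
                     n_channels) == 0

             # 1 channel aligns anywhere; 3 or 4 channels need a
             # multiple of four; 5 through 8 need a multiple of
             # eight.
             assert base_channel_alignment(1) == 1
             assert base_channel_alignment(4) == 4
             assert base_channel_alignment(6) == 8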

SCU       System  Control  Unit.   This  is a  system  and memory
          controller.  The current 4MW  controller also houses up
          to 4 million  words of memory in the  same cabinet.  An
          SCU  provides  multi-ported  access  to  the  memory it
          controls to all ports connected  on it.  Each SCU has 8
          port slots, each processor takes a slot, each IOM takes
          a slot.  Each SCU is  cabled to ALL processors and IOMs
          of the system,  and they should all have  the same port
          numbers on all SCUs.


                  Appendix A - disk_meters.info

06/08/84  disk_meters, dskm

Syntax:  dskm sub_sys1 sub_sys2 {-control_args}

Function:  Prints metering output from MULTICS disk management.

Arguments:
sub_sysn
   Multiple  position-independent  sub-system  identifiers may be
   specified  to  select  specific sub-system information.  If no
   sub-system  identifiers  are  supplied,  all  sub-systems  are
   listed.  A sub-system identifier takes the form of:

      dskm dska

Control Arguments:
-channels, -chn
   Requests  sub-system channel information.  This will be of the
   form:

          dska: Channel information.
          A8 : 42530 connects, 163 int w/o term, 385 run polls
          A12: 4559 connects, 13 int w/o term, 129 run polls
          A9 : 1307 connects, 15 int w/o term, 225 run polls
          A13: 86 connects

   connects         - Number of channel connections made.
   int w/o term     - Number of interrupts without terminate
                      status.
   run polls        - Number of IO's seen by RUN polling.
   get_io w/o term  - Number of io_manager calls not returning
                      terminate.
   term not active  - Number of interrupts with terminate on an
                      inactive channel.
   IOI              - The channel has been released to IOI use.
   INOP             - The channel is deemed to be inoperative.
   BROKEN           - The channel is deemed to be broken.

-detail, -dtl
   Requests detailed printout of drive information.  It is of the
   form:


 dska_04 #Seeks AveSeek Queue-wait Channel-wait Queued Multiplier
  PageRd  11338   70.38       44.5  0.8%   27.3      0      119.8
  PageWt   1482   51.98     1153.7  0.1%   32.4      0    50239.2
  VtocRd   1712   90.71       26.3  0.1%   24.4      0       23.8
  VtocWt   1518   89.38       26.4  0.1%   20.8      0       49.9
  TEST    0 UNLOADs,  39 TESTs
 Channels 1.05% busy,  212 Combs.

-long, -lg
   Requests all of -dtl, -chn, -q, -sys.

-queues, -q
   Requests inclusion of drive queue information, of the form:

 dska_04 Queue: Ave 16.1, Alloc 99, Max Depth 50/280, Cur Depth 0

   This  indicates  the  average  queue  depth  for the specified
   number   of   queue   allocations,  the  maximum  depth  since
   max_depth_meters  were last reset and the current depth in the
   queue.   Requests are only queued if a drive is busy and/or it
   already has requests queued.

-report_reset, -rr
   Requests  normal  statistics  to  be printed, according to the
   other  control  arguments, and then meters to be reset to this
   point in time (see reset).

-unreset, -urs
   Requests  that  disk_meters  reset its meters to boot time, by
   releasing its temporary meters segment.

-reset, -rs
   Requests  that  disk_meters  reset its meters to this point in
   time,  and  not  print statistics.  A reset is accomplished by
   making  a  copy of the statistics as of the reset time; future
   invocations of the command will display the difference between
   current statistics and the copy.
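
   A minimal sketch of this snapshot-and-difference scheme, in
   Python rather than the actual PL/1 command (the names are
   hypothetical):

      import copy, time

      class MeterSnapshot:
          # Holds a copy of the meters taken at -reset time;
          # reports then show current meters minus the copy.
          def __init__(self):
              self.copy_ = None
              self.reset_time = None

          def reset(self, meters):
              self.copy_ = copy.deepcopy(meters)
              self.reset_time = time.time()

          def unreset(self):
              # Release the copy; reports revert to boot-relative
              # values.
              self.copy_ = None

          def report(self, meters):
              if self.copy_ is None:
                  return dict(meters)
              return {name: meters[name] - self.copy_[name]
                      for name in meters}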

-system, -sys
   Requests  that system statistics and optimizing information be
   printed, in the form:

  FREE Queue: Ave 6.2, Alloc 60237, Max Depth 99/280, Cur Depth 1
System Factors: stagnate time 5.000 seconds, 378 bail outs.
          Maximum Depth Meters reset at: 06/07/84  2207.7 mdt Thu
    PageRd   Max Load    6, Depth   0 (PageRd), Fraction 1.0000
    PageWt   Max Load  210, Depth   0 (PageWt), Fraction 1.0000
    VtocRd   Max Load    6, Depth   0 (VtocRd), Fraction 1.0000
    VtocWt   Max Load   12, Depth   0 (VtocWt), Fraction 1.0000


   This  indicates  FREE  Queue use, stagnation time beyond which
   the  system does disk combing and the number of times that the
   ALM  driver  had  to  call  the  PL1 driver to process complex
   interrupt  information.   The  time that max_depth meters were
   last  reset  at  is  given,  as  is  the current status of the
   system-wide load optimization algorithm.

Default Information:
   The default invocation of disk_meters will provide information
   like:

                    Metering Time:     11:23:01

Subsystem  dski:
                Locks   Waits    %Calls  Average      %CPU
  Call Lock:    11768       0   0.0000%    0.000   0.00000%
  Int Lock:     11764       0   0.0000%    0.000   0.00000%
  Run Lock:      3091       0   0.0000%    0.000   0.00000%
  Alloc Lock:   11755       3   0.0255%   12.223   0.00009%

     Drive  Reads    Writes      Seek     ATB       ATB      ATB
                               Distance  Reads    Writes     I/O

      1      3064      1831        68    13375     22382    8372
      2       729        36         9    56216   1138375   53570
      3      3065      1934        77    13370     21190    8197
      4       922        26        11    44448   1576212   43229

   This  indicates  the  metering period, the sub-system and lock
   information  for  the  sub-system,  and  individual  drive  IO
   information  for  all  drives  which  have performed IO in the
   metering   period.   Typically  0  counts  are  suppressed  to
   highlight useful information.


                   Appendix B - tune_disk.info

08/16/84  tune_disk, td

Syntax:   tune_disk drive_name io_type {-load | -ld} n
               {-response | -rsp} m
          tune_disk reset_max
          tune_disk reset_sys
          tune_disk stagnate seconds
          tune_disk system io_type {-max n} {-map io_type}

Function:  Permits a user with hphcs_ access to alter disk tuning
parameters.

Arguments:
io_type
   An  io_type  is the name of a type of IO tunable by tune_disk.
   If  tune_disk  is  invoked  without  arguments it will print a
   usage message which includes the valid io_type names.

drive_name
   Is  the  name  of  a disk drive to be tuned.  Drive names must
   begin  with  the three characters "dsk", followed by a letter,
   an underline and one or two numeric digits.

-load n, -ld n
   This  argument  pair  defines  the  optimization maximum queue
   loadpoint  for  the  specified  drive.   It  is one of the two
   points  which  define  the  optimization  line.  If -load 1 is
   specified,  the  initial  response  value  is  the  optimizing
   multiplier and no load optimization is performed.

-response m, -rsp m
   This  argument  pair defines the optimization maximum response
   value, which is the multiplier to be used for an IO type queue
   load of a single request.

reset_max
   This  argument requests that all queue maximum depth meters be
   reset  in  the  disk_seg database.  The time and date at which
   the meters were last reset is also maintained in the database.
   This  argument is useful to permit a new/lower max depth to be
   seen  after altering tuning parameters, or after an Allocation
   Lock has occurred.


reset_sys
   This argument requests that all system depth counters be reset
   to  0.   This  is  useful  after altering system depth counter
   mapping.   If  counter mapping has been changed while requests
   were in the queue, the counter which had been used may be left
   artificially  high.   Resetting  back  to  0  lets  the system
   correct the value.

stagnate seconds
   This  argument  pair  specifies  a  change  of the system wide
   stagnation  time  period  to  the specified number of seconds.
   Tune_disk sets a maximum stagnation time period of 6 minutes.

system
   This   argument   indicates   modification  of  a  system-wide
   optimization factor.  The maximum depth and/or mapping for the
   specified io_type will be altered.  If neither a maximum depth
    value nor a mapping is altered, an error message is issued.

-map io_type
   This  argument  specifies  that the current depth counting for
   the  specified  system-wide  optimization entry should be done
   using the counter for io_type.  For example:

      tune_disk system PageRead -map PageWrite

   Would  have the depth counter for PageWrite used to accumulate
   the number of PageRead IO's currently outstanding.

-max n
   This  argument  pair  indicates that the maximum depth for the
   specified  system-wide  optimization entry should be set to n.
   If  this  depth  is  reached then full optimization of this IO
   type will be done system wide for all drives.

Notes:
   Optimization is performed by determining a multiplier used to
   convert a Physical Seek Length into a Logical Seek Length, for
   the purpose of determining the Nearest Logical Seek to perform
   on a disk drive.  The Response Point determines what this
   multiplier is for a situation with a single request of that IO
   type in the queue, and is the multiplier required to produce
   best system response.  The Load Point specifies the number of
   requests permitted in the queue of the specified IO type before
   full optimization occurs, at which point Logical Seek Length =
   Physical Seek Length.  These two values define the two
   endpoints of a straight line.  The optimization multiplier is
   determined by the current load of the queue and its
   corresponding position on the straight line.
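
   As an illustration only, the straight-line interpolation can be
   sketched in Python (the actual DIM is written in ALM and PL/1;
   the names and the exact endpoints are assumptions based on the
   description above):

      def drive_multiplier(queue_load, response, load_point):
          # response:   multiplier used when a single request of
          #             this IO type is queued (Response Point).
          # load_point: queue depth at which full optimization is
          #             reached (Load Point).
          if load_point <= 1:
              # -load 1: the response value alone is used and no
              # load optimization is performed.
              return response
          if queue_load >= load_point:
              return 1.0        # Logical Seek = Physical Seek
          # Interpolate along the line through (1, response) and
          # (load_point, 1.0).
          slope = (1.0 - response) / (load_point - 1)
          return response + slope * (queue_load - 1)

      def logical_seek_length(physical_seek, multiplier):
          return physical_seek * multiplier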

   System-wide   queue  loading  optimization  is  determined  by
   looking  at the system-wide load of an IO type and the maximum
   depth  it should be permitted before becoming fully optimized.
   The fraction produced by:

      fraction = max (0.0, (max_depth - depth)/max_depth)

   is  used to alter the individual drive's IO type multiplier to
   determine  the  system-wide queue loading effect on individual
   drive optimization.
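
   How the fraction is combined with the per-drive multiplier is
   not spelled out here; one plausible reading, sketched in Python
   with hypothetical names, is that the drive multiplier is pulled
   toward full optimization (1.0) as the fraction falls to 0:

      def system_fraction(depth, max_depth):
          return max(0.0, (max_depth - depth) / max_depth)

      def effective_multiplier(drive_mult, depth, max_depth):
          # Assumed combination: fraction 1.0 leaves the drive's
          # own multiplier in effect; fraction 0.0 forces full
          # optimization system wide for this IO type.
          fraction = system_fraction(depth, max_depth)
          return 1.0 + fraction * (drive_mult - 1.0)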

   The  system-wide  optimization  utilizes a max_depth specified
   for  the  IO  type,  and  a  counter  of  the current depth to
   determine   the   system-wide   loading  optimization.   Depth
   counters   can   be  mapped  together  to  form  an  aggregate
   system-wide queue loading effect.  When decrementing, counters
   are  not  permitted to become negative, but if re-mapped while
   non-zero  they  may remain > 0 with no load.  The tuning tools
   permit  resetting  the  current depth counters for system-wide
   optimization  back  to  0, to let the system correct them to a
   true load indication.
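
   The counter mapping and the never-negative rule can be sketched
   as follows (illustrative Python with hypothetical names; not
   the disk_seg data structures themselves):

      class DepthCounters:
          def __init__(self, io_types):
              # Each IO type normally counts in its own counter;
              # -map lets several types aggregate into one.
              self.counter_of = {t: t for t in io_types}
              self.depth = {t: 0 for t in io_types}

          def map_type(self, io_type, counter_type):
              # e.g.  tune_disk system PageRead -map PageWrite
              self.counter_of[io_type] = counter_type

          def add_request(self, io_type):
              self.depth[self.counter_of[io_type]] += 1

          def remove_request(self, io_type):
              c = self.counter_of[io_type]
              # Never allowed to go negative; re-mapping while
              # requests are queued can leave a counter high,
              # which reset_sys corrects.
              self.depth[c] = max(0, self.depth[c] - 1)

          def reset_sys(self):
              for t in self.depth:
                  self.depth[t] = 0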

   All  queues  have  a high-water-mark accumulator.  This can be
   reset through the tuning tools to permit a new high-water-mark
   to be determined.