MULTICS TECHNICAL BULLETIN 609 To: Distribution From: E. J. Wallman Date: January 26, 1983 Subject: More on improvements to the Multics search facility ABSTRACT In MTB-565, Benson Margulies pointed out two major deficiencies in the Multics search facility (search_paths_, etal.), and proposed a technology and implementation to overcome those deficiencies. This paper presents an alternative to the proposal in MTB-565. Please address comments to the author: via Multics mail: Wallman.Multics at System M via forum meeting: >udd>Multics>C&F>C&F_Maintenance (C&F) on System M via hardcopy mail: Ed Wallman, MS Z-30 (AZ05) PO Box 8000, Phoneix, AZ 85066 via telephone: (602) 862-6637 (HVN 357) _____________________________________________________________________ Multics Project internal working documentation. Not to be reproduced or distributed outside the Multics Project. page 2 MTB-609 INTRODUCTION In MTB-565, Margulies points out two major search facility deficiencies: 1) Manipulation of a search list with the *_search_paths com- mands is clumsy and tedious due to the necessity to reference existing paths exactly as they are recorded in the the list (and displayed by print_search_paths). 2) There is no feature that provides verification that a path to be entered into a search list is valid for the purpose of the list. For example, some lists may need to preclude the use of -referencing_dir, others may require that only directories be searched, and still others may require only segments with specific suffices (eg., the dictionaries). This MTB addresses each of these points in greater detail, answering the first completely and the second in part. Manipulation of Search Lists There are three different objects that can be entered into a search list: o keywords Keywords present no problem. Since they are few in number and completely determined by the *_search_paths commands, their representation in the search list can be coerced to be their unique long forms. Thus, they need no further identi- fication. o storage system objects Similarly, storage system objects present no problem. Since they exist, the storage system can be queried for their UIDs and the UIDs used to determine if the random pathnames returned by expand_pathname_ (or the use of different addnames by the user) do indeed refer to the same real object. There is a 36 bit pad word in the sl_info structure that can be used for UIDs compatibly. Since the search list structure is to be changed for support of path verification (see discussion following), a UID datum can be added to that structure. Note that the search list is maintained soley by the search_paths_ entrypoints so that any structure changes are transparent to callers using the search facility for searching. This approach suffers one flaw: if the owner of a storage system object recreates it (thus changing the UID) during the life of a process, the UID recorded in that process's search segment becomes invalid and references to the path will have to revert to the "exact path" techinque. MTB-609 page 3 o nonobjects The creation of pseudo-UIDs for nonobjects (in this MTB, a "nonobject" is anything that is not known to the storage system; examples are pathnames containing nonexistent entrynames, archive component pathnames, and FNP channel numbers) appears to be an insoluble problem. For example, consider a search list that contains the paths >udd>a>b>foo and >exl>a>b>foo, where both "b" directories exist and neither "foo" exists. If a user has >udd>a>b as working directory and references "foo", get_wdir_ could quite easily return >user_dir_dir>a>b (since it, like expand_pathname_, gets paths from the pathname associative store). Which of the nonexistent "foo"s is being referenced? Several possibilities have been suggested: 1) Refuse to enter nonobjects into the search list. This is the solution enforced by set_search_rules and "solves" the problem by avoiding it. However, this solution is unacceptable for search lists (even for the case of nonexistent entrynames) since the search facility is intended to be able to search for "anything" that an application using may want to search for. If all objects not known to the storage system were ruled out, the basic intent of the search facility would be subverted. 2) Record the UID for each real entryname in the search path in the process search segment. While feasible, this solution is very unattractive because of the amount of implementation work needed and the high cost in perfor- mance due to the addition of more arrays with refer extents. Further, this makes the problem of search path referencing even more obscure since all names appearing after the last real entryname would still have to be given exactly and the point in the pathname where this need arises is not visible to the user. Someday, it may be desirable to add starname support to the search facility. If this happens, the number of nonobjects appearing in search lists will escalate drastically. 3) Provide a "regular expression" feature like the text editors. This is a feasible solution and far less costly than 2). If the user types, say, "dsp list /.*foo/", the program will scan the search list for hits and query for action on each hit. However, the proper action in response to "asp list bar -after /.*foo/" is not clear. 4) Do nothing. Although it retains the current undesirable behavior of path manipulation with regard to nonobjects, this is the expedient solution. Indeed, the user will penalized for inaccurate typing of paths, but such penalization is certainly not unknown in the Multics command environment. page 4 MTB-609 Thus, since there seem to be no criteria for determining the unique identity of nonobjects, we choose the last alternative and do nothing. Implementation The support of UIDs for storage system objects can be implemented with the following trivial change in the sl_info structure and the addition of a UID datum to the structure of a search list in the process search segment (see discussion of path verification follow- ing): /* Modified 12/21/82 by Ed Wallman: Changed sl_info.paths.pad2 to sl_info.paths.uid in support of relative pathnames. This is a compatible change; no new version number. */ dcl 1 sl_info aligned based (sl_info_p), 2 version fixed bin, /* Must be 1 */ 2 num_paths fixed bin, /* Number of search paths */ 2 change_index_p /* Pointer to list's update count */ pointer, /* IN THE PROCESS SEARCH SEG */ 2 change_index /* This list's update count */ fixed bin (71), 2 pad1 (6) bit (36), /* Must be zero */ 2 paths (sl_info_num_paths refer (sl_info.num_paths)), 3 type fixed bin, /* Type of search path */ 3 code fixed bin (35),/* Standard status code for path */ 3 uid bit (36) aligned,/* Unique id of path */ 3 pathname char (168) unaligned;/* Search pathname */ Verification of Search Paths This is the more serious of the two deficiencies and deserves the more work. If a user enters an inappropriate path, for example, a directory named "foo.dict" into the dictionary search list, there will no notice given until the dictionary commands attempt to use the path. That subsequent notice can be quite abrupt and unfriendly. For the example given, the error is: Error: This operation is not allowed for a directory. >udd>... (by >unb>find_dictionary_word ...) etc. ... followed by failure of the dictionary command to proceed through the rest of the search list. MTB-609 page 5 Thus, two new needed features are suggested:(1) 1) control over the type of path to be entered This can be supported by the addition of a word of switch bits with specific bit positions assigned to the various allowable objects. 2) a list of allowed/required suffices This list is clearly needed. Currently, there are two lists defined in search_list_defaults_ (dictionary and declare) that require segments with a specific suffix. Where to Store the New data? These new data must be kept in the default search list since that is the only place that allows the control to be exercised for the entire system. It cannot be kept in sl_info because the storage for that structure is destroyed after each invocation of a *_search_path command; it cannot be kept in the process search list because that would require the sharing of active search segments, an approach that does not seem feasible. _____________________________________________________________________ (1) This MTB addresses only control over valid keywords, valid storage system objects, and suffixes on storage system objects. Control over nonobjects and the contents of storage system objects, although addressed by Margulies, are not covered here; they are left for the next round of search facility enhancements. Note that the switch bit word is sufficiently powerful to encompass a great number of different objects, and that the suffix technology proposed could be modified into a much more general "name type" technology. In particular, Margulies suggests that an entry point into LISTNAME_sl_ could be designed to provide verification of the contents of a search object. For example, the dictionary commands can only make effective use of non-empty keyed files, a list for searching for a dial_out channel should contain only FNP channel id strings, certain search archives could be required to contain components named, say, "frobus_*.*", etc. However, since the dictionary commands (the premier candidate as caller of this new interface) currently does no internal checking on its own behalf, creation of the interface is not judged to have a high priority. (As evidence of the lack of checking, simply create an empty MSF or SSF named foo.dict, add that path to your dictionary search list, and use "find_dictionary_word barfo". The command will blandly tell you it could not find the word!) page 6 MTB-609 Margulies suggests that the data be kept and enforced by a new per-list support module, LISTNAME_sl_. Although feasible and techni- cally correct, this approach is unattractive because it requires the creation and maintenance of several new system modules. How to Support ad hoc Search Lists? The next question is that of support for path verification for search lists that are not defined in search_list_defaults_. If a developer wants to create a new or replacement search list, s/he need merely replicate the code found in search_list_defaults_.cds for the new list. However, if the new default list is a replacement for a list that is already defined in search_list_defaults_ or is in a directory not in the user's current search rules, the search facility cannot find it! It uses the dynamic linker (without -referencing_dir) to locate the default list ONLY after it fails to find the list in search_list_defaults_. Even then, it will not be found unless the user has taken overt action (like initiating or setting a link in wdir) to make it findable by the linker. This shortcoming can be overcome by providing a new control argument, -default_pathname (-dpn), for (set add)_search_paths. The usage would be: ssp list -dft -dpn path>list.search asp list foo {-modifiers} -dpn path>list.search The presence of this path in the command line would override the prior use of search_list_defaults_. Note that this proposal does not support verification for dynamically created lists, that is, lists that are created with set_search_paths but dont have a list.search anywhere.(1) Implementation Since the details of the search structure must now be known to both search_paths_ and the *_search_paths commands, the declaration of the structure has been taken out of search_paths_.pl1 and made into a new include file. That include file is presented below. Further, the code in <list>_default.cds must be significantly expanded. Excerpts from a modified search_list_defaults_.cds are also shown below. There are no new system modules, no new entrypoints, and no changes to any of the interfaces to search_paths_; the only changes _____________________________________________________________________ (1) Such support could be added with more new control arguments, but the complexity of the implementation doesnt seem justified by the expected utility of the feature. MTB-609 page 7 are to the actions performed by those various entrypoints. Thus, all callers of those search_paths_ entrypoints need no work. Compatibility and Flag Days Although there are no changes in the interfaces involving in searching, there are in those involving management of search lists. The version of the process search segment will change and the default search lists will acquire a brand new version number. The first version change poses no problems since the search facility is already able to cope with a dynamic change in search segment version (there is a flaw in that the process search segment will be recreated with defaults instead of the paths that were in the existing search segment, but the problem is thought to be minor since dynamic version switching of search facility versions is not a usual user action). The second version change has an eventual flag day. To minimize the trauma of that day, code has been added (and clearly delimited for ease of removal) to support both version SRCH0001 and the obsolete un-version and a warning will be issued when an un-version default search list is used (this warning will be deactivated in the EXL installation). Thus, users will get at least one release of noisome warning messages to clean up their default search list act before the obsolete version support is removed. page 8 MTB-609 /* BEGIN search_segment.incl.pl1 */ /* search segment data structures Created 12/31/82 by Ed Wallman with declarations taken from search_paths_.pl1 Modified 12/31/82 by Ed Wallman: Added list control feature - version 3 of search_seg and version SRCH0001 of default_search_list. The list_header must remain in the same generation of storage because users are given pointers to list_header.update_count */ /* format: style2,ind3,ll80,dclind4,idind16,comcol41,linecom */ dcl 1 search_seg based (search_seg_ptr), 2 header, 3 version fixed binary, /* search segment version */ 3 first_list_header_off /* offset of first list's list_header */ offset (search_seg.area), 3 last_list_header_off /* offset of last list's list_header */ offset (search_seg.area), 2 area area (sys_info$max_seg_size - divide (length (unspec (search_seg.header)), 36, 19)) ; dcl 1 list_header based (search_list_header_ptr), 2 link /* offset of next list's list_header */ offset (search_seg.area), 2 back_link /* offset of last list's list_header */ offset (search_seg.area), 2 list_control_off /* offset of list_control */ offset (search_seg.area), 2 list_name_off /* offset of list_name */ offset (search_seg.area), 2 search_list_off /* offset of search_list */ offset (search_seg.area), 2 update_count /* number of search list updates */ fixed binary (71); dcl 1 list_control based (search_list_control_ptr), 2 sws bit (36), /* allowed keyword switches */ 2 suffix_count /* number of required suffixes */ fixed bin, 2 suffixes (list_control_suffix_count refer (list_control.suffix_count)) char (32); dcl 1 list_name based (search_list_name_ptr), 2 name_count fixed binary, /* number of synonyms */ 2 names (list_name_name_count refer (list_name.name_count)) char (32); /* search list names */ dcl 1 search_list based (search_list_ptr), 2 path_count fixed binary, /* number of search paths */ 2 paths (search_list_path_count refer (search_list.path_count)) MTB-609 page 9 , 3 type fixed binary, /* search path type */ /* FS UID, if any */ 3 uid bit (36) aligned, 3 pathname char (168), /* search pathname */ 3 mark bit (1); /* temporary */ dcl 1 default_search_list based (default_search_list_ptr), 2 version char (8), /* structure version id */ 2 control_sws bit (36), /* list control switches */ 2 name_count fixed binary, /* number of synonyms */ 2 path_count fixed binary, /* number of search paths */ 2 suffix_count fixed bin, /* number of suffixes */ 2 names (0 refer (default_search_list.name_count)) char (32), 2 suffixes (0 refer (default_search_list.suffix_count)) char (32), 2 paths (0 refer (default_search_list.path_count)), 3 type fixed binary, /* search path type */ 3 pathname char (168); /* search pathname */ dcl 1 v0_default_search_list /* OLD (V0) DEFAULT SEARCH LIST */ based (default_search_list_ptr), 2 name_count fixed binary, /* number of synonyms */ 2 path_count fixed binary, /* number of search paths */ 2 names (0 refer (v0_default_search_list.name_count)) char (32), 2 paths (0 refer (v0_default_search_list.path_count)), 3 type fixed binary, /* search path type */ 3 pathname char (168); /* search pathname */ dcl search_seg_ptr ptr init (null); /* pointer to the search segment */ dcl search_seg_version_3 fixed binary internal static options (constant) initial (3); dcl search_list_header_ptr /* pointer to a list header */ ptr init (null); dcl search_list_control_ptr /* pointer to a list control structure */ ptr init (null); dcl list_control_suffix_count fixed bin init (0); dcl search_list_name_ptr /* pointer to a list name structure */ ptr init (null); dcl list_name_name_count fixed bin init (0); dcl search_list_ptr /* pointer to a search list */ ptr init (null); dcl search_list_path_count fixed bin init (0); dcl default_search_list_ptr /* pointer to a default search list */ ptr init (null); dcl DEFAULT_SEARCH_LIST_VERSION_1 page 10 MTB-609 char (8) int static options (constant) init ("SRCH0001"); dcl RDIR_SW /* ref dir keyword switch bit */ bit (36) init ("400000000000"b3); dcl WDIR_SW /* working dir keyword switch bit */ bit (36) init ("200000000000"b3); dcl PDIR_SW /* process dir keyword switch bit */ bit (36) init ("100000000000"b3); dcl HDIR_SW /* home dir keyword switch bit */ bit (36) init ("040000000000"b3); dcl DIR_PATH_SW /* directory pathname switch bit */ bit (36) init ("020000000000"b3); dcl SEG_PATH_SW /* segment pathname switch bit */ bit (36) init ("010000000000"b3); dcl MSF_PATH_SW /* MSF pathname switch bit */ bit (36) init ("004000000000"b3); dcl KEYS_OK /* keywords allowed */ bit (36) init ("740000000000"b3); dcl KEYS_DIRS_OK /* keywords and directories allowed */ bit (36) init ("760000000000"b3); /* END search_segment.incl.pl1 */ Modified 01/01/83 by Ed Wallman: Added info for list control feature. */ search_list_defaults_: /* exerpts ONLY */ procedure; /* automatic */ dcl 1 lists aligned, 2 comp bit (0) aligned, 2 compose, 3 version char (8), 3 control_sws bit (36), 3 name_count fixed binary, 3 path_count fixed binary, 3 suffix_count fixed bin, 3 names (2) char (32), 3 suffixes (1) char (32), 3 paths (4) like search_path, 2 dcl bit (0) aligned, 2 declare, 3 version char (8), 3 control_sws bit (36), 3 name_count fixed binary, 3 path_count fixed binary, 3 suffix_count fixed bin, 3 names (2) char (32), 3 suffixes (1) char (32), MTB-609 page 11 3 paths (1) like search_path, 2 dict bit (0) aligned, 2 dictionary, 3 version char (8), 3 control_sws bit (36), 3 name_count fixed binary, 3 path_count fixed binary, 3 suffix_count fixed bin, 3 names (2) char (32), 3 suffixes (1) char (32), 3 paths (1) like search_path, ... /* program */ lists.compose.version = DEFAULT_SEARCH_LIST_VERSION_1; lists.compose.control_sws = KEYS_DIRS_OK; lists.compose.name_count = hbound (lists.compose.names, 1); lists.compose.path_count = hbound (lists.compose.paths, 1); lists.compose.suffix_count = hbound (lists.compose.suffixes, 1); lists.compose.suffixes (*) = ""; lists.compose.names (1) = "compose"; lists.compose.names (2) = "comp"; lists.compose.paths (1).type = WORKING_DIR; lists.compose.paths (1).pathname = "-working_dir"; lists.compose.paths (2).type = UNEXPANDED_PATH; lists.compose.paths (2).pathname = ">udd>[user project]>compose_macros"; lists.compose.paths (3).type = REFERENCING_DIR; lists.compose.paths (3).pathname = "-referencing_dir"; lists.compose.paths (4).type = ABSOLUTE_PATH; lists.compose.paths (4).pathname = ">unb"; lists.declare.version = DEFAULT_SEARCH_LIST_VERSION_1; lists.declare.control_sws = SEG_PATH_SW; lists.declare.name_count = hbound (lists.declare.names, 1); lists.declare.path_count = hbound (lists.declare.paths, 1); lists.declare.suffix_count = hbound (lists.declare.suffixes, 1); lists.declare.suffixes (1) = ".dcl"; lists.declare.names (1) = "declare"; lists.declare.names (2) = "dcl"; lists.declare.paths (1).type = ABSOLUTE_PATH; lists.declare.paths (1).pathname = ">sss>pl1.dcl"; lists.dictionary.version = DEFAULT_SEARCH_LIST_VERSION_1; lists.dictionary.control_sws = SEG_PATH_SW; lists.dictionary.name_count = hbound (lists.dictionary.names, 1); lists.dictionary.path_count = hbound (lists.dictionary.paths, 1); lists.dictionary.suffix_count = hbound (lists.dictionary.suffixes, 1); lists.dictionary.suffixes (1) = ".dict"; lists.dictionary.names (1) = "dictionary"; lists.dictionary.names (2) = "dict"; lists.dictionary.paths (1).type = ABSOLUTE_PATH; lists.dictionary.paths (1).pathname = ">unb>standard.dict"; page 12 MTB-609 ... CONCLUSION Finally, the concepts discussed have been implemented and are ready for exposure in EXL when it is deemed advisable to do so.