This note describes the history of expandfile, a simple Unix command line program for expanding templates.
I wrote expandfile in 2002 to replace a collection of ad hoc Perl programs that I had been using to maintain the multicians.org website since 1995. Since 2002, I've added features to expandfile occasionally.
The idea of a simple macro expander dates back to Christopher Strachey's GPM macro expander, which I had experimented with in the mid-1960s on CTSS. Later, I used several languages, such as ALM, PL/I, and C, whose preprocessors added features such as the inclusion of other input files.
Because multicians.org had mirror sites where I could not modify the web server configuration or execute code on the server, I could not generate web pages dynamically. Furthermore, at the time, people who generated pages dynamically from databases were complaining about web server performance and security problems. I chose to create static HTML pages and have the web server serve them without run-time computation. (I had read a little about PHP, but at that time it was regarded as little more than an add-on for processing HTML FORM input.)
Some web servers supported a "server side include" feature, which let users insert the contents of auxiliary files when serving a web page. When I tried to separate my pages into content and boilerplate, I found that I wanted the boilerplate to differ slightly from page to page, so I sought a method in which:
- Each "object" web page was created from a "source" file
- Source files were stored in the file system on my computer
- Object files were stored in the file system on server computers and were HTML ready to serve
- If the same fact was presented in many object files, it would come from a single source file
RUNOFF is an archetypal text-transformation system, and I had also used it in CTSS days. RUNOFF's input consists of either text to be copied to the output or commands that change the state of execution and affect how later text is processed. Later implementations of RUNOFF-like languages on Multics and elsewhere incorporated macro execution and multi-pass processing.
I decided to write a source text expansion program that did not parse the underlying input language, similar to GPM: it just transformed text strings into other text strings, with a minimal way of defining macros. This made the program more general and freed it from dependency on the syntax of the underlying language; I didn't have to write or maintain an HTML parser. As in GPM and RUNOFF, I could set and evaluate string variables, and employ macros that accepted string variables as arguments.
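To make the idea concrete, here is a minimal Python sketch of a text expander that treats its input as opaque strings. The `{{name}}` reference syntax, the `expand` function, and the depth limit are assumptions for illustration only, not expandfile's actual syntax or implementation. The point is that the HTML around a reference is copied through without being parsed, and that a variable's value may itself contain references, which gives a minimal form of macro.

```python
import re

# Hypothetical reference syntax {{name}}; expandfile's real syntax differs.
REF = re.compile(r"\{\{(\w+)\}\}")
MAX_DEPTH = 50  # guard against self-referential definitions


def expand(text: str, env: dict[str, str], depth: int = 0) -> str:
    """Replace each {{name}} in text with env[name], expanding the value too.

    The text around references is copied through unchanged, so the
    expander never needs to parse HTML or any other host language.
    """
    if depth > MAX_DEPTH:
        raise RecursionError("expansion nested too deeply")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined variable: {name}")
        # A value may contain further references (a minimal macro).
        return expand(env[name], env, depth + 1)

    return REF.sub(substitute, text)


if __name__ == "__main__":
    env = {
        "title": "Example Page",
        "heading": "<h1>{{title}}</h1>",
    }
    # prints <body><h1>Example Page</h1><p>Example Page</p></body>
    print(expand("<body>{{heading}}<p>{{title}}</p></body>", env))
```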
I studied the errors I made often when maintaining multicians.org, and tried to build tools that would prevent them. Having each fact in one place was good only if I made sure to regenerate all object pages that used that fact: this led me to use the make program. Another common error was forgetting to update server files when I modified a file on my computer: this suggested the use of rsync.
Using Expandfile to Generate Web Sites
I first used expandfile to translate "HTML with extensions" input (which I called HTMX) into HTML, mostly to include common boilerplate used on all my web pages. I added features to allow variables in the header and footer data, like "title" and "date updated." Adding builtin functions that could transform variables' values came next, then the ability to capture the output of external shell commands, and then integration with SQL. I used these steps to simplify my work flow for maintaining websites I created, and to eliminate special-purpose Perl programs in favor of logic in web page templates.
The big advance for me was introducing the *block builtin, and the pattern of writing HTMX files that
- Set parameter variables, define a body block that includes variable expansions, and possibly define other blocks.
- At the end, *include a wrapper file that outputs headers, *expands the body and then outputs footers. (The header and footer can expand the parameter variables to set page titles and so on.)
This pattern separates site boilerplate from page content, provides an independent source file for each HTML page, and makes it easy to regenerate a single page.
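The wrapper pattern can be sketched in Python as follows. This is a hypothetical stand-in: Python's `string.Template` `$name` syntax replaces expandfile's own, a function plays the role of the included wrapper file, and the page values are made up. Each page source supplies only its parameters and its body block; the shared wrapper emits the header, expands the body, and emits the footer.

```python
from string import Template

# Shared wrapper boilerplate (a stand-in for an included wrapper file).
HEADER = Template("<html><head><title>$title</title></head><body>\n")
FOOTER = Template("<hr><p>Updated $date_updated</p></body></html>\n")


def render_page(params: dict[str, str], body_block: str) -> str:
    """Emit the header, expand the page's body block, then emit the footer."""
    header = HEADER.substitute(params)
    body = Template(body_block).substitute(params)
    footer = FOOTER.substitute(params)
    return header + body + footer


# One page source: set parameter variables, then define the body block.
page_params = {"title": "Glossary", "date_updated": "2021-03-01"}
page_body = "<h1>$title</h1>\n<p>Terms used on this site.</p>\n"

if __name__ == "__main__":
    print(render_page(page_params, page_body), end="")
```

Because every page passes through the same wrapper, a site-wide change to the header or footer is one edit followed by regenerating all pages.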
Using expandfile was valuable whenever I had to change every page on a site at once: to alter the site's appearance, to conform to changing HTML specifications, or to use new browser features.
Connecting expandfile to a local MySQL database and supporting *sqlloop was the next big step. This provided consistent formatting for lists of people, publications, glossary entries, and website pages, and defined a lightweight way for any HTMX page to refer to data from these lists. This made it easy for any page on a site to refer to other pages in a consistent fashion, and reduced the chance that an editing mistake would screw up a whole page.
The third big step I took was using traditional Unix tools to automate site building and update. Using make (created for Unix by Multician Stu Feldman) to invoke expandfile only when an HTMX file was newer than its corresponding HTML file meant that I could make a one-line change to a file and then just type make install to recompile the minimum number of files and automatically rsync them to the deployment site.
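The test that make applies here is simple: rebuild an object file if it is missing or older than its source. A minimal Python sketch of that check follows; the directory layout (`.htmx` sources alongside `.html` objects) and the function name `stale_targets` are assumptions, and in practice a Makefile rule does this job. The sketch only reports which files need regenerating; it does not invoke expandfile.

```python
import os
import tempfile
from pathlib import Path


def stale_targets(src_dir: Path, out_dir: Path) -> list[tuple[Path, Path]]:
    """Return (source, object) pairs whose .html is missing or older than its .htmx."""
    pairs = []
    for src in sorted(src_dir.glob("*.htmx")):
        obj = out_dir / (src.stem + ".html")
        # Like make: a target is out of date if absent or older than its source.
        if not obj.exists() or src.stat().st_mtime > obj.stat().st_mtime:
            pairs.append((src, obj))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "index.htmx").write_text("source")
        (root / "about.htmx").write_text("source")
        (root / "about.html").write_text("object")
        # about.html is newer than its source; index.html does not exist yet.
        os.utime(root / "about.htmx", (1_000_000, 1_000_000))
        os.utime(root / "about.html", (2_000_000, 2_000_000))
        for src, obj in stale_targets(root, root):
            print(f"rebuild {obj.name} from {src.name}")  # only index.html
```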
Other Applications of Expandfile
As expandfile developed, I found other uses for it, including reformatting database files and preparing input for other programs: input to procmail, RSS feed declarations in XML, shell scripts, input to the dot graph layout program, and XML sitemap files for the Google crawler.
For some applications, I expand a template that generates HTMX files, which are in turn expanded; this lets me do "two-pass" expansions, so I can add up counters and display the totals above the detailed information.
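A two-pass expansion can be sketched in Python as follows, assuming made-up glossary records and `string.Template` in place of expandfile syntax. The first pass walks the records, writes detail lines into an intermediate template, and tallies counters; the summary line at the top refers to those counters by name. The second pass expands the intermediate template once the totals are known.

```python
from string import Template


def first_pass(records: list[tuple[str, str]]) -> tuple[str, dict[str, str]]:
    """Build an intermediate template and the counters its summary needs."""
    lines = ["<p>$total entries in $categories categories</p>", "<ul>"]
    categories: set[str] = set()
    for name, category in records:
        categories.add(category)
        # Escape '$' in data so the second pass copies it literally.
        item = f"{name} ({category})".replace("$", "$$")
        lines.append(f"<li>{item}</li>")
    lines.append("</ul>")
    counters = {"total": str(len(records)), "categories": str(len(categories))}
    return "\n".join(lines), counters


def second_pass(intermediate: str, counters: dict[str, str]) -> str:
    """Expand the intermediate template now that the totals are known."""
    return Template(intermediate).substitute(counters)


if __name__ == "__main__":
    records = [("alpha", "A"), ("beta", "B"), ("gamma", "A")]
    print(second_pass(*first_pass(records)))
```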
I wrote a web statistics application that uses expandfile to format complex daily reports of web server usage data loaded into SQL.
For a document formatting application, I extracted data from files in a proprietary format, translated it to SQL, and loaded it into a local database. I then used expandfile's *sqlloop builtin to generate HTML, used a browser to produce visually formatted output, printed the browser output to PostScript, and used page imposition tools to produce a booklet.
I have also built template files that use the *shell builtin to fire off curl commands that fetch XML data from Web APIs, and then parse the result with *xmlloop to generate HTML reports.
I originally wrote expandfile in Perl 5, and used it on Unix, macOS and Windows, through years of evolution of my program, the Perl language, and the features provided on different platforms. I showed expandfile to friends, but they were put off by the difficulty of installing and configuring the Perl implementation.
- Installing the Perl libraries needed to use expandfile on a fresh machine may take hours of downloading, installing, and configuring.
- Different platforms provide different versions of Perl: for example, macOS shipped a very old version, so one had to install a later version of Perl.
- Dependency on CPAN library modules: for example, expandfile uses Perl module DBD::mysql, which must be available even if a particular HTMX program does not use SQL. I have tried to minimize the number of such modules.
- Installing CPAN modules sometimes requires that other binary libraries be installed on the computer and configured first. For example, installing DBD::mysql fails unless MySQL is installed and configured.
Early versions of expandfile had some features that I later decided were mistakes. Fortunately, nobody but me was using the program, and I knew where all the expandfile source files were. So I backed up the program and sources, created and tested a new program version, modified every source file that had to change, recompiled everything, compared old output to new, and accounted for changes before switching over to the new version.
Some of the changes were bug fixes or new features, such as *xmlloop and *format. A few were required when Perl's syntax changed.
I made the following changes in early 2021:
- Eliminate little-used syntax that gave special meaning to two characters; instead, implement those functions as macros in the few places that need them.
- Eliminate unnecessary control argument -config.
- Rename implementation variables to prefix their names with _xf_ to avoid collisions with user variable names.
- Add configuration variable _xf_expand_multics to enable or disable Multics features; eliminate -mult control argument.
- Allow multiple arguments to *shell, *fwrite, *fappend, and *htmlescape, and concatenate them with no separator.
- Add the *bindcsv function to replace a potentially exploitable practice.
- Make error messages more specific and add runtime checks for installation and implementation errors.
- Reimplement and test the configuration and install mechanisms, which worked poorly.
- Update documentation.
- Place all source on GitHub with MIT Open Source license.