This note describes the history of expandfile, a simple Unix command line program for expanding templates.
History
I wrote expandfile in 2002 to replace a collection of ad hoc Perl programs that I had been using to maintain the multicians.org website since 1995. Since 2002, I've added features to expandfile occasionally.
The idea of a simple macro expander dates back to Christopher Strachey's GPM macro expander,
which I had experimented with in the mid-1960s on CTSS at MIT Project MAC.
Later, I used several languages, such as ALM, PL/I, and C, whose preprocessors extended the base language
with features such as including other input files.
Because multicians.org had mirror sites where I could not modify the web server configuration or execute code on the server, I could not dynamically generate web pages. Furthermore, at the time, people implementing web sites by generating pages dynamically from databases were encountering web server performance and security problems. I chose to create static HTML pages and have the web server serve them without run-time calculation. (I had read a little about PHP, but at that time it was considered primarily a browser enhancement for FORM processing.)
My ISP's web servers supported a "server side include" feature, which let users insert the contents of auxiliary files when serving a web page. When I tried to separate my pages into content and boilerplate, I found that I wanted to have the boilerplate be slightly different for each page, so I sought a method where
- Each "object" HTML web page was created from a "source" file
- Source files were stored in the file system on my computer
- Object files were stored in the file system on server computers and were HTML ready to serve
- Object files were also stored in the file system on my computer and copied to the server computers when changed
- If the same fact was presented in many object files, it could come from a single source file
RUNOFF is an archetypal text-transformation system, and I had used it as well in CTSS days.
RUNOFF's input is either text to be copied to the output,
or commands that change the state of execution and affect how later processing works.
Later implementations of RUNOFF-like languages in Multics and elsewhere included the ideas of macro execution and multi-pass processing.
I decided to write my own source text expansion tool that did not parse the underlying input language, similar to GPM: it would just transform text strings into other text strings, with a minimal way of defining macros. This made the program more general and freed it from dependency on the syntax of the underlying language; I didn't have to write or maintain an HTML parser, and changes in the HTML spec would rarely require the tool to change. As in GPM and RUNOFF, I could set and evaluate string variables, and expand macros that accepted string variables as arguments.
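The idea of transforming strings without parsing the surrounding language can be illustrated with a minimal sketch. This is not expandfile's implementation (which is in Perl); it is a hypothetical Python illustration that assumes variable references are written with the %[ ... ]% delimiters and that unknown names expand to the empty string.

```python
import re

# Hypothetical sketch: replace %[name]% with the value of variable "name".
# The input is treated as plain text; the HTML around it is never parsed.
_VAR = re.compile(r"%\[(\w+)\]%")

def expand(text, variables):
    """Return text with each %[name]% replaced by variables[name].

    Unknown names expand to the empty string (an assumption of this sketch).
    """
    return _VAR.sub(lambda m: variables.get(m.group(1), ""), text)

# The expander neither knows nor cares that the template is HTML.
page = expand("<title>%[title]%</title>", {"title": "Multics History"})
print(page)
```

Because the expander sees only strings, the same function would work unchanged on XML, shell scripts, or any other text format.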
I looked at the Unix m4 tool, written by friends of mine from Multics days. It wasn't available for the computer I had then, and Perl was working OK, so I stayed with Perl.
I studied the errors I made often when maintaining multicians.org, and tried to build tools that would prevent them. Having each fact in one place was good only if I made sure to regenerate all object pages that used that fact: this led me to use the make program. Another common error was forgetting to update server files when I modified a file on my computer: this suggested the use of rsync.
Using Expandfile to Generate Web Sites
I first used expandfile to translate "HTML with extensions" input (which I called HTMX) into HTML, mostly to include common boilerplate, such as page banners and CSS layout instructions used on all my web pages. I added features to allow variables in the page header and footer data, like "title" and "date updated." Adding builtin functions that could transform variables' values came next, then the ability to capture the output of external shell commands, and then integration with SQL. I used these steps to simplify my work flow for maintaining websites I created, and to eliminate special-purpose Perl programs in favor of logic in HTMX web page templates.
The big advance for me was introducing the *block builtin, and the pattern of writing HTMX files that
- Set parameter variables, define a body block that includes variable expansions, and possibly define other blocks.
- At the end, *include a wrapper file that outputs headers, *expands the body block, and then outputs footers. (The header and footer can expand the parameter variables to set page titles and so on.)
This pattern separates site boilerplate from page content, provides an independent source file for each HTML page, and makes it easy to regenerate a single page.
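The shape of this pattern can be sketched in a few lines. The sketch below is a hypothetical Python analogue, not HTMX: it uses Python's own {name} format fields in place of expandfile's syntax, and the function name, wrapper text, and parameter values are invented for illustration.

```python
def render_page(params, body_template, wrapper):
    """Expand the body block with the page's parameters, then wrap it."""
    body = body_template.format(**params)
    # The wrapper sees the same parameters plus the expanded body.
    return wrapper.format(body=body, **params)

# Shared boilerplate, written once for the whole site (hypothetical).
WRAPPER = (
    "<html><head><title>{title}</title></head>\n"
    "<body>\n{body}\n<p>Updated {updated}</p>\n</body></html>"
)

# One page: parameter variables plus a body block.
params = {"title": "Glossary", "updated": "2021-03-01"}
body = "<h1>{title}</h1>\n<p>Terms used on this site.</p>"

print(render_page(params, body, WRAPPER))
```

Changing WRAPPER changes every page on the site, while each page's source holds only its own parameters and body.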
Using expandfile was valuable when I made global changes to every page on a site to change each page's appearance, to conform to changing HTML specifications, or to use new browser features.
Connecting expandfile to my local MySQL database and supporting *sqlloop was the next big step. This provided consistent formatting for lists of people, publications, glossary entries, and website page indexes, and defined a lightweight way for any HTMX file to refer to data from these lists. These changes reduced the chance that an editing mistake would screw up a whole page.
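The general idea of a SQL-driven loop, expanding one row template per result row with column names bound as variables, might look like the following sketch. It is an assumption-laden illustration, not *sqlloop itself: it substitutes Python's built-in sqlite3 for MySQL, assumes %[column]% placeholders, and uses a made-up table and made-up rows.

```python
import sqlite3

def sql_loop(conn, query, row_template):
    """Expand row_template once per result row, binding column names to values."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    out = []
    for row in cur:
        text = row_template
        for col, value in zip(columns, row):
            text = text.replace(f"%[{col}]%", str(value))
        out.append(text)
    return "".join(out)

# Made-up table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, site TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Alice", "MIT"), ("Bob", "BTL")])

print(sql_loop(conn, "SELECT name, site FROM people ORDER BY name",
               "<li>%[name]% (%[site]%)</li>\n"))
```

Every list entry comes from the same row template, so the formatting stays consistent no matter how many rows the table holds.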
The third big step I took was using traditional Unix tools to automate site building and publishing. Using make (created for Unix by Multician Stu Feldman) to invoke expandfile only when an HTMX file was newer than its corresponding HTML file meant that I could make a one-line change to a file and then just type make install to recompile the minimum number of files and automatically rsync them to the deployment site.
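The rule that make applies here, rebuilding a target only when it is missing or older than its source, can be sketched as follows. This is a simplified Python illustration with hypothetical function names; it assumes each .htmx source maps to a .html file of the same base name, and it ignores dependencies on included files, which a real Makefile would also have to declare.

```python
import os

def needs_rebuild(source, target):
    """True if target is missing or older than source (make's basic rule)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)

def stale_pages(sources):
    """Pair each .htmx source with its .html target; return pairs to rebuild."""
    stale = []
    for src in sources:
        target = os.path.splitext(src)[0] + ".html"
        if needs_rebuild(src, target):
            stale.append((src, target))
    return stale
```

Only the pairs returned by stale_pages would need to be re-expanded and copied to the server.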
Other Applications of Expandfile
As expandfile developed, I found other uses for it, including reformatting database files and preparing input for other programs: procmail input, RSS feed declarations in XML, shell scripts, input to the dot graph compiler, and XML sitemap files for the Google crawler.
For some applications, I expand a template which generates HTMX files which are in turn expanded; this lets me do "two pass" expansions so I can add up counters and then display them above the detailed information.
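A minimal sketch of the two-pass idea: the first pass emits an intermediate template whose summary line is still a placeholder, accumulating counters as it goes, and the second pass expands that intermediate text once the counters are known. The function names and the %[count]% placeholder are assumptions made for this Python illustration, not expandfile features.

```python
def first_pass(names):
    """Generate an intermediate template: a placeholder line, then details."""
    lines = ["<p>%[count]% entries</p>"]       # value not known yet
    lines += [f"<li>{n}</li>" for n in names]   # details, counted as we go
    return "\n".join(lines), {"count": len(names)}

def second_pass(intermediate, counters):
    """Expand the placeholders left by the first pass."""
    text = intermediate
    for key, value in counters.items():
        text = text.replace(f"%[{key}]%", str(value))
    return text

intermediate, counters = first_pass(["ALM", "PL/I", "C"])
print(second_pass(intermediate, counters))
```

The total appears above the detail lines even though it is only known after all of them have been generated.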
In 2004, I wrote a web statistics application that uses expandfile to format complex daily reports of web server usage data loaded into SQL.
For a document formatting application, I extracted data from data files in a proprietary format, translated it to SQL, loaded it into a local MySQL database, used expandfile's *sqlloop builtin to generate HTML, rendered the HTML in a browser, printed the browser output to PostScript, and used page imposition tools to generate a booklet.
I have also built template files that use the *shell builtin to fire off curl commands that fetch XML data from Web APIs, and then parse the result with *xmlloop to generate HTML reports.
Language Issues
I originally wrote expandfile in Perl 5, and used it on Unix, macOS and Windows, through years of evolution of my program, the Perl language, and the features provided on different platforms. I showed expandfile to friends, but they were put off by the difficulty of installing and configuring the Perl implementation:
- Installing the Perl libraries needed to use expandfile on a fresh machine may take hours of downloading, installing, and configuring.
- Different revision levels are provided on different platforms: e.g. macOS no longer provides Perl, so users have to install it.
- Dependencies on CPAN library modules require extra installation steps. For example, expandfile uses Perl module DBD::mysql, which must be available even if a particular HTMX program does not use SQL. Similarly, expandfile requires XML::LibXML even if you don't use *xmlloop.
- Installing CPAN modules sometimes requires that other binary libraries be installed on the computer and configured first. For example, installing DBD::mysql fails unless MySQL is installed and configured.
Later Improvements
Early versions of expandfile had some features that I later decided were mistakes. Fortunately, nobody but me was using the program, and I knew where all the HTMX source files were. So I backed up the program and sources, created and tested a new program version, modified every source file that had to change, recompiled everything, compared old output to new, and accounted for changes before switching over to the new version.
Some of the changes were bug fixes or new features, such as *xmlloop and *format. A few changes were made when Perl syntax changed and the program had to be updated.
I made the following changes in early 2021:
- Eliminated little-used syntax that gave special meaning to two characters: instead, implemented the functions as macros for the few places that need them.
- Eliminated unnecessary control argument -config; configuration files are just expanded for side effect before other input files.
- Renamed implementation variables to prefix their names with _xf_ to avoid collisions with user variable names.
- Added configuration variable _xf_expand_multics to enable or disable Multics features; eliminated -mult control argument.
- Allowed multiple args to *shell, *fwrite, *fappend, and *htmlescape -- concatenate them with no separator.
- Added the *bindcsv function to replace a potentially exploitable practice.
- Made error messages more specific and added runtime checks for installation and implementation errors.
- Reimplemented, tested, and documented the configuration and install mechanisms.
- Created a comprehensive test suite.
- Updated documentation and added a Unix man page.
- Placed all source on GitHub under the MIT open source license.
Comparison to Other Approaches
PHP
PHP programs are expanded at runtime into HTML on every view. PHP constructs look like <?php echo '<p>Hello World</p>'; ?>. You can set and refer to variables that have string values. PHP has over 1000 builtin functions, including SQL access. (You can install a caching module in your web server to avoid unnecessary expansions.)
Hugo
I just learned a little about this recently. Hugo is based on Go instead of Perl; it uses Go's conventions about modules and source organization. Similar to expandfile, Hugo input consists of HTML and extension constructs -- Hugo uses {{ ... }} instead of %[ ... ]% . It has variables, function expansion, and function definition. Looks slick. It uses a kind of "markdown" to avoid writing HTML. It supports multi-language translation, themes, and static websites.