18 Sep 2020

Internals of expandfile

This note describes how expandfile is implemented and how its parts can be used.

Application

The expandfile application is written in the Perl language. It is about 75 lines of Perl code. (Perl is provided with Mac and Unix systems and is available for Windows.)

expandfile uses a Perl module expandfile.pm that does most of the work. It is about 1500 lines. expandfile.pm exports three functions:

expandfile.pm uses a Perl module, readbindsql.pm, to process loops over MySQL queries. readbindsql.pm uses DBI and DBD::mysql.
expandfile.pm uses a Perl module, readbindxml.pm, to process loops over XML files. readbindxml.pm uses XML::LibXML.
expandfile.pm also uses the Perl modules LWP::Simple and Term::ANSIColor. These modules are usually installed with Perl.

expandfile is open source, MIT license, available on GitHub.

Using expandfile for File Transformation

I primarily use the expandfile application to generate static HTML pages from HTMX templates, as described in Multics Website Features. At times I also use it to expand other kinds of templates, for example to convert data to new formats, or to automate conversion and report generation by using the *shell builtin to run command lines.

Using &expandstring() for Other HTML Production

I have written a few CGI programs, executed by a web server, that call &expandfile::expandstring() to generate dynamic HTML pages or data conversions, for example, mail sending and signup forms.

I have also written Perl programs that call &expandfile::expandstring() to generate static picture gallery HTML pages.

How expandfile Works

Processing

Known Bugs

Because I wrote the program, it works fine for me. In a few areas, doing something that expandfile does not allow produces a poor or unhelpful error message, but I avoid these mistakes. One case I noticed is a very unhelpful message when a COMMENT contains an unbalanced quote. (This happens because quote processing runs before builtin functions, such as comment, are executed.) I make this mistake very rarely, and I don't see a fix that avoids a lot of work and a change to the language.

Testing Expandfile

Test Suite

I built a set of smoke tests that I use after changes to expandfile.pm: about 70% of the language features are exercised. The tests are driven by a shell script that runs the tests and compares actual output to expected output. There is also the beginning of a test for the macros in htmxlib.htmi, which tries many of the possible cases for image-generation macros. Much more work would be required to turn this set of tests into a thorough test library. These tests are in a directory bin/testexpandfile.
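The compare-actual-to-expected approach can be sketched in a few lines of shell. This is a minimal illustration, not the actual driver script: the function name run_case, the counters, and the stand-in commands (echo and printf instead of expandfile runs) are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical smoke-test driver: run a command and compare its output
# to an expected string. Not the real test script shipped with expandfile.
pass=0
fail=0

# run_case NAME EXPECTED COMMAND [ARGS...]
run_case() {
    name=$1
    expected=$2
    shift 2
    # Command substitution strips trailing newlines from the output.
    actual=$("$@" 2>&1)
    if [ "$actual" = "$expected" ]; then
        pass=$((pass + 1))
        echo "ok   $name"
    else
        fail=$((fail + 1))
        echo "FAIL $name"
        echo "  expected: $expected"
        echo "  actual:   $actual"
    fi
}

# Stand-in cases. A real suite would run expandfile on a test template here
# and pass the expected expansion as the second argument.
run_case echo-test "hello" echo hello
run_case printf-test "a b" printf '%s %s' a b

echo "$pass passed, $fail failed"
```

Showing both the expected and the actual text on failure, rather than only a pass/fail flag, makes it easier to see what a change to expandfile.pm broke.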

Test Suite Installation and Configuration

Install expandfile and its prerequisites as described in the installation instructions. Install the test suite. Then type the terminal command

  cd bin/testexpandfile; sh setup-config.sh

to set up bin/testexpandfile/config.htmi. If the file is already there, setup-config.sh just shows its date modified. If .my.cnf is available, setup-config.sh will use MySQL configuration values from it.

Running the tests

In the testexpandfile directory, type

  sh test.sh

to run 49 tests of builtin functions, including *csvloop, *sqlloop, *xmlloop, *ssvloop, and *dirloop. This is not comprehensive... when I find time I can add more test cases. The tests display what the builtin returned, and what it was expected to return.

To test a few external functions called by *shell, type

  sh extfunctest.sh

To test some error messages, type

  sh testerr.sh

Type

  expandfile macrotest.htmt

to test macros in htmxlib.htmi, such as *callv,getimgdiv,.... Some of these macros encapsulate the handling of high-dpi images. These tests say !OK! if the macro returns what was expected.

Future Improvements

There are a number of possible improvements to expandfile that I am considering.

Implementation Language

Efficiency has not been a crucial issue with expandfile for my uses. If I were dealing with templates with many thousands of lines, or frequent expansion of online templates, it might be important. Perl is an interpreted language, so it is not fast. Using *shell launches two processes for every invocation, so it is even slower. It would be possible to rewrite expandfile in a language other than Perl: a compiled language would avoid parsing and interpreting every Perl statement at run time, and might support some optimizations. The advantage of Perl for me is that I find it easy to make incremental improvements to the program's function and to debug problems.

Currently, installing expandfile requires command line skills on the Mac and Linux. You need to install the right versions of Perl, various libraries and helper programs, MySQL, and CPAN -- all in the right order. Some people have decided not to use expandfile because the install process is so complicated, so I am looking for ways to simplify the process.

Python

(I ported expandfile to Python 2.7, as an experiment; the code had more lines and execution was a little slower. Much of the size increase came from the need to check if a Python variable was defined; in Perl, an undefined value in an associative array is equal to an empty string, and expandfile code assumes this. Converting to Python did find a few latent bugs.)

Python has some of the same issues as Perl, including managing language level, OS integration, deployment on user machines, and library integration. I would have to find equivalents for the functions I now get from CPAN. In the early 00s, Macs came with both Perl and Python, but Apple is removing this support.

JavaScript

How about re-implementing expandfile in JavaScript?

Perl 6 (Raku)

Perl 6 is still in its early development: its first stable release came at the end of 2015. Re-implementing expandfile in Perl 6 is worth considering. Syntax and semantics are different. Could fix some of the packaging problems by testing the result of use -- I think. Could generalize processing -- I think. Could clean up and make the input language more regular, as mentioned below. (Nobody uses expandfile but me, and I can find and modify every program, if I have to.) Basically I would have to do a lot of experiments. See https://raku.org/downloads.

Costs: learn a new language. Because the language is still developing, it might change under me: Perl did sometimes. Harder to involve other programmers (has not been an issue yet).

Early experiments: I made Perl v6 versions of expandfile and expandfile.pm and hammered on them till they compiled. (I commented out parts that depended on CPAN, such as *sqlloop.) Documentation and compiler diagnostic messages were often hard to understand. (For example, unbalanced constructs produce a cryptic error message complaining about the last line of the program. I have not seen such unhelpful messages since using AED-0 in 1965.)

Seems like the best way to get Perl6 on a Mac is to visit https://rakudo.org/downloads/star and download rakudo-star-whatever-macos-x86_64.dmg.

LISP

Bernie went through similar implementation choice considerations when designing Emacs in 1978, and chose LISP. It's widely available, and has a compiler with integration between interpretation and compilation. It can call externally compiled routines in other languages. Just a thought. Lotta different kinds of LISP: clisp, clojure, gcl, ecl, scheme.. which would I use? LISP is available for Mac, Windows, Linux, but not standard on any.

Other Improvement Ideas

  
  # the filenames in expandfile are relative to the wdir, not to the calling file path.
  # maybe this should be changed. certainly it should be documented.
  #
  # 01/11/05 6 AM ideas
  # 2) really I want %[*if,(predicate),cmd]%  and access to a full language.. LISP
  # 3) notion of arrays
  # 4) variables as push down lists or stacks
  #
  # there are limits to sizes. document.
  #
  # here is another idea.. %[*pipe,&outputvar,inputvar,command]%
  # .. say I build up a variable that has a lot of mysql input, and feed it to mysql shell command.
  # .. doing this in Perl with both input and output is tricky
  # .. right now I can do this by generating the var, writing into a temp file, and running the cmd .. litters wdir, has restart and nesting issues
  # .. Perl 6 Proc::Async could do this
  #
  # my address book reader does complicated parsing and could be a third loop if I can 
  # .. figure out a callback scheme.  That is really the wanted breakthrough.  I guess I could
  # .. do something like a Perl "exec" %[*exec,perl]% -- is this too powerful??
  #
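The *pipe idea in the notes above can be compared with the current temp-file workaround in a short shell sketch. Everything here is illustrative: tr stands in for a real command such as the mysql client, the variable names are made up, and mktemp is assumed to be available (it is on current Mac and Linux systems but is not required by POSIX).

```shell
#!/bin/sh
# Sketch only: 'tr' stands in for a real command such as the mysql client.
inputvar='select 1;
select 2;'

# Current workaround: write the variable to a temp file, run the command
# on the file, then clean up. An interrupted run can leave the file behind.
tmpfile=$(mktemp) || exit 1
printf '%s\n' "$inputvar" > "$tmpfile"
out_tempfile=$(tr 'a-z' 'A-Z' < "$tmpfile")
rm -f "$tmpfile"

# What a *pipe builtin would amount to: feed the variable to the command's
# standard input and capture its standard output, with no file on disk.
out_pipe=$(printf '%s\n' "$inputvar" | tr 'a-z' 'A-Z')

printf '%s\n' "$out_pipe"
```

Both paths produce the same text. The shell handles the two-way plumbing itself, which is the part the notes describe as tricky to do inside Perl.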