25 Mar 2024

Internals of expandfile

This note describes how expandfile is implemented and how its parts can be used.

Application

The expandfile application is written in the Perl language. It is about 75 lines of Perl code. (Perl is provided with Mac and Unix systems and is available for Windows.)

expandfile uses a Perl module expandfile.pm that does most of the work. It is about 1500 lines. expandfile.pm exports three functions:

expandfile.pm uses a Perl module, readbindsql.pm, to process loops over MySQL queries. It uses CPAN modules DBI and DBD::mysql.
expandfile.pm uses a Perl module, readbindxml.pm, to process loops over XML files. It uses CPAN module XML::LibXML.
expandfile.pm also uses CPAN modules LWP::Simple and Term::AnsiColor. These modules are usually installed with Perl.

expandfile is open source, MIT license, available on GitHub.

Using expandfile for File Transformation

I primarily use the expandfile application to generate static HTML pages from HTMX templates, as described in Multics Website Features. At times I also use the expandfile application to expand other kinds of templates, for example to convert data to new formats, or to automate conversion and report generation by using the *shell builtin to run command lines.

Using &expandstring() for Other HTML Production

I have written a few CGI programs, executed by a web server, that invoke expandfile::&expandstring() to generate dynamic HTML pages or data conversions, for example, mail sending and signup forms.

I have also written Perl programs that invoke expandfile::&expandstring() to generate static picture gallery HTML pages.

How expandfile Works

Processing

Known Bugs

Because I wrote the program, it works fine for me. There are a few areas where doing something not allowed by expandfile produces a poor or unhelpful error message, but I avoid these mistakes. One error I noticed is a very unhelpful message if you have an unbalanced quote in a COMMENT. (This happens because the quote processing happens before builtin functions (such as comment) are executed.) I make this mistake very rarely, and I don't see an easy fix without a lot of work, and changing the language.

Testing Expandfile

Test Suite

I built a set of smoke tests that I use after changes to expandfile.pm: about 70% of the language features are exercised. The tests are driven by a shell script that runs the tests and compares actual output to expected output. There is also the beginning of a test for the macros in htmxlib.htmi, which tries many of the possible cases for image-generation macros. Much more work would be required to turn this set of tests into a thorough test library. These tests are in a directory testexpandfile.
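The compare-actual-to-expected approach described above can be sketched in a few lines. The real driver is a shell script; this Python version is only an illustration of the idea, and the names `run_case`, `run_suite`, and `demo_cases` are hypothetical. The demo cases run the Python interpreter rather than expandfile, so the sketch does not depend on an installation.

```python
import subprocess
import sys

def run_case(command, expected):
    """Run one command and report whether its stdout matches the expected text."""
    result = subprocess.run(command, capture_output=True, text=True)
    actual = result.stdout
    return actual == expected, actual

def run_suite(cases):
    """cases is a list of (name, command, expected_stdout); returns the failure count."""
    failures = 0
    for name, command, expected in cases:
        ok, actual = run_case(command, expected)
        if ok:
            print(f"{name}: OK")
        else:
            failures += 1
            print(f"{name}: FAILED expected {expected!r} got {actual!r}")
    return failures

# Stand-in cases (assumptions for illustration, not real expandfile tests).
demo_cases = [
    ("echo", [sys.executable, "-c", "print('hello')"], "hello\n"),
    ("sum", [sys.executable, "-c", "print(2 + 3)"], "5\n"),
]
demo_failures = run_suite(demo_cases)
```

Printing both the expected and the actual text on a failure matches the way the tests below display what a builtin returned and what it was expected to return.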

Test Suite Installation and Configuration

Install expandfile and its prerequisites as described in the installation instructions. Install the test suite. Then type the terminal command

  cd bin/testexpandfile; sh setup-config.sh

to set up bin/testexpandfile/config.htmi. If the file is already there, setup-config.sh just shows its date modified. If $HOME/.my.cnf is available, setup-config.sh will use MySQL configuration values from it.

Running the tests

In the testexpandfile directory, type

  sh test.sh

to run 49 tests of builtin functions, including *csvloop, *sqlloop, *xmlloop, *ssvloop, and *dirloop. This is not comprehensive; when I find time I can add more test cases. The tests display what the builtin returned, and what it was expected to return.

To test a few external functions called by *shell, type

  sh extfunctest.sh

To test some error messages, type

  sh testerr.sh

Type

  expandfile macrotest.htmt

to test macros in htmxlib.htmi, such as *callv,getimgdiv,.... Some of these macros encapsulate the handling of High DPI images. These tests say !OK! if the macro returns what was expected.

Weaknesses of the Current Perl Implementation

Future Improvements

There are a number of possible improvements to expandfile that I am considering. To make these changes I would have to rewrite the application code that uses the features.

Implementation Language

Efficiency has not been a crucial issue with expandfile for my uses. If I were dealing with templates with many thousands of lines, or frequent expansion of online templates, it might be important. Perl is an interpreted language. Using *shell launches 2 processes for every invocation, so it is even slower. It would be possible to rewrite expandfile in some other language than Perl: a non-interpreted language would save the parsing and interpretation of every Perl statement, and perhaps support some optimizations. The advantage of Perl for me is that I find it easy to make incremental improvements to the program's function and to debug problems.

On the other hand, Perl has many internal optimizations, tested by time. Regexp caching and execution, for example.

Currently, installing expandfile requires command line skills on Mac and Linux. You need to install the right versions of Perl, various libraries and helper programs, MySQL, and CPAN modules, all in the right order. This process takes hours. Some people have decided not to use expandfile because the install process is so complicated, so I am looking for ways to simplify the process.

Language Features Required

To my way of thinking, this rules out C. I would have to basically reinvent Perl. We have Perl already.

Python

In 2016, I tried porting expandfile to Python 2.7, as an experiment; the code had more lines and execution was a little slower. Much of the size increase came from the need to check if a Python variable was defined; in Perl, an undefined value in an associative array is equal to an empty string, and expandfile code assumes this. Converting to Python did find a few latent bugs. I did not try to port XML or MySQL features. I later rewrote expandfile to use internal getter and setter functions.
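The undefined-value difference described above can be illustrated with a small Python sketch. This is not code from the 2016 port (it is written for Python 3), and the names `get_var`, `set_var`, and `symtab` are hypothetical stand-ins for getter and setter functions over a symbol table.

```python
# Hypothetical sketch: a symbol table in which an unset variable reads as "".
# The names get_var, set_var, and symtab are assumptions for illustration.

def get_var(symtab, name):
    """Return the value bound to name, or "" if it was never set."""
    return symtab.get(name, "")

def set_var(symtab, name, value):
    """Bind name to value, storing None as an empty string."""
    symtab[name] = "" if value is None else str(value)

symtab = {}
set_var(symtab, "title", "Internals")

# Direct indexing raises KeyError for a key that was never set ...
try:
    symtab["author"]
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

# ... while the getter keeps the "undefined means empty" rule in one place.
heading = get_var(symtab, "title") + get_var(symtab, "author")
```

Routing every access through one getter means the defined-or-not check is written once instead of at each use, which is where much of the extra code in a direct translation would come from.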

Python has some of the same issues as Perl, including managing language level, OS integration, deployment on user machines, and library integration. I would have to find equivalents for all the functions I now get from CPAN. In the early 00s, Macs came with both Perl and Python, but Apple is removing this support.

In a superficial examination of the Python MySQL connector interface, I did not see the feature of returning column names including table name for a JOIN. This would be a functional regression from the Perl version. I'd have to rewrite some of my code that depends on this feature.

I have not considered Python 3. It is up to 3.12 already. Expandfile has changed a lot since 2016, and porting to Python 3 would be a whole new project.

Go

In Oct 2021, I began porting expandfile to the Go language, version 1.7.2, on a Mac. As of Jan 2022, it works, except for the XML library. All tests complete without crashing: most tests pass, except for minor cosmetic problems. The Go implementation is slower than the Perl version (looking into this). My observations are in a separate page.

Ruby

In Jul 2022, I tried porting expandfile to the Ruby language, version 3.1.2, on a Mac. Regular expressions in Ruby are slightly different from those in Perl: it is not clear whether existing expandfile programs that depend on regular expressions can be compatible.

All tests complete without crashing as of 27 Aug 2022. Basic expansion and builtin tests pass. CSV parsing works. SQL access works, but the Ruby "mysql2" module does not provide table names in reflection data, so table names are not used in variable bindings of query results. A few tests of Multics expansions are failing. Some *callv macros return an extraneous NL character at the end of their result. XML access is not finished yet. I am looking at using rexml and its XPATH support. My tests compile and run but get wrong answers.


Rust

Next I may try porting expandfile to the Rust language. I hope the result will be faster than Go and Perl. I have started preliminary explorations.

JavaScript

How about re-implementing expandfile in JavaScript? It looks possible but I have not started on it.

Perl 6 (Raku)

Perl 6 is still in its early development: it was first available in 2017. I considered re-implementing expandfile in Perl 6, tried a few experiments, gave up, at least for now.

Costs: learn a new language, major recoding of everything. Because the language is still developing, it might change under me: Perl did sometimes. Harder to involve other programmers (has not been an issue yet).

Seemed like the best way to get Perl6 on a Mac is to visit https://rakudo.org/downloads/star and download rakudo-star-whatever-macos-x86_64.dmg. I did.

Early experiments: I made Perl v6 versions of expandfile and expandfile.pm and hammered on them till they compiled in Raku. (I commented out parts that depended on CPAN, such as *sqlloop.) Documentation and compiler diagnostic messages after minor errors were often very hard to understand. (For example, unbalanced constructs produce a cryptic error message complaining about the last line of the program. I have not seen such unhelpful messages since using AED-0 in 1965.) I gave up.

LISP

Just a thought. Bernie went through similar implementation choice considerations when designing Emacs in 1978, and chose LISP. It's fast, widely available, and has a compiler with integration between interpretation and compilation. It can call externally compiled routines in other languages. Lotta different kinds of LISP: clisp, clojure, gcl, ecl, scheme.. which would I use? LISP is available for Mac, Windows, Linux, but not standard on any.

C++

How bout rewriting the Perl in C++ using std::string and <cctype> and boost_regex? Would have to learn a lot of fine points: ownership, views, const. My experience with C++ at Taligent was that it's a powerful and efficient language with many pitfalls. Library support for SQL and XML would probably require me to learn a lot about Makefile generation and packaging.

Other

Need a language that is freely available and easy to install.

Other Improvement Ideas

  
  # the filenames in expandfile are relative to the wdir, not to the calling file path.
  # maybe this should be changed. certainly it should be documented.
  #
  # 01/11/05 6 AM ideas
  # 2) really I want %[*if,(predicate),cmd]%  and access to a full language.. LISP
  # 3) notion of arrays -- SSVs are a weak substitute
  # 4) variables as push down lists or stacks
  #
  # there are limits to sizes. document.
  #
  # here is another idea.. %[*pipe,&outputvar,inputvar,command]%
  # .. say I build up a variable that has a lot of mysql input, and feed it to mysql shell command.
  # .. doing this in Perl with both input and output is tricky
  # .. right now I can do this by generating the var, writing into a temp file, and running the cmd .. litters wdir, has restart and nesting issues
  # .. Perl6 Proc.async could do this
  #
  # my address book reader does complicated parsing and could be a third loop if I can 
  # .. figure out a callback scheme.  That is really the wanted breakthrough.  I guess I could
  # .. do something like a Perl "exec" %[*exec,perl]% -- is this too powerful??
  #
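
The *pipe idea in the notes above (feed the contents of one variable to a command's standard input and capture its output in another variable, with no temp file) can be sketched as follows. This is only an illustration in Python, not a proposal for how expandfile would implement it. The function name `pipe_through` is hypothetical, and the command used is a stand-in filter rather than the mysql shell command.

```python
import subprocess
import sys

def pipe_through(command, input_text):
    """Send input_text to the command's stdin and return its stdout as a string."""
    result = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a nonzero exit status
    )
    return result.stdout

# Stand-in for a real command: a tiny filter that upper-cases its stdin.
upcase = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]
output_var = pipe_through(upcase, "select 1;\n")
```

Because the input is passed in memory, nothing is written into the working directory, which avoids the littering and restart issues of the temp-file workaround.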