12 Apr 2022

expandfile in Go

These are my informal notes as I convert the Perl program expandfile to the Go language.

Go was started at Google in 2007 by Ken Thompson, Robert Griesemer, and Rob Pike. The language was released in 2009. The language was designed for large code projects and concurrent processing. It is easy to learn and use. It has static types and garbage collects memory.

(Oct 2021) I began porting expandfile to Go 1.17.3.

My Project

I began writing expandfile in 2002. See expandfile-history.html.

Expandfile reads text files, expands macros, and writes a new text file. I use it to create HTML pages for web sites, as well as other text transformation tasks.

Perl Version

I wrote expandfile in Perl. (I have used Perl since about 1996.)

The main expandfile application is about 75 lines of Perl. The application uses a library of about 1500 lines of Perl, plus other smaller libraries for SQL and XML processing. expandfile uses about a dozen Perl library modules from CPAN.

In some senses, I wrote expandfile in a style of "PL/I written in Perl." One can write very terse Perl statements that do a lot. I chose to write less dense code that I would be able to understand later. This made it easier to translate to other languages, such as Python.

expandfile is open source and available without fee from GitHub. Documentation for the program is online.

Go version

In Go, the main expandfile application is 160 lines. The application uses a library of about 2523 lines of Go, including SQL processing but not XML. The Go version uses about a dozen Go library modules, including SQL.

I tried to make my Go code clear and modular. Initially I tried to translate one line of Perl to one or more lines of Go. See below for some lessons.

I have not yet put the Go version on GitHub.

Status

(12 Dec 2021) The basic expandfile functions work. testexpandfile runs and most tests pass:

Current Known bugs (15 Jan 2022)

Work to be done (19 Jan 2022)

Performance

(17 Dec 2021) Expanding mx-net.htmx (2126 lines) took 0.889 seconds with Perl and 24.170 seconds with Go, about 27 times slower. This was unacceptable: recompiling the whole Multics site (478 files) would take over 3 hours, instead of about 6 minutes.

I analyzed what the Perl expandfile was doing. Basically it made 14 passes over the file, doing one transform at a time: block binding, variable and builtin expansion, Multics lookups (4 types), Multics formatting (4 types done twice). Each pass replaced the whole copy of the file with a changed one. Perl is very efficient about this; Go is less so. (Perhaps the Go version of the program invokes the garbage collector a lot, or Go's memory management is not as fast as Perl's?) Further investigation with the profiler should help explain this.

I did some experiments. I found that most of the slowdown came from the "Multics" source constructs that replace a string like "edited by {[VanVleck Tom Van Vleck [THVV]]}" with "edited by Tom Van Vleck [THVV]". This construct looks up "VanVleck" in my local MySQL database.

I investigated whether MySQL access was slower in Go than in Perl by instrumenting the lookupSQL function. The 130 lookup calls in mx-net.htmx averaged a millisecond or two: not enough to explain over 20 seconds' delay.
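A minimal sketch of this kind of instrumentation, assuming nothing about the real lookupSQL beyond the fact that it can be called inside a closure. The callStats type and its methods are hypothetical names invented for this illustration.

```go
package main

import "time"

// callStats accumulates how many times a function ran and for how long.
// It is a hypothetical helper, not part of expandfile.
type callStats struct {
	calls int
	total time.Duration
}

// measure runs fn and adds its elapsed wall-clock time to the totals.
func (s *callStats) measure(fn func()) {
	start := time.Now()
	fn()
	s.total += time.Since(start)
	s.calls++
}

// average returns the mean duration per call, or 0 if nothing has run.
func (s *callStats) average() time.Duration {
	if s.calls == 0 {
		return 0
	}
	return s.total / time.Duration(s.calls)
}
```

Wrapping each database lookup in stats.measure(func() { ... }) and printing stats.calls and stats.average() at exit is enough to show whether the lookups account for the delay.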

I rewrote expandMulticsBody() and cleanRef() to perform all the formatting, lookup, and unwrapping operations in a single pass over their inputs, rather than repeated passes. Each function implements a simple state machine inside a loop over the input characters. This reduced the number of passes over the input from 14 to 3. mx-net.htmx then expanded in 3.537 seconds, an improvement of about a factor of 7. Go is still about 4x slower than Perl. Expanding the source file with the Multics constructs removed takes about 2 seconds. Further speedup should be possible.
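The single-pass idea can be sketched as follows. This is not the actual expandMulticsBody() code: the function name expandRefs, the state names, and the render callback are assumptions made for illustration, and only the one "{[key text]}" form quoted above is handled.

```go
package main

import "strings"

// Scanner states for the hypothetical single-pass expander.
const (
	stText = iota // copying ordinary text
	stKey         // just after "{[", collecting the lookup key
	stBody        // after the key, collecting the display text
)

// expandRefs makes one pass over src and replaces every "{[key body]}"
// construct with render(key, body). An unterminated construct is copied
// through unchanged.
func expandRefs(src string, render func(key, body string) string) string {
	var out, key, body strings.Builder
	state := stText
	start := 0 // index where the current construct began
	for i := 0; i < len(src); i++ {
		c := src[i]
		next := byte(0)
		if i+1 < len(src) {
			next = src[i+1]
		}
		switch {
		case state == stText && c == '{' && next == '[':
			state, start = stKey, i
			key.Reset()
			body.Reset()
			i++ // skip the '['
		case state == stText:
			out.WriteByte(c)
		case c == ']' && next == '}': // closes a key or a body
			out.WriteString(render(key.String(), body.String()))
			state = stText
			i++ // skip the '}'
		case state == stKey && c == ' ':
			state = stBody
		case state == stKey:
			key.WriteByte(c)
		default: // stBody
			body.WriteByte(c)
		}
	}
	if state != stText {
		out.WriteString(src[start:]) // unterminated: keep the raw text
	}
	return out.String()
}
```

With a render callback that simply returns body, the input "edited by {[VanVleck Tom Van Vleck [THVV]]}" becomes "edited by Tom Van Vleck [THVV]". A real callback would do the database lookup on key at that point, so lookup and unwrapping happen in the same pass.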

My first Go version of expandfile used a simple set of functions to simulate Perl lists. I learned more about the language and rewrote these routines to use a more object-oriented struct that wrapped a container/list instance instead of a fixed array. It is still about 4x slower than Perl.
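A sketch of what such a wrapper might look like, assuming a list of strings with Perl-style push, pop, shift, and unshift. The type name strList and its methods are hypothetical, not the names used in expandfile.

```go
package main

import "container/list"

// strList is a hypothetical Perl-style list of strings backed by
// container/list, a doubly linked list from the Go standard library.
type strList struct{ l *list.List }

// newStrList builds a list holding items in order.
func newStrList(items ...string) *strList {
	s := &strList{l: list.New()}
	for _, it := range items {
		s.l.PushBack(it)
	}
	return s
}

// Push appends v to the end, like Perl's push.
func (s *strList) Push(v string) { s.l.PushBack(v) }

// Unshift inserts v at the front, like Perl's unshift.
func (s *strList) Unshift(v string) { s.l.PushFront(v) }

// Pop removes and returns the last item; ok is false if the list is empty.
func (s *strList) Pop() (v string, ok bool) {
	e := s.l.Back()
	if e == nil {
		return "", false
	}
	return s.l.Remove(e).(string), true
}

// Shift removes and returns the first item; ok is false if the list is empty.
func (s *strList) Shift() (v string, ok bool) {
	e := s.l.Front()
	if e == nil {
		return "", false
	}
	return s.l.Remove(e).(string), true
}

// Len reports the number of items.
func (s *strList) Len() int { return s.l.Len() }
```

Returning an ok flag stands in for Perl's undef on an empty list. A plain []string slice with append is the more common Go idiom and may well be faster for this job, but that has not been measured here.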

I tried using the Go profiling tools. Go said it generated a profile, but displaying it failed with an error:

/var/folders/5c/1769jkh88xjcxc0059b_brt80000gp/T/profile2406022249/cpu.pprof: parsing profile: empty input file
failed to fetch any source profiles

... this is the problem with picking how-to articles off the Internet: sometimes they are out of date.
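For comparison, here is a minimal sketch of collecting a CPU profile with the standard runtime/pprof package. Whether this addresses the error above is not known; one common way to end up with an empty profile file is for the program to exit before pprof.StopCPUProfile has run. The helper name profileCPU is hypothetical.

```go
package main

import (
	"os"
	"runtime/pprof"
)

// profileCPU runs work while recording a CPU profile into a new temporary
// file. It returns the file's path and size, so an empty profile is
// noticed immediately rather than when the viewer rejects it.
func profileCPU(work func()) (string, int64, error) {
	f, err := os.CreateTemp("", "cpu-*.pprof")
	if err != nil {
		return "", 0, err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return "", 0, err
	}
	work()
	// StopCPUProfile returns only after the profile has been written.
	// Calls to os.Exit or log.Fatal inside work would skip it.
	pprof.StopCPUProfile()

	info, err := f.Stat()
	if err != nil {
		return "", 0, err
	}
	return f.Name(), info.Size(), nil
}
```

The resulting file can be inspected with go tool pprof followed by the returned path.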

Lessons

Here are some of the lessons I learned while writing the Go version of expandfile.

Go Syntax

Go's basic syntax is similar to C's. The big differences from Perl are:

It is reasonably easy to start by editing Perl source into Go with repeated edit passes, then trying to compile and fixing errors from the compiler. Most Go compiler error messages are clear and tell you what to fix.

regex

Go Semantics

Types

Resources

Many online resources are available for learning Go.

Tools

The Go compiler and runtime are easy to download and install. On my Mac, I issued the command brew install golang and installation was painless.

Editing Go programs was tedious, until I installed golang-mode into Emacs. That made editing easy.

My Perl version has a test suite, testexpandfile, that exercises expandfile thoroughly. This was valuable for debugging the Go version. I just changed export EXPAND=expandfile to export EXPAND="go run expandfile.go", then ran the tests, fixed problems, and ran them again until everything worked.

Source management and compiling

Compilation Environment setup

Numbers and conversions

Packages

Go requires programs to import specific packages from the Go library and from external repositories. epm.go imports about a dozen Go library modules:

To fetch packages from GitHub: go get -u github.com/antchfx/xmlquery

Functions

Executing shell commands

expandfile's *shell builtin executes a command. The package to import that does this is os/exec. I converted my Perl code to call execCmd := exec.Command(args[0], args[1:]...) and then to run the command and capture its output.

The os/exec package locates the binary executable and calls it directly. That is, the command is not sent to the system shell to launch the target process. Perl's open($fh, "$cmd|") construct launches the command by calling the shell, which then invokes the command, as described in Section 16.3 of the Camel book.

I rewrote my external command builtin to invoke sh -c commandline with os/exec. This makes it work the way Perl does, and saves me from having to rewrite existing applications of expandfile. This has an efficiency penalty, because it launches a shell process and then the command, but *shell is not used often in my pages.
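The difference between the two approaches can be sketched like this. The function names runDirect and runShell are hypothetical. Only exec.Command and its Output method are real os/exec calls, and the sketch assumes a Unix-like system with sh on the PATH.

```go
package main

import (
	"errors"
	"os/exec"
)

// runDirect executes args[0] directly, with no shell involved, so pipes,
// redirection, and wildcards in the arguments are passed through as
// literal text.
func runDirect(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("no command given")
	}
	out, err := exec.Command(args[0], args[1:]...).Output()
	return string(out), err
}

// runShell hands the whole command line to "sh -c", so the shell
// interprets it the way Perl's piped open does. This costs one extra
// process per call.
func runShell(commandLine string) (string, error) {
	out, err := exec.Command("sh", "-c", commandLine).Output()
	return string(out), err
}
```

For example, runShell("echo hello | tr a-z A-Z") runs a two-stage pipeline, while runDirect([]string{"echo", "a", "|", "b"}) just passes the "|" to echo as an ordinary argument.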

CSV access

These functions pass basic tests. The Go version can read a local CSV file and expand a template. The *bindcsv test in testexpandfile passes for both local files and remote URLs. The *csvloop test also passes.
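As an illustration only, reading CSV data into per-row variable bindings might look like the sketch below, using the standard encoding/csv package. It assumes the first record is a header row of column names. The function name bindCSV and the map-per-row representation are assumptions, not the actual *bindcsv or *csvloop implementation.

```go
package main

import (
	"encoding/csv"
	"strings"
)

// bindCSV parses CSV text whose first record is a header row and returns
// one map per data row, keyed by column name.
func bindCSV(data string) ([]map[string]string, error) {
	// By default the reader requires every record to have the same number
	// of fields as the first, so indexing rec by header position is safe.
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) == 0 {
		return nil, nil
	}
	header := records[0]
	rows := make([]map[string]string, 0, len(records)-1)
	for _, rec := range records[1:] {
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		rows = append(rows, row)
	}
	return rows, nil
}
```

A template loop would then expand once per element of the returned slice, looking up each variable name in that row's map.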

MySQL access

(I tried to do this as a separate package, even though there is currently no way to load this feature only when it is needed. It failed to compile because of a circular dependency, so I merged the package content back into epm.)

Because the query support does not return the number of rows, I have to count them as I read them with Next(). The row count is therefore set at the end of iterateSQL() instead of at the beginning, so code run for each row cannot use the value. I don't think this is an issue.
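The counting pattern can be sketched without a database. The rowCursor interface below lists two methods that *sql.Rows really has (Next and Err). Everything else, including the names iterateRows and fakeRows, is hypothetical and is not the real iterateSQL.

```go
package main

// rowCursor is the subset of *sql.Rows this sketch needs.
type rowCursor interface {
	Next() bool
	Err() error
}

// iterateRows calls visit once per row and returns the number of rows
// seen. The count is only known after the loop has finished.
func iterateRows(rows rowCursor, visit func() error) (int, error) {
	n := 0
	for rows.Next() {
		if err := visit(); err != nil {
			return n, err
		}
		n++
	}
	// Err reports any failure that ended the iteration early.
	return n, rows.Err()
}

// fakeRows is a stand-in cursor with a fixed number of rows, so the
// sketch can be exercised without a database connection.
type fakeRows struct{ remaining int }

// Next reports whether another row is available and consumes it.
func (f *fakeRows) Next() bool {
	if f.remaining == 0 {
		return false
	}
	f.remaining--
	return true
}

// Err always reports success for the fake cursor.
func (f *fakeRows) Err() error { return nil }
```

In real use, visit would call Scan on the *sql.Rows value and expand the template for that row.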

Three functions are defined: openSQL, lookupSQL, and iterateSQL.

Opening the MySQL database sets ColumnsWithAlias: true in order to include the table name in the column name.

https://pkg.go.dev/database/sql describes the sql interface for Go. http://go-database-sql.org/varcols.html describes how to deal with the query result using reflection.

Interesting: info on connection pooling and error handling. https://github.blog/2020-05-20-three-bugs-in-the-go-mysql-driver/

XML access

Started on this feature. Going to try to use https://github.com/antchfx/xpath, which executes XPath queries. There are difficulties with introspection.