Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Why ASDF is confusing

At some point on everyone's Common Lisp journey, they're going to want to create a program composed of more than one file. They might even want to create a library! Unfortunately, this can quickly become a head-banging exercise in frustration and confusion.

I think the root of the confusion is unfamiliarity with the way Lisp gets code into memory for execution. That's understandable, because hardly anything else works this way, at least not at the level of exposure that Lisp requires. If I'm right, then working through the details and comparing with other languages should help make ASDF (or the choice not to use it) less confusing. I'm even repeating some very basic things that any potential readers probably already understand.

The key difference is LOAD. I don't want to get into the subtleties of loading a source file vs. a compiled file, or the behavior of eval-when. Those are important for a deeper understanding and for avoiding other headaches. Here I'm bringing up LOAD because it is essentially the only way to bring in new code. According to its description in the standard, it "sequentially executes each form it encounters". Intuitively, you can picture this as EVAL'ing each form in sequence: EVAL does the work under the hood, but LOAD is your user interface. Every sort of "package management", "library management", or "code module management" system is just built on managing LOADs.
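To make the "sequentially EVAL'ing each form" picture concrete, here is a toy approximation of LOAD for a source file. NAIVE-LOAD is a hypothetical name, and this is not how any implementation actually implements LOAD. It ignores compiled files, eval-when, and everything else LOAD really handles. It does mimic one real detail: LOAD rebinds *package* and *readtable*, so a file can't change them for its caller. The demo assumes a Unix-like system with a writable /tmp.

```lisp
(defun naive-load (path)
  "Toy approximation of LOAD for a source file: READ each form, then EVAL it."
  (with-open-file (in path)
    ;; LOAD rebinds these two specials, so an IN-PACKAGE inside the file
    ;; cannot leak out to the caller; do the same here.
    (let ((*package* *package*)
          (*readtable* *readtable*)
          (eof (list :eof)))            ; fresh cons that READ can never return
      (loop for form = (read in nil eof)
            until (eq form eof)
            do (eval form))))
  t)

;; Demo: write a throwaway file (hypothetical path).
(defparameter *toy-file* #p"/tmp/jach-toy.lisp")

(with-open-file (out *toy-file* :direction :output :if-exists :supersede)
  ;; ~S writes each form readably, so READ gets the same forms back.
  (format out "~s~%~s~%"
          '(defun toy-fn () :toy)
          '(defparameter *toy-var* (toy-fn))))

;; The second form only works because the first was evaluated before it.
(naive-load *toy-file*)
*toy-var* ; => :TOY
```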

In the REPL, when you define a function with defun, your input is eval'd and you see the function name printed as the return value, but the function body itself won't execute until you call it. In other words, evaluating the defun only means your function is now in memory and ready to be called. Similarly, if you (load "file.lisp" :print t) and your file only contains defuns, you'll see the function names printed in comments as each defun gets evaluated, one after another. Then you'll be able to call those functions.
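Here is a minimal sketch of that experiment. It writes a file containing only DEFUNs and then LOADs it with :print t. The file path and the function names ADD1 and SQUARE are made up for illustration, and a writable /tmp is assumed.

```lisp
;; Hypothetical demo file; any writable location would do.
(defparameter *defuns-file* #p"/tmp/jach-defuns.lisp")

;; Write a source file that contains nothing but two DEFUNs.
(with-open-file (out *defuns-file* :direction :output :if-exists :supersede)
  (write-line "(defun add1 (x) (+ x 1))" out)
  (write-line "(defun square (x) (* x x))" out))

;; LOAD evaluates each top-level form in turn. With :PRINT T it also prints
;; the value of each form; for a DEFUN that value is the function's name.
(load *defuns-file* :print t)

;; Nothing in the file called these functions, but they are now in memory.
(square (add1 6)) ; => 49
```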

If you add a (defpackage :blah (:use :cl)) to the top of your file, and load again, you'll see the package printed, and you can still call your functions.

If you add an (in-package :blah) after the defpackage, and load again from a fresh Lisp REPL, you won't be able to call your functions by their bare names. Because LOAD binds *package*, the file being loaded can change it with in-package without affecting the outer loading context. But your function is still in memory. It's only accessible through the blah package, so you can call it with (blah::foo).
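Here is a self-contained sketch of the defpackage/in-package experiment, assuming a writable /tmp. The package BLAH and function FOO follow the text, and the file path is hypothetical. At the REPL you would simply type (blah::foo). This snippet uses FIND-SYMBOL instead, so it can be read in one go before BLAH exists.

```lisp
;; Hypothetical demo file, written out so the example is self-contained.
(defparameter *package-file* #p"/tmp/jach-blah.lisp")

(with-open-file (out *package-file* :direction :output :if-exists :supersede)
  (write-line "(defpackage :blah (:use :cl))" out)
  (write-line "(in-package :blah)" out)
  (write-line "(defun foo () :hello-from-blah)" out))

(defparameter *package-before* *package*)
(load *package-file*)

;; LOAD rebound *PACKAGE*, so the file's IN-PACKAGE did not leak out here.
(eq *package* *package-before*)      ; => T
;; FOO was interned in BLAH, not in the current package (in a fresh image)...
(find-symbol "FOO" *package*)        ; => NIL, NIL
;; ...but the function is in memory and callable through BLAH.
(funcall (find-symbol "FOO" :blah))  ; => :HELLO-FROM-BLAH
```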

You can even load another file that just consists of (blah::foo), and it will be executed. As long as a function is in memory, you can call it from anywhere. This is analogous to Java's reflection API, which lets you call a method on a class so long as that class exists (or can be found and loaded) in memory, without needing it in the imports section. Java keeps the reflection API tucked away, however, whereas in Lisp LOAD is what you get upfront.

For some programmers, knowing how LOAD works at this level of detail is good enough for them. They write files "a.lisp", "b.lisp", "c.lisp", and in a "main.lisp" they make sure at the top to explicitly LOAD each file. They might even use quicklisp, and ql:quickload any dependencies as if they were big local files. They tell others to load "main.lisp" and then run "main". (Or they might create an executable that does those two steps for you, or they might have you run sbcl --script.)

No one can say this way of working is wrong. Lisp doesn't care; if it works for you, go for it. When you want to play with others you'll likely need to change things, but perhaps less than you might think.

This process is somewhat analogous to how a C program is built, except in Lisp you have to manage the linking order yourself. In C, you might have a/b/c .h and .c files, and a main.c file that includes all the .h files (an include amounts to copy-pasting the contents of the header file into the C file at the include point). You compile a/b/c/main into .o files, and the linker combines all their data (like function names and their bodies) into a global namespace and packs them into an executable. When you run the executable, the OS puts all its data and functions into memory and then starts executing at the main entry point.

It's also somewhat analogous to how a Python program works. You might have a/b/c .py files, and a main file that imports the three before defining its main. Importing executes each module line by line, unless it has already been imported. As with Lisp, you tell others to import main and run main.main(), or simply to run it with python, or perhaps you build an exe that does that for them.

On the surface, if your project had only such a simple structure, would there really be any confusion, or much relevant difference? But things get complicated once you have cross-file dependencies. Suddenly b is no longer standalone, but wants to use a function (of many) from a and a function (of many) from c.

Because Lisp is dynamic, it's ok if you load b first, so long as you don't actually try to call any of those other functions yet. As long as they exist by the time you do, things will work out. (SBCL will still yell at you with undefined-function warnings if you compile b before a and c. It does so because it can't do certain optimizations, and because the warning is a useful typo check.)
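A minimal sketch of that, simulating "load b, then load a" by evaluating the forms in that order at the REPL. The function names are hypothetical. Defining B-TOTAL first only earns a style warning about the undefined function. It is not an error.

```lisp
;; "b": uses a function from "a" that does not exist yet.
(defun b-total (xs)
  (reduce #'+ (mapcar #'a-double xs)))

;; "a": loaded afterwards.
(defun a-double (x) (* 2 x))

;; By the time B-TOTAL is actually called, A-DOUBLE exists, so this works.
(b-total '(1 2 3)) ; => 12
```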

But because Lisp has macros, which can be thought of as functions that operate on source code at compile time, sometimes you really do need correct load order, or things won't work out. Generally having correct load order is a good thing, even if not every circumstance requires it. What can you do to ensure it?
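Here is a sketch of why macros make load order matter, again simulating two files by evaluating forms in order. MY-UNLESS and SAFE-DIV are hypothetical names. The explicit COMPILE makes this behave the same way on Lisps that would otherwise interpret the DEFUN. SBCL compiles every DEFUN anyway. The exact error signaled when a macro is called as a function is implementation-specific, so the snippet catches any ERROR.

```lisp
;; "b.lisp", loaded first: uses MY-UNLESS before anything defines it.
(defun safe-div (a b)
  (my-unless (zerop b) (/ a b)))

;; Make sure SAFE-DIV is compiled now. With MY-UNLESS still unknown, the
;; compiler can only treat (my-unless ...) as an ordinary function call.
(compile 'safe-div)

;; "a.lisp", loaded second: the macro arrives too late for SAFE-DIV.
(defmacro my-unless (test &body body)
  `(if ,test nil (progn ,@body)))

;; The compiled code now calls MY-UNLESS as a function, which signals an error.
(defparameter *first-try*
  (handler-case (safe-div 10 2)
    (error () :broken)))
*first-try* ; => :BROKEN

;; Evaluating the DEFUN again, now that the macro exists, fixes it.
(defun safe-div (a b)
  (my-unless (zerop b) (/ a b)))
(safe-div 10 2) ; => 5
(safe-div 10 0) ; => NIL
```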

In C, at the language level, it's done with "header guards". Is there a struct defined in a.h that you want to use in a struct defined in b.h? That's fine: just include a.h at the top of b.h, before you define b.h's struct. a.h has a guard so that it's only included once. So when it's time to compile main.c, it includes a.h, then b.h, but b.h's include of a.h won't do anything, thanks to the guard. Above the language level, you have Makefiles.

In Python, it's done in morally the same fashion. If main imports a and then imports b, which also imports a, then a doesn't get executed again; the only effect is that a's namespace becomes usable by b.

But in Lisp, if you load a.lisp and then load b.lisp, which also loads a.lisp, a.lisp is executed again. If the files load each other, you get an infinite loop, or, in SBCL, it detects a problem and opens the debugger.
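A minimal sketch showing that plain LOAD has no "guard": nothing records that a file was already loaded. The counter variable and file path are hypothetical, and a writable /tmp is assumed.

```lisp
;; Counts how many times the file below has been executed.
(defparameter *times-loaded* 0)

;; Hypothetical file whose only effect is to bump the counter.
(defparameter *counted-file* #p"/tmp/jach-counted.lisp")
(with-open-file (out *counted-file* :direction :output :if-exists :supersede)
  (format out "~s~%" '(incf *times-loaded*)))

(load *counted-file*)
(load *counted-file*)  ; nothing remembers the first load, so it runs again
*times-loaded* ; => 2
```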

In exchange for this lack of protection, you get full hot-reloading as a feature. But to fully make use of that feature, you'll want to make sure changes are propagated. If you redefine a function, callers should call into the new definition, not the old one. This, at least, is taken care of for you. But if you redefine a macro, code that uses the macro needs to be recompiled, because macro expansion happens at compile time. Just loading the file that defines the macro doesn't do that automatically.
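Here is a sketch of the asymmetry, with forms evaluated one at a time as at the REPL. All names are hypothetical. Calls to a global function look it up by name when they run, unless it has been declared inline. A macro's expansion, by contrast, is baked into the caller when the caller is compiled.

```lisp
;; Redefining a FUNCTION: the existing caller sees the new definition.
(defun greeting () "hello")
(defun greet-caller () (greeting))
(defun greeting () "hi")
(greet-caller) ; => "hi"

;; Redefining a MACRO: the existing caller keeps the old expansion.
(defmacro answer () 1)
(defun macro-caller () (answer))
(compile 'macro-caller)          ; expand ANSWER now, even in interpreting Lisps
(defmacro answer () 2)
(defparameter *stale* (macro-caller))
*stale* ; => 1

;; Recompiling the caller (here: re-evaluating its DEFUN) picks up the change.
(defun macro-caller () (answer))
(macro-caller) ; => 2
```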

So we're still stuck on the problem of cross-dependencies. One approach is to get rid of all the LOADs in all your Lisp files and make a new loader.lisp file that just loads everything in the right order. (b.lisp needs functions or macros from a.lisp and c.lisp, so load a.lisp and c.lisp first, then load b.lisp.) At program start, load loader.lisp, and you're good to go. If you change a macro somewhere, rather than trying to remember which files depended on that macro and need to be recompiled and reloaded, just reload loader.lisp, which has the full order anyway. When you're editing your files, you can be confident that each file can "see" everything that comes before it in the loader, without having to explicitly reference declarations of such things.
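A self-contained sketch of the loader approach, assuming a writable /tmp. The directory, the helper WRITE-PROJECT-FILE, and the functions in a/b/c are all hypothetical. Each file is plain code with no LOADs of its own. loader.lisp holds the one global ordering, and *load-truename* lets it find its siblings no matter what the current directory is.

```lisp
;; Hypothetical project directory.
(defparameter *project-dir* #p"/tmp/jach-loader-demo/")
(ensure-directories-exist *project-dir*)

(defun write-project-file (name &rest lines)
  "Write LINES to the file NAME inside *PROJECT-DIR*."
  (with-open-file (out (merge-pathnames name *project-dir*)
                       :direction :output :if-exists :supersede)
    (dolist (line lines)
      (write-line line out))))

;; None of these files LOADs anything itself.
(write-project-file "a.lisp" "(defmacro twice (form) `(progn ,form ,form))")
(write-project-file "c.lisp" "(defun c-value () 21)")
(write-project-file "b.lisp"
                    "(defun b-value ()"
                    "  (let ((n 0)) (twice (incf n (c-value))) n))")

;; b needs a's macro and c's function, so a and c come first.
(write-project-file "loader.lisp"
                    "(load (merge-pathnames \"a.lisp\" *load-truename*))"
                    "(load (merge-pathnames \"c.lisp\" *load-truename*))"
                    "(load (merge-pathnames \"b.lisp\" *load-truename*))")

;; At program start, or after changing a macro anywhere: reload the loader.
(load (merge-pathnames "loader.lisp" *project-dir*))
(b-value) ; => 42
```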

This is very analogous to Forth. If you've ever read Thinking Forth, there's a code listing 5.1 on page 157 of the PDF:

\ QTF+ Load Screen
." 2.01" ;
9 LOAD \ compiler tools, language primitives
12 LOAD \ video primitives
21 LOAD \ editor
39 LOAD \ line display
48 LOAD \ formatter
69 LOAD \ boxes
81 LOAD \ deferring
90 LOAD \ framing
96 LOAD \ labels, figures, tables
102 LOAD \ table of contents generator

In this formulation, the application is structured like a "book" and each "chapter" is the code-screen being loaded. (Screen 12 is the load screen for the video primitives chapter.) Each "chapter" might itself contain more loads. In modern times, you can easily get a hierarchy like this by making use of third party libraries. Your loader calls the loaders of the libraries you use (like a line display library), then your own files. Those individual libraries themselves have loaders that do the same thing.

This lack of explicit dependency declarations is really weird if you come from another language, and I think it's at the heart of the confusion. Files are not isolated containers in Lisp the way they are in other languages; they're context-sensitive bags of code. This is a big reason why so many projects are organized around loading a "package.lisp" or "packages.lisp" first, which defines namespaces and symbol imports/exports. Every other file then just starts with an in-package at the top to set the context.

But the "loader" approach works. In exchange for not having to declare (and re-re-re-re-declare...) your dependencies in every file, you just have to manage a global total ordering yourself! Whether this tradeoff is worth it is still debatable. Certainly, though, non-Lisp languages and ecosystems have made their choice much more convenient in modern times. In my six years at my last BigCo Java job, I basically never typed an explicit 'import', and just let Eclipse insert those for me. When I was doing Clojure, the existence of Slamhound was what made me stick with it. It's unfortunate that there doesn't seem to be anything equivalent for Common Lisp, even from the paid IDEs. At the same time, such a thing isn't needed as much, because once something is loaded in memory you can just use it. Smart symbol completion, APROPOS, and more also exist to help you find and use things. (Not to get too off-topic, but I would like to see more tooling support for exporting and importing symbols.)

The "loader" approach would work even better if it were standardized, so we could all play together and have a library ecosystem. So at last we have the concept of a "system". A system at its most basic is just a named list of files and their load order. ASDF lets you specify that order as "serial", meaning files are loaded in the order they're listed. Now you can execute (asdf:load-system :your-system) to kick things off instead of (load "loader.lisp"). What ends up in your computer's memory is the same.

Once you give names to systems, you can depend on the names, and ASDF can ensure all of the needed files in a named system you depend on are loaded before any of your code. One source of confusion is that ASDF needs to be able to find systems by name. If you're using Quicklisp and the system is available there, this is easy. But if you're still developing your libs/modules/sub-systems locally, the fact that ASDF doesn't push "./" onto its default central registry trips a lot of people up.
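A self-contained sketch of the same a/b example as an ASDF system. The system name "jach-demo", its directory, and its files are hypothetical, and a writable /tmp is assumed. It uses two real ASDF entry points: a :serial t defsystem, and pushing the project directory onto asdf:*central-registry* so ASDF can find the .asd by name. The package.lisp-first layout is the convention described above.

```lisp
(require "asdf")

;; Hypothetical system directory.
(defparameter *system-dir* #p"/tmp/jach-asdf-demo/")
(ensure-directories-exist *system-dir*)

(flet ((put-file (name &rest lines)
         (with-open-file (out (merge-pathnames name *system-dir*)
                              :direction :output :if-exists :supersede)
           (dolist (line lines) (write-line line out)))))
  ;; The system definition plays the role of loader.lisp.
  (put-file "jach-demo.asd"
            "(asdf:defsystem \"jach-demo\""
            "  :serial t   ; load the components in the order listed"
            "  :components ((:file \"package\")"
            "               (:file \"a\")"
            "               (:file \"b\")))")
  (put-file "package.lisp"
            "(defpackage :jach-demo (:use :cl) (:export #:b-value))")
  (put-file "a.lisp"
            "(in-package :jach-demo)"
            "(defmacro twice (form) `(progn ,form ,form))")
  (put-file "b.lisp"
            "(in-package :jach-demo)"
            "(defun b-value () (let ((n 0)) (twice (incf n 21)) n))"))

;; Tell ASDF where to look; it won't search the current directory by default.
(push *system-dir* asdf:*central-registry*)

;; Instead of (load "loader.lisp"):
(asdf:load-system "jach-demo")
(funcall (find-symbol "B-VALUE" :jach-demo)) ; => 42
```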

Admittedly, it's annoying that every time you add a new file to your project, you also need to add it to your ASDF system definition (or your loader.lisp) in the right order; it'd be nice to have more tooling for this. It's also annoying that whenever you make a wide-reaching change and want it fully propagated to all dependents, you seemingly end up reloading everything. But another advantage of ASDF over a custom loader is that ASDF has ways to mitigate both problems, and any project can opt in to them.

The first annoyance can be helped if you adopt the convention other languages require: each file is an isolated unit that declares its own dependencies. When you follow that convention in a particular way, ASDF can infer your files and their load order.
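A sketch of that convention using ASDF's package-inferred-system class, available in ASDF 3.1 and later. The system "jach-pis", its directory, and its files are hypothetical, and a writable /tmp is assumed. Each file starts with a defpackage whose name matches its path ("jach-pis/b" for b.lisp). ASDF reads b.lisp's :import-from and infers that a.lisp must be loaded first, so the .asd lists no files at all.

```lisp
(require "asdf")

;; Hypothetical package-inferred system in a throwaway directory.
(defparameter *pis-dir* #p"/tmp/jach-pis-demo/")
(ensure-directories-exist *pis-dir*)

(flet ((put-file (name &rest lines)
         (with-open-file (out (merge-pathnames name *pis-dir*)
                              :direction :output :if-exists :supersede)
           (dolist (line lines) (write-line line out)))))
  ;; No component list: only the entry point is named.
  (put-file "jach-pis.asd"
            "(asdf:defsystem \"jach-pis\""
            "  :class :package-inferred-system"
            "  :depends-on (\"jach-pis/b\"))")
  ;; a.lisp defines package JACH-PIS/A, i.e. the system "jach-pis/a".
  (put-file "a.lisp"
            "(defpackage :jach-pis/a (:use :cl) (:export #:twice))"
            "(in-package :jach-pis/a)"
            "(defmacro twice (form) `(progn ,form ,form))")
  ;; b.lisp's DEFPACKAGE declares the dependency on a.lisp.
  (put-file "b.lisp"
            "(defpackage :jach-pis/b (:use :cl) (:import-from :jach-pis/a #:twice)"
            "  (:export #:b-value))"
            "(in-package :jach-pis/b)"
            "(defun b-value () (let ((n 0)) (twice (incf n 21)) n))"))

(push *pis-dir* asdf:*central-registry*)
(asdf:load-system "jach-pis")
(funcall (find-symbol "B-VALUE" :jach-pis/b)) ; => 42
```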

The second can be helped by ways (including, but not limited to, inferred systems) of declaring your dependencies more granularly than a flat serial load order. Then you can load (or reload) smaller pieces. When you do reload, only the parts that have actually changed, or that depend on something that changed, need to be reloaded. This can even extend to changes in your system definition itself.
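For contrast with a flat :serial t list, here is a sketch of a system definition with per-file dependencies. The system name, the file names, and the choice of Alexandria as an outside dependency are hypothetical. Each component names only what it needs. ASDF derives an order from that, and when a file changes it can recompile that file and the files that depend on it.

```lisp
;;; my-app.asd (hypothetical system and file names)
(asdf:defsystem "my-app"
  :depends-on ("alexandria")              ; another named system
  :components ((:file "package")
               (:file "a" :depends-on ("package"))
               (:file "c" :depends-on ("package"))
               (:file "b" :depends-on ("a" "c"))
               (:file "main" :depends-on ("b"))))
```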

And though it may be better practice to do so, you don't necessarily need to re-declare a dependency if it's a transitive dependency of something you already depend on, since again once something is loaded it's in memory and everything can use it without needing some extra permission/declaration to the compiler.

ASDF can do even more, but the basics are pretty simple, once you get used to the weird inversion of files not being required to declare their dependencies. That weirdness seems to be the meat of what people find confusing, anyway.

As with many archaic things about Lisp, I think it's worth giving this an honest go before jumping straight to some "modern" take that "fixes" a perceived problem. Sometimes understanding the usefulness of the old way also requires a better understanding of the tools already available. This load behavior can be quite frustrating if the package a file puts itself in :use's a lot of symbols from all over. (Note that a file doesn't have to restrict itself to one package!) You're reading code and wondering where these functions come from. That can't be answered statically from the file on its own, though it could be from a whole-project view like an IDE or even ctags might have. But SWANK+SLIME (whether in Emacs, Vim, VS Code, or elsewhere) supports jump-to-source, symbol describe, symbol completion, cross-referencing, and more. These let you find out dynamically where things came from. Knowing about such tools is, I think, essential for the Lisp beginner who has advanced to projects spanning more than one file, especially if they're aware of what modern static languages and their environments offer.
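Even without SLIME, a few standard functions answer some of the same questions, just less conveniently. Here is a minimal sketch with a made-up package MYSTERY-LIB and function FROBNICATE. APROPOS, SYMBOL-PACKAGE and DESCRIBE are all standard Common Lisp. The exact output of APROPOS and DESCRIBE varies by implementation.

```lisp
;; A hypothetical package and function whose origin we want to discover.
(defpackage :mystery-lib (:use :cl) (:export #:frobnicate))
(defun mystery-lib:frobnicate (x)
  "Return X doubled."
  (* 2 x))

;; APROPOS: which symbols, in which packages, have this in their name?
(apropos "FROBNICATE")
;; SYMBOL-PACKAGE: where does this name live? (the MYSTERY-LIB package)
(symbol-package 'mystery-lib:frobnicate)
;; DESCRIBE: what does the symbol name? Includes the docstring.
(describe 'mystery-lib:frobnicate)
```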

Posted on 2021-12-10 by Jach

Tags: c, forth, java, lisp, programming, python

