TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Globals in Lisp

I've been reading posts from Claus Brod's blog lately after having discovered him while tracking down the claim on Creo Elements using Lisp. He's got some interesting thoughts! It's also fun to see someone who I'd bet is pretty close to a programming god (CAD stuff seems up there in the hardcore tier to me) ponder such low-level and trivial details as globals in Lisp.

As I write more software in Lisp I can feel my opinions converging on the matter in terms of a preferred style, but as yet I am undecided. So let me instead just spitball some thoughts.

First, globals in Lisp aren't actually global in the same way they are in C, because we have namespaces/packages. This immediately lowers their level of "evilness" in my book. Similarly Lisp's form of goto is a labeled goto, rather than a "move the PC to any address and continue executing" goto, and thus is also less evil than archaic mantras suggest.

So because they aren't true globals in that sense, only package-globals, there's not much moral difference between

(defpackage #:foo (:use #:cl))

(in-package #:foo)
(defparameter *blah* 'thing)
(defparameter *other-blah* 'other-thing)

(defun bar ()
  (cons *blah* *other-blah*))
...


and

(defpackage #:foo (:use #:cl))

(in-package #:foo)
(let ((blah 'thing)
      (other-blah 'other-thing))

  (defun bar ()
    (cons blah other-blah))
  )
...


As I said, they're mostly morally equivalent; I don't think one is obviously better than the other. They have their tradeoffs. Some tradeoffs are clear, others less so. The former style exposes both its vars as part of the implicit interface to the foo package, even if they are unexported, which may be undesirable. Do you want to further invite Hyrum's Law headaches? Even when you mark something very clearly as not supported, beyond just a symbol not being exported, if a customer uses it anyway, and has a problem, and complains, you're going to feel the burden of deciding whether to endure the pain of helping them anyway or maybe lose their business.

But during development at least, possibly it's more desirable even if you intend to restrict it later, because it lets you easily change the vars and not have to recompile functions, or if you end up wanting to export a var for others to use there's less to do. What if you're binding object instances and want to change slot values? And during release, it's also possibly more desirable, because enabling end-users to "get at your privates" may be more useful for them even if they shouldn't and it invites potential pain. It all depends.

Additionally, the former gives you the power of special/dynamic scoped variables, which are tremendously useful at times (see: all the included dynamic vars CL gives you out of the box, like *standard-output*, that let you write code that can more easily be repurposed to do something different without changing the original code). If you don't need or expect to need such power, then it's less beneficial.
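
As a small illustration of that kind of repurposing, here is a minimal sketch (greet and capture-greeting are made-up names for this example). The printing function is written once against *standard-output*, and a caller redirects it into a string purely by rebinding the dynamic var:

```lisp
(defun greet (name)
  ;; FORMAT with T writes to whatever *STANDARD-OUTPUT* is currently bound to.
  (format t "Hello, ~a!" name))

(defun capture-greeting (name)
  ;; WITH-OUTPUT-TO-STRING rebinds *STANDARD-OUTPUT* for the dynamic
  ;; extent of its body; GREET itself is not modified.
  (with-output-to-string (*standard-output*)
    (greet name)))
```

Calling (greet "Claus") prints to the terminal as usual, while (capture-greeting "Claus") returns the same text as a string.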

The performance implications of special/dynamic vars are harder to reason about. At the local level, I suspect there's a measurable (if minor) performance penalty when it comes to looking up or modifying the value of a dynamic var vs. a lexically bound let var (closure lookup), and so the second style would be slightly faster. Though in the broader perspective, if you have to continuously pass that lexical var further down the call stack, the total cost for using it that way at the bottom may be greater than the cost of not having to pass it and just having a dynamic var lookup down there.

There may be no performance difference between the two if you can convince the implementation's compiler both are immutable and inlineable (and that the former is non-special thus it probably wouldn't be using defvar/defparameter). But even if you can't convince, if the values actually are in practice immutable, their 'sin' for being package-global and danger of being misused is further reduced. No one cares that cl:pi is a global, in fact I'm sure most people using it are happy with that choice since in most cases they can just type pi instead of something like Math.PI.
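
One way to tell the compiler a package-level value is immutable is defconstant. This is only a sketch with hypothetical names (+max-connections+, connections-left), and whether the value actually gets inlined is up to the implementation:

```lisp
;; Hypothetical package-level constant. DEFCONSTANT promises the value
;; never changes, so the compiler is permitted to inline it.
(defconstant +max-connections+ 8)

(defun connections-left (in-use)
  ;; In principle no special-variable lookup is needed here; the
  ;; standard allows, but does not require, the constant to be inlined.
  (- +max-connections+ in-use))
```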

Related to the potential perf issues of passing everything through deep call chains, there's the non-perf argument from the purely functional style side of things that really neither of the two styles is acceptable, because they pollute functions with hidden dependencies (albeit only the immediately scoped ones for the let style). Thus we should seek a more functional and immutable approach and pass things explicitly.

For an unclear tradeoff, in terms of confusion to a newbie, what's greater, the confusion of dynamic variables, or the confusion that cl:defun produces a package-global function regardless of where that defun is lexically executed? And do we really want that ugly extra level of indentation for everything? (Or arguments about whether it's ok to drop it or not, as I've seen in C++ when they use namespace X { }. And is it easy to adjust our auto-indent tooling to drop it?)
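
To make the defun point concrete, here is a small sketch (install-counter and counter-next are made-up names). The inner defun sits inside both a function body and a let, yet once it has been evaluated it names an ordinary package-global function that happens to close over the let binding:

```lisp
(defun install-counter ()
  ;; The DEFUN below is nowhere near top level, but calling
  ;; INSTALL-COUNTER still defines COUNTER-NEXT globally in the
  ;; current package. COUNT remains private to the closure.
  (let ((count 0))
    (defun counter-next ()
      (incf count))))
```

After (install-counter), any code in the package can call (counter-next), even though nothing about the call site hints at where it was defined.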

Allow me a digression. I once interviewed for a Clojure job and I decided (in retrospect probably mistakenly) to code my solution up in such a way to show that I knew what Clojure had to offer, and not necessarily the way I felt Clojure should be written, nor with full consistency. It was a simple problem requiring a small amount of code, but I used a dynamic var, a watcher, and a pre-condition. The interview feedback didn't like my use of a pre-condition or the dynamic var. For both, "slowness" was a cited reason. (My first thought: come on guys, we're using Clojure...) But for dynamic vars in addition they disliked the whole concept. Instead they preferred the purely functional style of passing in the var to every function that needs it.

As I said dynamic vars can be very useful, one fun use case is for test code where you can freely and easily create a test double that will be in effect no matter how deep a test call chain goes. In that interview problem, I added a test doing that with my dynamic var data store. Now sure in this trivial case there's not much moral difference between what I did vs creating the same test double and passing it explicitly to each function (who then in turn will be responsible for passing it further down, if needed). If a code review comment wanted one or the other, I wouldn't fight either way! I don't have a strong preference in this case for either way, but I think dynamic vars are cool, and give you more flexibility in the future -- if you need more state, for instance, you don't have to go update function arguments.

Such usage does fly in the face of best Java practices, though, and the functional programming style referenced above, where implicit or hidden dependencies (which dynamic vars definitely are) are very often bad form. But at least for Java I think this practice recommendation has as much to do with how difficult such dependencies are to mock (PowerMock usage is a smell) for testing in Java, as with what's actually a better software practice. I agree with the software practice principle that functions with many arguments (dependencies) aren't a good idea, and just because your language lets you implicitly bring in variables or Singletons or whatever doesn't mean you can ignore that doing so is morally the same as adding an extra parameter to the function signature. And yet, dependencies come in different flavors. No one complains if I implicitly use pi, or a built-in language function in my function body, even though that too is technically a dependency. Taken to the extreme, I should not (defun square (x) (* x x)) but instead I should (defun square (mult-op x) (funcall mult-op x x)) and call it with whatever multiplication op I have in mind explicitly -- (square #'* 4) or perhaps I have a "faster" multiplication that only works on 32-bit numbers and won't auto-promote. Or, whatever, flexibility! Full and split up coverage! (See also Enterprise FizzBuzz.)

I think purely functional ideas can be taken too far. So if the language's multiplication function isn't a problem, or even a language's built-in constants like Math.PI or what have you, then certain special variables can also not be a problem. Like if they are in fact immutable, or at least have a thread-safe mutation API (e.g. Clojure atoms), or are wide-spread in their needed usage (like a database handle), or you actually need the special feature of dynamic scope. And again, because I agree with the principle of too many args/dependencies, and not necessarily with the functional principle of everything passed explicitly, just because you hoist up 5 variously used dynamic vars into the root entry function and add five extra params to it so it can optionally pass them down where needed, doesn't make that any better and can in fact make it worse.

Ok, let's move on to discussing one of those valid sounding uses of globals, like a database handle. First, remember that symbols aren't actually global. But what is something that's truly global in Lisp? Package names! If you defpackage foo, there can only be one package FOO (until first-class environments are a thing, anyway) for your program. You can overwrite it, add stuff to it, and so forth, but you can't have two distinct packages both called FOO in the way you can have two distinct defvars *FOO* (which live in different packages). (You can have package local nicknames now though, which helps, because otherwise nicknames are also global!)
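
A quick sketch of the "two distinct *FOO*s" point, using two throwaway packages (demo-alpha and demo-beta are names invented for this example):

```lisp
(defpackage #:demo-alpha (:use #:cl))
(in-package #:demo-alpha)
(defvar *foo* :from-alpha)   ; this symbol is DEMO-ALPHA::*FOO*

(defpackage #:demo-beta (:use #:cl))
(in-package #:demo-beta)
(defvar *foo* :from-beta)    ; a different symbol, DEMO-BETA::*FOO*

(in-package #:cl-user)

(defun distinct-foos-p ()
  ;; Same symbol name, different home packages, so two separate variables.
  (not (eq (find-symbol "*FOO*" "DEMO-ALPHA")
           (find-symbol "*FOO*" "DEMO-BETA"))))
```

The two vars never collide, but evaluating a second (defpackage #:demo-alpha ...) elsewhere would refer to the very same package object.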

There's no escaping something that is truly global in the end. Yet, global isn't necessarily a bad thing. You need globals, to some extent, even if only a single global main function serving as unique entry point for your program to begin its execution.

The keyword there is unique, and naturally the "Singleton pattern" follows as a slightly more principled way to introduce globals, in that the pattern enforces some sort of uniqueness. Many programs legitimately have need of one and only one "thing" that is solely responsible for its stuff. And many different parts of the program need to get access to that thing in order to have it do stuff for them. In essence, I'm describing a services architecture. Sometimes services will be called managers. Most "services" have need of one and only one uniquely identified entry point that other services can use to talk to it. Because when "talk to" comes up, object-oriented protocols are a natural (if not the only) match, this service is often just implemented as an object that can receive messages. (Behind the scenes there can always be some sort of load balancing. e.g. A DatabaseService might just be responsible for handing out DatabaseConnections from a pool.)

So the question is: how do you create this object, how do you get references to it, and how do you know what messages it can receive?

In Java, there are no global variables allowed, only (like Lisp) packages are global, and the only top-level entities are classes. You could just create a new instance of the service class once somewhere near the top of your call stack and pass it around everywhere that needs it as a singleton-by-convention, but this is going to lead to some interesting tangles and ugly function signatures as discussed previously with functional programming tradeoffs, and nothing prevents other code from newing its own version. In that event if you actually need a singleton, if your code really depends on there being only one instance, you're going to have a problem. So basically no one does this.

Java classes can have "static methods" which are almost as good as top-level functions. So one way to solve this problem of getting access to the single Service object is to have a static class method and that leads to calls like SomeService.getInstance(). If the object has been constructed already, it's returned, otherwise it's constructed, cached, and then returned. You can easily make this thread-safe too, though in practice it's a common oversight (and also in practice rarely a biter).

Lisp has top-level functions, so an effective equivalent (and a hint of where I'm going) would be (some-service:get-instance).

In Java, that way (and a similar way of having a single general Provider.get(SomeServiceInterface.class) that may break up compile time dependencies better) is somewhat frowned upon these days for two reasons. First, it is too tempting for any code that wants to talk to that service to just stick that static call willy-nilly in their message bodies, creating a hidden dependency. The "proper" style if you do this is for each class that needs that service to acquire it explicitly by accepting a SomeService in its constructor and storing a copy in the instance fields, or accept it explicitly as a method param, or expecting a call to a setSomeService() as part of its broader initialization protocol, and maybe having a fallback in the constructor to using that static call on its own only if the caller didn't give it the service reference. But again, this proper way is not enforced, and so lazy or ignorant developers will often not do it.

The second reason, and made more difficult by the presence of the first, is that introducing a test double for this service is rather difficult if any consumers use the static method to request it. The best quick fix is to modify the code into the "proper" style, and then you can just create your own mock Service and pass it explicitly in the test, but if you can't modify the code or you're lazy you're going to need to reach for PowerMock. And once you start using PowerMock, without care you're going to use it everywhere, and mock all the things instead of recognizing too many mocks in a test is a code smell and a signal that refactoring is needed. Just like too many function params.

These days, the "preferred" style is to use a dependency injection framework, like Spring, because it more or less enforces the "proper" style, one way or another, and can additionally provide other nice benefits when you have need of non-singleton or weird-singleton services, and a whole host of other features (like thread safety or proxying). If the proper style is used, the risks of those two issues don't apply, so forcing the proper style is seen as good by many Java programmers to avoid the risks.

There's a complicated topic of performance here (and for the equivalents in Lisp) but I'm not going to get into that much here. My take is an extra function call (assuming no inlining, which is a big assumption for the JVM) to get a reference is in most real-world cases not a big deal. If it is, like in a critical tight hot loop, making the function call once to bind to a local var just above and using that is not a challenging refactor.

Also I should mention it's not a big deal, at least for ten years now, to convert a static method call Foo.call() into a new-and-throw-away new Foo().call(). You might do this so that you can better test Foo itself. Anyway, since like the early or mid 2000s JVMs were capable of looking at this as just incrementing a pointer in the GC's young generation block of memory, that's a lot faster than malloc. Later JVMs (like around Java 6) could, after inlining, do escape analysis and see that the object is created and then not used beyond the method call and thus allocate and destroy bits on the stack. Not quite exactly like C++ (google scalar replacement) but it's just more support that such a refactor doesn't impact perf much if at all.

Now back to Lisp. We could use that some-service:get-instance function, and inside it's just a call to make-instance if needed. Or we might not actually return an object and maybe we do things the Clojure way with a map. In any case, besides the actual implementation of the thing being returned, we have many options for where to cache the thing so that it's a singleton -- a non-visible let binding as Claus' blog shows, or perhaps something like a memoization hash table, or an actual defvar symbol (of a hash table or just for the object itself), or a class-allocated slot (static member/field/attribute in Java parlance). We can add some code to ensure that only one instance is allowed to exist, else an error is raised, and prevent others from their own make-instance'ing.
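
Here is a minimal sketch of one of those options: an unexported defvar as the cache, plus a guard against stray make-instance calls. The class name service and the var *instance* are assumptions for illustration, and nothing here is thread-safe:

```lisp
(defpackage #:some-service
  (:use #:cl)
  (:export #:get-instance))
(in-package #:some-service)

(defvar *instance* nil
  "Backing cache for the one SERVICE object (unexported).")

(defclass service () ())

(defmethod initialize-instance :before ((service service) &key)
  ;; Refuse a second instance made behind GET-INSTANCE's back.
  (when *instance*
    (error "A SERVICE already exists; call GET-INSTANCE instead.")))

(defun get-instance ()
  ;; Construct on first use, then keep returning the cached object.
  ;; Not thread-safe as written.
  (or *instance*
      (setf *instance* (make-instance 'service))))
```

Callers only ever see (some-service:get-instance); swapping the defvar for a let binding or a class-allocated slot would not change that interface.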

The risks that Java warns about are lessened. That's because nothing stops you in your test code from just redefining some-service:get-instance to be whatever you want, returning your own test double. As long as all users are going through that function, even if they're doing so in a hidden-dependencies way, you'll still have them covered. Just make sure you re-define the original function when you're done.
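
A sketch of that redefine-and-restore dance, wrapped in a hypothetical helper macro (with-redefined-function is not a standard or library macro, just a name made up here). A keyword stands in for the real service object, and the notinline declamation is there so a compiler is not entitled to assume calls always reach the original definition:

```lisp
(declaim (notinline get-instance))

(defun get-instance ()
  ;; Stand-in for the real accessor.
  :real-service)

(defun describe-service ()
  ;; A consumer with a hidden dependency on GET-INSTANCE.
  (list :talking-to (get-instance)))

(defmacro with-redefined-function ((name replacement) &body body)
  "Hypothetical helper: temporarily install REPLACEMENT as the global
definition of NAME, restoring the original even on non-local exit."
  (let ((original (gensym "ORIGINAL")))
    `(let ((,original (fdefinition ',name)))
       (unwind-protect
            (progn (setf (fdefinition ',name) ,replacement)
                   ,@body)
         (setf (fdefinition ',name) ,original)))))

(defun test-describe-service ()
  (with-redefined-function (get-instance (lambda () :test-double))
    (describe-service)))
```

Unlike a dynamic rebinding, this change is visible to every thread while the body runs, which is worth keeping in mind if tests run in parallel.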

Similarly, if the backing cache for that instance is just a defvar, you don't even need to redefine get-instance (and in fact that may not catch everything if some functions are using the defvar reference directly). And you don't need to worry about undoing your change after your test. You just dynamically rebind the defvar, test, and you're done.
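
The defvar variant, sketched with made-up names (*service*, describe-service, describe-service-directly) and a keyword standing in for the real object:

```lisp
(defvar *service* :real-service
  "Hypothetical backing cache; a keyword stands in for the real object.")

(defun get-instance ()
  *service*)

(defun describe-service ()
  (list :talking-to (get-instance)))

(defun describe-service-directly ()
  ;; Bypasses GET-INSTANCE, but the rebinding below still covers it.
  (list :talking-to *service*))

(defun test-with-double ()
  ;; LET on a special variable rebinds it for the dynamic extent of the
  ;; body only; the old value comes back automatically on exit.
  (let ((*service* :test-double))
    (list (describe-service) (describe-service-directly))))
```

Both consumers see the double inside the let, and (get-instance) returns the real value again afterwards with no cleanup code.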

Now to bring up a different topic. In Java, classes own their methods, and so when you eventually do get a Service object, you can type Service.<ctrl+space> or whatever and see a list of methods you can call, i.e. the protocol. Or you can jump to the interface/class definition and know by reading that what's there and what's in any super-classes/interfaces is all you can do. You can do a cross-reference on the type to find other parts of code using that type.

But in Lisp, classes and methods are separate. The methods that could be called with your object aren't even necessarily defined in one package, let alone one file, so finding applicable ones may be difficult. Sure, if you know the name or part of the name of a method, you can type it and ask your editor to auto-complete. Or if you at least know the package, you can type the package and press tab or whatever to have your editor show you all the exported symbols. It's also a good idea in any language to read the class definition at least and see if it documents any protocols (and then find such protocols if they are technically separate things as in Lisp) but when you've experienced intuitive objects where you don't need to read all the details and it's just obvious what and how to do something from the exposed API you can quickly see with an editor shortcut, you want that for everything. So for Lisp, are you stuck? Actually no! You can cross-reference too (via slime, which will invoke the implementation-dependent calls) and ask to see "Who Specializes" on the object's class. This will also find the methods that dispatch on it as a non-first argument! Try asking your Lisp who specializes on string sometime.

Unfortunately such cross referencing won't help with simple functions. Well you might argue that if you're using an object, you want to be using generic functions anyway. Maybe. Anyway if you declare types, or they've been inferred, it's possible at least in SBCL to scan all functions and their types and return those that take your type as an argument. (I thought also you would see locations of make-instances for your class, but either I'm using the wrong introspection calls or I'm mistaken. Well, there's always text search...)

But now I'm finally at the point I wanted to get to earlier. For singleton service type objects at least, the modest complexities of talking to it don't often warrant the power of CLOS multiple-dispatch. It's very likely all the applicable methods will be defined next to the class, and there's also not often going to be much inheritance going on.

So we can create a design principle here. Every service gets its own package, and the exported symbols for that package are the service's API and nothing else.

It doesn't matter what your backing package-global storage system is, the point is to make your usefully package-global singleton actually global by effectively making it the package itself. It doesn't even matter if you use simple functions instead of generic ones. In fact, you don't even need to expose the backing object (directly or via get-instance or otherwise) at all.

Let's take a simplified example and say you had a master-stack service. It's the one stack to rule them all, there can be only one of it in your program. What does the interface look like? Two functions only: (master-stack:push thing) and (master-stack:pop). When you quickload it, and type master-stack: and ask your editor to show the exported symbols, it's immediately obvious how to use this thing. You don't need to look at the implementation (unless perhaps if you're writing unit tests for it and the naive just-test-the-API-with-real-data approach fails -- maybe the implementation is spinning up a heavy database that you need to test double out).

Behind the scenes, maybe you have a master-stack class with a slot for the storage. You might have a defvar *master-stack* that makes the instance when the file is loaded. Then your two functions may just reference that *master-stack* directly and use slot-value or some accessor you put in. Alternatively, you might not have a *master-stack* var, and instead those two functions first call a hidden get-master-stack function that memoizes a make-instance. Alternatively, you might have the *master-stack* var again, and push and pop just immediately delegate to calling (master-stack-push *master-stack* thing) and (master-stack-pop *master-stack*) which are the expected defmethods that do the work. And alternatively you might not have any class at all and just have those two functions wrapped by a let for the storage, in poor-man's OOP style.
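
As a sketch of the last of those alternatives (no class, just a let around the two functions), assuming the package shadows cl:push and cl:pop so it can export its own:

```lisp
(defpackage #:master-stack
  (:use #:cl)
  (:shadow #:push #:pop)     ; our own PUSH and POP, not CL's
  (:export #:push #:pop))
(in-package #:master-stack)

;; Poor-man's OOP: the storage is visible only to these two closures.
(let ((items '()))
  (defun push (thing)
    (cl:push thing items)
    thing)
  (defun pop ()
    ;; Returns NIL when the stack is empty.
    (cl:pop items)))
```

From any other package, usage is just (master-stack:push 'a) and (master-stack:pop), and switching to one of the class-based variants later would not change those two calls.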

This principle can work to varying degrees for other things too (like data structures), not just for singleton services. Think of packages as beyond just organization (as championed by the one-package-per-file approaches) or an annoyance (there are so many symbols in the CL package, I too will have so many symbols!) and instead representing the intended API and protocol of something.

Too many Lispers wait too long before splitting things up into multiple packages. If you keep things in one package then yes, package-global ends up being effectively global for your program, and the usual risks and poor experiences that entails.

Winding down... where I'm still conflicted in my own code is how much to expose, how much to make convenient, how much to allow for possibilities. I've been working on some simple SDL2 games, and a library to simplify making them. In SDL2 if you're keeping things simple then you have a single *window* for the game window and a single *renderer* for its rendering commands. *renderer* or at least a function returning it makes sense to be a global because so many other functions depend on it as an argument, and so many parts of the game will want to use such functions. Bad things can happen though if a thread that didn't create it tries to use it. Well for now everything runs in a single game thread.

Since I have this global, I'm conflicted on whether to just export it and call it a day, forcing everyone to use the raw calls that require it, or instead wrap more of such calls so that they use *renderer* implicitly and export those, rather than (or in addition to?) the *renderer* itself. I already wrap some things, e.g. I have my own (quit) function that handles shutting everything down and freeing up the foreign references including the window and renderer, so the library user doesn't need to pass them in... I also have my own services, like a *font-manager* and a *texture-loader*, which are actually classes, in their own packages with their own obvious APIs, but I'm facing similar decisions of whether to actually implement things with defmethod (and pay the minor amortized price such flexibility gives) and leave the possibility of subclassing/mixins/auxiliary methods to future library users, whether to require an explicit *texture-loader* object or just do things implicitly, whether in this case it's actually a singleton because you might want multiple ones for different asset folders (do I need a factory?) and so forth.

Well, that's for me to continue figuring out. In conclusion I think package-globals can be very fine things, but like all state, you need to watch out, more so the broader the scope. If you can limit the scope to a small package, or if you can make the thing immutable, there are fewer problems.


Posted on 2022-02-04 by Jach

Tags: java, lisp, programming

Permalink: https://www.thejach.com/view/id/394

Trackback URL: https://www.thejach.com/view/2022/2/globals_in_lisp
