You might not need ECS
So around May of 2024, I finally got around to reading this article on Lisp gamedev with ECS. It's very good and explains things really well. The author created a nice ECS system, and if it fits your development style, by all means, use it and let's have more Lisp games!
However, I've been fairly skeptical of the ECS idea since I first came across it, however long ago that was. Now, to be fair, I'm just a nobody with no commercially shipped games and only a few relations with people who are in or have been in the industry, but I'm not the only one who is skeptical. Check out this short article from the creator of Godot explaining why Godot isn't ECS-based.
To grossly simplify that post, and arguments made by others elsewhere, the reasons for being skeptical about ECS mostly boil down to two things:
1. Game code organized in an ECS fashion is harder to reason about and modify than more traditional methods.
2. The performance benefits might not be large once you consider the type of game and everything else the game is doing, and anyway you can reap the bulk of those optimizations later on by refactoring only part of your game away from more traditional methods.
To state the pro-ECS viewpoint as the converse, I think you could say:
1. ECS code isn't that much harder to reason about and change, or indeed may even be easier!
2. You start with good performance. Going from bad performance to good is harder than you think, and harder than just maintaining good performance from the start. Unity had to rearchitect everything when they switched to an ECS foundation.
My own view is that I strongly believe in the anti-ECS version of point 1, and I'm somewhere in the middle of the pro/anti ECS view for point 2. Sometimes optimizations are easy, sometimes they're not. I will say: if you take the oft-quoted (and often underquoted without its full context) "premature optimization..." line too seriously, you'll never actually bother trying to optimize, and you'll develop no skill in it, and your efforts will be doomed. You need to sometimes do things just to build skill in the thing, even if doing them right now is not the most justifiable use of right now's time. The future matters too!
Anyway, I wanted to explore both points a bit myself with the original Lisp post. Comparative examples are always helpful. So I took their asteroids simulation example and ported it to my own little library's backend (I call it lgame, it uses SDL2 rather than Allegro). This version is pretty much the same as shown in the original article, just using slightly different code for rendering.
The full code is at my ecs-compare repo. Let me briefly describe the organization.
ecs-version.lisp is the core code found in the article linked at the top. It's responsible for defining the components, the systems that do the important computations, an initializer to hook this all up, and lastly an update function and render function that get called every frame to update the game objects and draw them.
al-wrapper.lisp is my wrapper code around the allegro functions the original ECS version used, so that instead they call into my lgame code or sdl2 directly. I made them all inlineable functions to try and avoid unnecessary penalties.
object-version.lisp is my version of the core code. It's responsible for abstractly the same stuff: define the game objects, their important computations, an initializer to hook stuff up, and update and render entrypoints to call each frame. (Honestly, it'd be more idiomatic to split this out into separate files, one for each game object, but the example is small enough it's fine to all be in one file.) The organization is quite different though and follows more traditional OOP methods: we have an abstract "game object" class which just compresses some typing; it says that all game objects are lgame.sprites that have a couple mixins for automatic group behavior and memory resource cleanup behavior. Then we define the game objects as their own classes with their own logic and (if needed) draw functions.
main.lisp initializes my lgame library, sets up two windows so you can run the two versions side-by-side (and pause/resume them independently, just type 'o' to toggle pause the object version and 'e' for the ECS version), and starts the game loop that handles input and calls each version's update and render logic each frame. (The two windows were also useful to help me ensure I didn't accidentally make the object version behave differently.)
Exploring Readability
ECS Version
Let's dig into some details. The version of cl-fast-ecs I'm using is the 2023-10-21 version from Quicklisp. It would not surprise me if there have been significant performance improvements since then, including possibly multi-threading. The example asteroids project is originally from here with latest commit on Feb 26, 2024. (32acb40.) Looks like there have been a few commits since, but nothing that sounds too impactful.
The ecs-version code starts by defining several components:
(ecs:define-component position
  "Determines the location of the object, in pixels."
  (x 0.0 :type single-float :documentation "X coordinate")
  (y 0.0 :type single-float :documentation "Y coordinate"))

(ecs:define-component speed
  "Determines the speed of the object, in pixels/second."
  (x 0.0 :type single-float :documentation "X coordinate")
  (y 0.0 :type single-float :documentation "Y coordinate"))

(ecs:define-component acceleration
  "Determines the acceleration of the object, in pixels/second^2."
  (x 0.0 :type single-float :documentation "X coordinate")
  (y 0.0 :type single-float :documentation "Y coordinate"))

(ecs:define-component image
  "Stores ALLEGRO_BITMAP structure pointer, size and scaling information."
  (bitmap nil :type (or null sdl2-ffi:sdl-texture))
  (width 0.0 :type single-float)
  (height 0.0 :type single-float)
  (scale 1.0 :type single-float))

(ecs:define-component planet
  "Tag component to indicate that entity is a planet.")
Position, speed, and acceleration are all 2D vectors. The image component is a structure containing our sprite data (in SDL, this is a pointer to a texture, which resides in VRAM) and the sprite's width, height, and scale.
And the planet component is just for tagging purposes, the article explains more.
Next we go immediately into defining systems. The order of them here is most fascinating to me. The first system given is the one to draw images:
(ecs:define-system draw-images
  (:components-ro (position image)
   :initially (al:hold-bitmap-drawing t)
   :finally (al:hold-bitmap-drawing nil))
  (let ((scaled-width (* image-scale image-width))
        (scaled-height (* image-scale image-height)))
    (al:draw-scaled-bitmap image-bitmap 0 0
                           image-width image-height
                           (- position-x (* 0.5 scaled-width))
                           (- position-y (* 0.5 scaled-height))
                           scaled-width scaled-height 0)))
Simple drawing code. Were it not for the non 1.0 scale, it could have been even simpler. But still, I'm left puzzled: when does this code actually get invoked? I can only assume that it happens whenever its required input components, position and image, are available for read-only access. Let's read on:
(ecs:define-system move
  (:components-ro (speed)
   :components-rw (position)
   :arguments ((:dt single-float)))
  (incf position-x (* dt speed-x))
  (incf position-y (* dt speed-y)))
Here is the "move" system. Going with the last assumption, for this code to run the "speed" component needs to be ready for reading. The "position" component is also specified here as a "read-write" dependency. To me this implies that this code must run before any other system that depends on the "position" component in a read-only form. I'm curious what would happen though if another system also said it had position as a read-write component! Anyway we see that the code updates the position vector, using the input speed vector and the frame dt. Simple enough movement code, the position update is just velocity times time. Next:
(ecs:define-system accelerate
  (:components-ro (acceleration)
   :components-rw (speed)
   :arguments ((:dt single-float)))
  (incf speed-x (* dt acceleration-x))
  (incf speed-y (* dt acceleration-y)))
Here we need the acceleration component ready for read, and we make writes to the speed component in the computation. Therefore this system must run before the move system. Again the computation is simple, the update to velocity is just acceleration times time. So far, our systems run in the reverse order that they've appeared in the code.
(Side note: this is more correct, as noted in Gaffer On Games' article. If you compute the change to position first, then the change to velocity, you're using the "Euler integrator". However, this creates instabilities. Instead, by computing the change in velocity first, then the change in position (using the new velocity), we use "semi-implicit Euler" and get good results. The article notes that "Most commercial game physics engines use this integrator.")
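To make the difference between the two integrators concrete, here is a minimal sketch that is not taken from the original article or the ecs-compare repo. It runs both on a unit-mass spring, where the acceleration is simply -x, and all function names are made up for illustration. With a fixed dt, the explicit Euler energy should keep growing, while the semi-implicit one should stay close to its starting value of 0.5.

```lisp
;; Toy comparison on a spring with acceleration a = -x.
;; All names here are hypothetical, chosen only for this illustration.

(defun explicit-euler-step (x v dt)
  "Advance position with the OLD velocity, then update the velocity."
  (let ((a (- x)))
    (values (+ x (* dt v))
            (+ v (* dt a)))))

(defun semi-implicit-euler-step (x v dt)
  "Update the velocity first, then advance position with the NEW velocity."
  (let* ((a (- x))
         (new-v (+ v (* dt a))))
    (values (+ x (* dt new-v))
            new-v)))

(defun energy-after (step-fn steps dt)
  "Start at x=1, v=0, apply STEP-FN for STEPS steps, return 0.5*(x^2 + v^2)."
  (let ((x 1.0)
        (v 0.0))
    (dotimes (_ steps)
      (multiple-value-setq (x v) (funcall step-fn x v dt)))
    (* 0.5 (+ (* x x) (* v v)))))
```

Evaluating (energy-after #'explicit-euler-step 1000 0.1) next to (energy-after #'semi-implicit-euler-step 1000 0.1) in the REPL shows the drift side by side.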
(ecs:define-system pull
  (:components-ro (position)
   :components-rw (acceleration))
  (let* ((distance-x (- *planet-x* position-x))
         (distance-y (- *planet-y* position-y))
         (angle (atan distance-y distance-x))
         (distance-squared (+ (expt distance-x 2) (expt distance-y 2)))
         (acceleration (/ *planet-mass* distance-squared)))
    (setf acceleration-x (* acceleration (cos angle))
          acceleration-y (* acceleration (sin angle)))))
Here is our next system, the "pull" one. The math is simple, using the simple gravitational formula to explicitly set the (constant in time) acceleration each frame. We see it depends on position being available for read-only, and writes to acceleration.
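In the pull system the angle is only used to split the acceleration magnitude into x and y parts, so the same numbers can be had without trig by scaling the distance vector by 1/d. The sketch below compares the two forms. The function names are hypothetical and belong to neither version of the simulation, and like the original, neither form guards against a distance of zero.

```lisp
;; Two equivalent ways to compute the pull toward a mass at offset (dx, dy).
;; Hypothetical helper names; not part of ecs-version.lisp or object-version.lisp.

(defun pull-with-trig (dx dy mass)
  "Magnitude mass/d^2, split into components via the angle (as in the pull system)."
  (let* ((angle (atan dy dx))
         (distance-squared (+ (* dx dx) (* dy dy)))
         (magnitude (/ mass distance-squared)))
    (values (* magnitude (cos angle))
            (* magnitude (sin angle)))))

(defun pull-with-normalization (dx dy mass)
  "Same magnitude, split into components via the unit vector (dx/d, dy/d)."
  (let* ((distance-squared (+ (* dx dx) (* dy dy)))
         (distance (sqrt distance-squared))
         (magnitude (/ mass distance-squared)))
    (values (* magnitude (/ dx distance))
            (* magnitude (/ dy distance)))))
```

Both return the x and y acceleration as two values, so (pull-with-trig 3.0 4.0 500000.0) and (pull-with-normalization 3.0 4.0 500000.0) should agree up to floating-point rounding.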
But hmmm... don't we have a conflict here? We need the pull system to run so that acceleration is available to run the accelerate system so that speed is available to run the move system so that position is available to... draw... and also to pull?
Perhaps my understanding of what the components might imply on execution order is wrong? Well let's continue on.
(ecs:define-system crash-asteroids
  (:components-ro (position)
   :components-no (planet)
   :with ((planet-half-width planet-half-height)
          :of-type (single-float single-float)
          := (values (/ *planet-width* 2.0)
                     (/ *planet-height* 2.0))))
  (when (<= (+ (expt (/ (- position-x *planet-x*) planet-half-width) 2)
               (expt (/ (- position-y *planet-y*) planet-half-height) 2))
            1.0)
    (ecs:delete-entity entity)))
This is our last system... it also requires position to be ready, and filters out the planet from the computation. I'm still confused on the order of execution. I don't think we have anything here to say whether to crash the asteroids or to draw the images first? So isn't it possible to draw the image and then crash the asteroids, when it should always be the other way around so there's not a weird one frame delay of crash and removal?
Well let's continue on and maybe some code later will enlighten us. Let's check the initializer:
(defun init (window-width window-height asteroids)
  (ecs:bind-storage)
  (let ((background-bitmap-1 (al:ensure-loaded
                              #'al:load-bitmap
                              "parallax-space-stars.png"))
        (background-bitmap-2 (al:ensure-loaded
                              #'al:load-bitmap
                              "parallax-space-far-planets.png")))
    (ecs:make-object
     `((:position :x 400.0 :y 200.0)
       (:image :bitmap ,background-bitmap-1
               :width ,(float (al:get-bitmap-width background-bitmap-1))
               :height ,(float (al:get-bitmap-height background-bitmap-1)))))
    (ecs:make-object
     `((:position :x 100.0 :y 100.0)
       (:image :bitmap ,background-bitmap-2
               :width ,(float (al:get-bitmap-width background-bitmap-2))
               :height ,(float (al:get-bitmap-height background-bitmap-2))))))
  (let ((planet-bitmap (al:ensure-loaded
                        #'al:load-bitmap
                        "parallax-space-big-planet.png")))
    (setf *planet-width* (float (al:get-bitmap-width planet-bitmap))
          *planet-height* (float (al:get-bitmap-height planet-bitmap))
          *planet-x* (/ window-width 2.0)
          *planet-y* (/ window-height 2.0))
    (ecs:make-object `((:planet)
                       (:position :x ,*planet-x* :y ,*planet-y*)
                       (:image :bitmap ,planet-bitmap
                               :width ,*planet-width*
                               :height ,*planet-height*))))
  (let ((asteroid-bitmaps
          (map 'list
               #'(lambda (filename)
                   (al:ensure-loaded #'al:load-bitmap filename))
               asteroid-images)))
    (dotimes (_ asteroids)
      (let ((r (random 20.0))
            (angle (float (random (* 2 pi)) 0.0)))
        (ecs:make-object `((:position :x ,(+ 200.0 (* r (cos angle)))
                                      :y ,(+ *planet-y* (* r (sin angle))))
                           (:speed :x ,(+ -5.0 (random 15.0))
                                   :y ,(+ 30.0 (random 30.0)))
                           (:acceleration)
                           (:image
                            :bitmap ,(alexandria:random-elt asteroid-bitmaps)
                            :scale ,(+ 0.1 (random 0.9))
                            :width 64.0 :height 64.0))))))
  (setf *font* (al:load-ttf-font +font-path+ +font-size+ 0)))
Let's take it bit by bit. First we "bind the storage", whatever that does. Next we make our first two entities with ecs:make-object. They both are made of two components: a position and an image. Next, the planet entity is made, which is made of three components: position, image, and planet. Lastly, a bunch of asteroids are made. Each asteroid is made of position and image, as well as speed and acceleration.
Every entity sets initial values for its component data, except the asteroids, which leave their acceleration vector unspecified. This may be a hint on order of operations for the system. Every component's values are technically initially ready for read access, except the acceleration ones, so let's run pull first. When pull is done, acceleration will be set, so the accelerate system can run. When that's done, speed has been reset, but...
Every frame, we call a single function that handles updating everything and drawing it: (ecs:run-systems :dt (float dt 0.0)). It's no help in figuring out the order, though.
Well, let's add some print statements and figure it out manually I guess. I reduced the asteroids count to 10 and paused the ecs version after two frames so I could calmly scroll through the output in the REPL. Here is the output:
Called update
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Called update
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running crash-asteroids
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running pull
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running accelerate
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running move
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Running draw-images
Interesting! So it's running in the reverse order that we've defined them in the source code. Is the order the same if we e.g. define crash-asteroids first, and swap move and accelerate?
No! Now pull runs first, then move, then accelerate, then draw-images, and finally crash-asteroids. Again this is reverse-order to how it's defined in the source code, but this is completely wrong.
One last try to make sure. I modified them again so that the source order from top to bottom in the file is: crash-asteroids, accelerate, move, pull, draw-images. The resulting output comes in the reverse order: draw-images, pull, move, accelerate, crash-asteroids. Again, completely and utterly wrong.
Am I crazy for thinking this is a complete deal breaker when it comes to this pattern? The original point 1, that ECS code is harder to reason about and modify, seems very evident here. Especially in the context of Lisp, where I might want to change my systems dynamically, adding and removing them, not just what components an entity has. For a larger project, it also seems plausible to define systems in separate files, and then good luck keeping track of the order of execution.
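The following is only a toy model with made-up names, and not a claim about how cl-fast-ecs works internally. It shows one way a reverse-of-definition run order can arise: a registry that pushes each new definition onto the front of a list. It also sketches the alternative of naming the order explicitly at the call site, so that file order and REPL redefinitions stop mattering.

```lisp
;; Toy system registry. Every name here is hypothetical.

(defvar *toy-systems* '()
  "Alist of (name . function), newest definition first.")

(defmacro define-toy-system (name &body body)
  "Register BODY under NAME. PUSH puts the newest definition at the front,
so redefining a system also moves it to the front."
  `(progn
     (setf *toy-systems* (remove ',name *toy-systems* :key #'car))
     (push (cons ',name (lambda () ,@body)) *toy-systems*)
     ',name))

(defun run-toy-systems ()
  "Run every registered system front to back; return the names in run order."
  (loop for (name . fn) in *toy-systems*
        do (funcall fn)
        collect name))

(defun run-toy-systems-in-order (names)
  "Run only the systems listed in NAMES, in exactly that order."
  (loop for name in names
        for entry = (or (assoc name *toy-systems*)
                        (error "No system named ~S" name))
        do (funcall (cdr entry))
        collect name))

;; Defined in the same top-to-bottom order as three of the article's systems.
(define-toy-system draw-images)
(define-toy-system move)
(define-toy-system accelerate)
```

With these definitions, (run-toy-systems) runs accelerate, then move, then draw-images, while (run-toy-systems-in-order '(accelerate move draw-images)) gives the same result no matter how the definitions are arranged in the source.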
Try it yourself: just swap the order of the accelerate and move systems, and see the instability that comes from using the wrong order.
As a thought exercise, imagine adding a new feature to this simulation: user input. Let the mouse cursor be a point in the simulation, and when the user presses left click, an attractive force appears at the point. When the user presses right click, a repelling force appears at that point. This lets the user move their mouse around to mess with the orbits of the asteroids. How straightforward do you think it'd be to add this feature to the ECS version?
Object Version
Let's now compare my object version. Normally these would be split across three files, one per class, but:
(defclass game-object (sprite add-groups-mixin cleaned-on-kill-mixin)
  ())

(defclass background (game-object)
  ())

(defmethod initialize-instance :after ((self background) &key image pos-x pos-y)
  (setf (.image self) image
        (.rect self) (get-texture-rect image))
  (move-rect (.rect self) pos-x pos-y))

(defmethod draw ((self background))
  "Draw the bg objects 'scaled' by 1 and shifted over"
  (let ((x (sdl2:rect-x (.rect self)))
        (y (sdl2:rect-y (.rect self)))
        (w (sdl2:rect-width (.rect self)))
        (h (sdl2:rect-height (.rect self))))
    (al:draw-scaled-bitmap (.image self)
                           0 0 w h ; src rect
                           (- x (* 0.5 w))
                           (- y (* 0.5 h))
                           w h ; dst rect
                           0)))

(defclass planet (game-object)
  ((mass :accessor .mass :initform 500000.0)))

(defmethod initialize-instance :after ((self planet) &key)
  (setf (.image self) (get-texture "parallax-space-big-planet.png")
        (.rect self) (get-texture-rect (.image self)))
  (setf (rect-coord (.rect self) :center) (rect-coord lgame:*screen-rect* :center)))
Our game-object base class is just to save a tiny bit of typing for the other classes. A sprite is an lgame.sprite. Its main slots are an image (meant to hold a pointer to the texture to "blit") and an SDL rect (used to specify where on the screen to draw the image; for simple games it can be used as a position vector, but its x,y,w,h fields are integers, so it doesn't work well for all games -- the x and y should only be thought of as pixel coordinates). Two important generic functions are defined for sprites: an update method, whose default implementation does nothing, and a draw method, which by default blits the image slot to the place on the screen specified by the rect slot. This means that many game sprites just need to provide their own update function, which ends by updating the rect as needed, and drawing happens correctly. It's inspired by PyGame.
Our first sprite is the background sprite, which has a constructor to set the image and initial position. This is fairly idiomatic. What's not idiomatic is the overridden draw method, done to match the ECS version's interesting choice of scaling the background instead of just modifying the original asset or making a separation between logical pixel screen size (e.g. 256x224 for a "SNES feel") and physical pixel screen size (e.g. 1920x1080, a common monitor size still). That separation is trivially done by SDL2 and lgame, perhaps not so in Allegro.
The next sprite is the planet sprite. We put its mass property in its own slot rather than a package-global variable. The constructor is more or less the same, we just don't take any extra arguments -- there's only ever one planet object, so we can just hardcode the texture path right there, and we update its rect so that its center matches the screen's center. get-texture is an lgame function that handles loading the texture into VRAM with SDL2 but also caches it so subsequent gets don't needlessly re-load it. rect-coord is a nifty function that accepts various shortcuts to properties of the rectangle, like its :center position, for easy queries and updates.
Because the planet is static, there is no update method, and the default draw method for sprites will be sufficient, so there's no override there either.
Next we define our asteroid class:
(defclass asteroid (game-object)
  ((textures :accessor /textures :allocation :class
             :initform '("a10000.png"
                         "a10002.png"
                         "a10004.png"
                         "a10006.png"
                         "a10008.png"
                         "a10010.png"
                         "a10012.png"
                         "a10014.png"
                         "b10000.png"
                         "b10002.png"
                         "b10004.png"
                         "b10006.png"
                         "b10008.png"
                         "b10010.png"
                         "b10012.png"
                         "b10014.png"))
   (pos :accessor .position :initform (vector 0.0 0.0))
   (vel :accessor .velocity :initform (vector 0.0 0.0))
   (acc :accessor .acceleration :initform (vector 0.0 0.0))
   (planet :accessor .planet :initarg :planet)))
Here I decided to specify the texture paths as a class-allocated slot, again to avoid populating the package-level namespace too much with defvars and such. It doesn't really matter, it could just as well be a local var in the init function and passed via a keyword argument to the constructor. Given later refactors, that'd probably be wiser anyway.
The asteroid is also given three separate position, velocity, and acceleration vectors, implemented as just simple arrays of two numbers. Lastly, a handle to the planet is given, again rather than using a global *planet* var.
Its constructor should look familiar:
(defmethod initialize-instance :after ((self asteroid) &key)
  (let ((r (random 20.0))
        (angle (random (* 2 pi)))
        (planet-y (rect-coord (.rect (.planet self)) :centery))
        (image (get-texture (alexandria:random-elt (/textures self)))))
    (setf (aref (.position self) 0) (+ 200 (* r (cos angle)))
          (aref (.position self) 1) (+ planet-y (* r (sin angle)))
          (aref (.velocity self) 0) (+ -5 (random 15.0))
          (aref (.velocity self) 1) (+ 30 (random 30.0))
          (.image self) image)
    (let* ((rect (get-texture-rect image))
           (scale (+ 0.1 (random 0.9)))
           (scaled-w (* scale (sdl2:rect-width rect)))
           (scaled-h (* scale (sdl2:rect-height rect))))
      (set-rect rect
                :x (- (aref (.position self) 0) (* 0.5 scaled-w))
                :y (- (aref (.position self) 1) (* 0.5 scaled-h))
                :w scaled-w :h scaled-h)
      (setf (.rect self) rect))))
This is basically copied from the ecs-version's initialization. Some differences are that we explicitly use the planet's :centery location, and because our vectors are just plain 2-element arrays we use the clunky aref 0/1 for them. A 2D vector struct to set :x and :y would be cleaner.
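As a sketch of what such a 2D vector struct could look like (the names vec2 and vec2-incf-scaled are hypothetical and not part of lgame or the ecs-compare repo), a plain defstruct with typed slots would replace the aref 0/1 pairs with named accessors:

```lisp
;; Hypothetical 2D vector struct; not part of lgame.

(defstruct (vec2 (:constructor make-vec2 (&optional (x 0.0) (y 0.0))))
  (x 0.0 :type single-float)
  (y 0.0 :type single-float))

(defun vec2-incf-scaled (target scale source)
  "Destructively add SCALE * SOURCE to TARGET and return TARGET."
  (incf (vec2-x target) (* scale (vec2-x source)))
  (incf (vec2-y target) (* scale (vec2-y source)))
  target)
```

With this, a position update such as "position += dt * velocity" becomes a single call like (vec2-incf-scaled position dt velocity), and initial values read as (make-vec2 200.0 100.0) instead of two separate aref assignments.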
One interesting bit is the rect code. While the ecs-version just defines a scale and calculates the appropriate width and height changes each draw, here we define a scale and set the rect's new width and height immediately. Note that there is no draw method defined for asteroids, because the default draw method uses this rect and so will scale things appropriately as a consequence.
There is an update method, which is where the bulk of the code is:
(defmethod update ((self asteroid))
  (let* ((dt (dt))
         (pos-x (aref (.position self) 0))
         (pos-y (aref (.position self) 1))
         (planet (.planet self))
         (planet-x (rect-coord (.rect planet) :centerx))
         (planet-y (rect-coord (.rect planet) :centery))
         (distance-x (- planet-x pos-x))
         (distance-y (- planet-y pos-y))
         (angle (atan distance-y distance-x))
         (distance-squared (+ (expt distance-x 2) (expt distance-y 2)))
         (accel (/ (.mass planet) distance-squared))
         (accel-x (* accel (cos angle)))
         (accel-y (* accel (sin angle))))
    (setf (aref (.acceleration self) 0) accel-x
          (aref (.acceleration self) 1) accel-y)
    (incf (aref (.velocity self) 0) (* dt accel-x))
    (incf (aref (.velocity self) 1) (* dt accel-y))
    (incf (aref (.position self) 0) (* dt (aref (.velocity self) 0)))
    (incf (aref (.position self) 1) (* dt (aref (.velocity self) 1)))
    (let ((new-x (aref (.position self) 0))
          (new-y (aref (.position self) 1))
          (planet-half-w (/ (sdl2:rect-width (.rect planet)) 2.0))
          (planet-half-h (/ (sdl2:rect-height (.rect planet)) 2.0)))
      (set-rect (.rect self) :x new-x :y new-y)
      (when (<= (+ (expt (/ (- new-x planet-x) planet-half-w) 2)
                   (expt (/ (- new-y planet-y) planet-half-h) 2))
                1.0)
        (kill self)))))
Here is the key difference with the ECS version. All these computations happen in separate systems under the ECS model, and in an uncertain order that is also hard to change later. Here though, the computations happen one after another, there's no ambiguity about the order, and you can change it as you like at runtime by just redefining the method in the REPL.
Sure, it could be done in a clearer fashion with helper functions, I think, instead of doing so much in the let* bindings (in particular the gravity calculation for acceleration), but once it's done, again the order is clear: set the acceleration vector, update the velocity vector, and then update the position vector, and semi-implicit Euler is happy. Lastly we update our rect coordinates for the on-screen pixels occupied, and do a collision check with the planet. (Normally, you would just use (lgame.rect:collide-rect? rect1 rect2) but I wanted the code to match the ecs version. Perhaps someday I'll add a small shape-collisions library to lgame so that you would just call something like collide-ellipse? which would use similar code.) (The kill method for sprites removes it from any groups it belongs to, and with the cleaned-on-kill-mixin it handles the bookkeeping of freeing the sdl2:rect foreign memory. I've been thinking I should just stop using an sdl2:rect in lgame and provide a Lisp structure instead...)
This is also the correct order: logic and physics updates which also can remove things, then draw only after all logic and physics have concluded.
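The collision check used in both versions is the standard point-in-ellipse test: normalize the offset from the center by the half-width and half-height, and compare the squared length against 1. A minimal standalone sketch follows, with a hypothetical function name that does not exist in lgame:

```lisp
;; Hypothetical helper, mirroring the test inside the asteroid update method.

(defun point-in-ellipse-p (px py cx cy half-w half-h)
  "True when (PX, PY) lies inside or on the axis-aligned ellipse
centered at (CX, CY) with the given half-width and half-height."
  (<= (+ (expt (/ (- px cx) half-w) 2)
         (expt (/ (- py cy) half-h) 2))
      1.0))
```

Unlike a rectangle overlap test, this rejects points near the corners of the planet's bounding box: with half-w 2.0 and half-h 1.0 around the origin, the point (2.0, 1.0) is the corner of the bounding rect but falls outside the ellipse.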
Lastly let's check our initialization:
(defun init (asteroids)
  (setf *all-group* (make-instance 'lgame.sprite:ordered-group))
  (make-instance 'background
                 :image (get-texture "parallax-space-stars.png")
                 :pos-x 400 :pos-y 200
                 :groups *all-group*)
  (make-instance 'background
                 :image (get-texture "parallax-space-far-planets.png")
                 :pos-x 100 :pos-y 100
                 :groups *all-group*)
  (let* ((planet (make-instance 'planet :groups *all-group*)))
    (dotimes (_ asteroids)
      (make-instance 'asteroid :planet planet :groups *all-group*)))
  (make-instance 'fps-display :groups *all-group*))
We make an ordered sprite group (draw order is the order things are added to the group) and do give it a top-level binding, mainly for convenience when developing if we want to reach in and look at something in the REPL.
Next we make our background sprites, passing the group in so they add themselves to it. Next the planet sprite. Lastly the n asteroid sprites and an FPS display sprite.
Our tick function that is called every frame is just (update *all-group*) and (draw *all-group*), no surprises.
So, I think this is quite a bit easier to understand than the ECS version. I also think the thought experiment of adding in a player-controlled force emitter is a lot more straightforward to do. There are even multiple obvious ways to approach the problem. One way would be to just make a new class, add it to the group, and in its update method check the mouse state to generate a force and then loop through the asteroids to apply the force. (You might want to define an extra group just for the asteroids to make it easier to fetch them.) You could also instead skip making a new class, and just add this as a behavior of the asteroids themselves, and in the already large update method add some code that checks and applies the force there for that particular asteroid.
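A rough sketch of the first approach, the separate emitter class, might look like the following. It is deliberately standalone: the classes, the toy-update generic function, and every slot name are hypothetical stand-ins rather than lgame's sprite API, and how the mouse position and button state get written into the emitter's slots depends on the input layer and is left out. A positive strength attracts and a negative one repels.

```lisp
;; Standalone sketch of a force emitter. All names are hypothetical.

(defclass toy-asteroid ()
  ((pos :accessor pos :initarg :pos)
   (vel :accessor vel :initform (vector 0.0 0.0))))

(defclass force-emitter ()
  ((x :accessor emitter-x :initarg :x)
   (y :accessor emitter-y :initarg :y)
   ;; Assumed convention: left click sets a positive strength,
   ;; right click a negative one, no button leaves it at zero.
   (strength :accessor strength :initarg :strength :initform 0.0)
   (targets :accessor targets :initarg :targets :initform '())))

(defgeneric toy-update (object dt)
  (:documentation "Advance OBJECT by DT seconds."))

(defmethod toy-update ((self force-emitter) dt)
  "Nudge the velocity of every target toward (or away from) the emitter."
  (unless (zerop (strength self))
    (dolist (asteroid (targets self))
      (let* ((dx (- (emitter-x self) (aref (pos asteroid) 0)))
             (dy (- (emitter-y self) (aref (pos asteroid) 1)))
             ;; Clamp so an asteroid sitting on the cursor doesn't divide by zero.
             (distance-squared (max 1.0 (+ (* dx dx) (* dy dy))))
             (distance (sqrt distance-squared))
             (magnitude (/ (strength self) distance-squared)))
        (incf (aref (vel asteroid) 0) (* dt magnitude (/ dx distance)))
        (incf (aref (vel asteroid) 1) (* dt magnitude (/ dy distance)))))))
```

In this arrangement the emitter would need to be updated before the asteroids each frame so that the velocity change feeds into that frame's position update, which is just a matter of where it sits in the update order.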
Exploring Performance
Let's shift to the other topic, performance. Let's first establish some various information on what I'm running so someone could reproduce this. I'm running a rather beefy desktop with a 12-core AMD Ryzen 9 5900X and an Nvidia 4090 graphics card, with 128 GB of ECC DDR4 RAM. I have access to a wimpier system but I didn't run my tests on it for this post; maybe if someone cares enough to ask I'll do it.
So when I run the original ecs-tutorial-1 project with mangohud, I see an average of around 168 FPS with 33% GPU usage and 11% CPU. This is also with 5000 asteroids.
Honestly, this is kinda really terrible. I mean yeah, it's more than 60 FPS, which is all one really needs (sarcastic), but still, for my system? For this simple of a simulation?
Anyway, I'm fairly certain it has something to do with Allegro. When I ported it to lgame, which uses SDL2 as the backend, we'll see I instead get much higher framerates. Ironically mangohud doesn't work (I could have sworn it worked before, but it crashes somehow in the foreign sdl-render-present call which manifests as a Lisp division-by-zero error for some reason) so I added a little graphical FPS display to both the ECS version and the OOP version.
For the ECS version, I now get 680-700 FPS. Over four times higher, just switching to SDL! Now we're talking. (As an aside for what's possible, just rendering the FPS text gives like 17k-20k FPS.)
Now for the OOP version... here I'm getting roughly 270 FPS. While the ECS version has a 2.6 times higher FPS, the OOP version is getting 1.6 times higher than the ECS version under Allegro. And it's much higher than 60 FPS (all you need right?) or even 120 FPS (which is my monitor's refresh rate).
I can even run them together at the same time, and they share an about 180 FPS rate.
So yes, the OOP style is less performant than the ECS style, I think we expected that, though it's not always so obvious. Let's walk through it a bit.
From the data-oriented design point of view, this OOP approach isn't necessarily very efficient to start with. Consider the flow of CPU execution for just the 5000 asteroids. Under the OOP style, we iterate over each asteroid object and call our update method on it. The update method does some logic intermixed with physics: it calculates how much acceleration the planet is exerting on it, updates its stored acceleration (actually not a necessary step), uses that to update its velocity accordingly, and then updates its position accordingly, finally updating its rendering rect's pixel coordinates. Lastly, it checks if it's colliding with the planet, and if so, kills itself.
Each sprite instance has a chunk of contiguous memory dedicated to it, consisting of at least: x-y position, x-y speed, a handle to an image texture, and a rendering rect. Note that the position and speed can, depending on the underlying details, actually be pointers to the two numbers, not the two numbers directly. Anyway, the loop goes one by one, each time having to go grab memory addresses of each sprite (lives anywhere) and each sprite's data (which can likely live somewhere else too). Not great for cache behavior. Especially bad if things live outside of the same 4kb page that the OS has to swap in. And of course the final rect update at the end is updating foreign memory, which has a cost.
Once everything is done updating, we iterate again through the 5k asteroids to call their draw method, which will render them. (Why not draw each sprite right after its own update, in the same loop, and save some overhead? One, I'm again not trying to do anything clever, and two, in a different game, especially one that has physics broken out more, it's very possible that updates of other sprites or systems can lead to changes in a sprite that has already finished with its own logic. It's good practice to have a game loop do everything that can change the scene and then only render as the final step.)
There are some obvious optimization opportunities here, but first let's go through the ECS version in contrast. ECS says: run all the systems. (Hope you got the order right!)
At least in this version of the ECS project, it does the systems one at a time, not using threads. We discovered earlier that it runs them in reverse-order of their definition. Let's focus first on just the move system, which comes right before the end where drawing happens.
It needs all entities with the speed and position components. Both of those components happen to be identical from a memory layout perspective, so we can imagine two arrays of 5000 elements each, with each element being the x-y pair of floats (and we'll say each array with its pairs is all contiguous, even, no pointer chasing). The move system applies its logic, reading from the speed array and writing to the position array. It moves to the next element, which is just an index bump, and again reads the speed array and writes to the position array at the new index. Because of how CPUs work, when the first item of the speed array is read, a cache line containing the next several items will also be read into the L1 cache. Same thing when the position array is written to, the next several elements should be part of a cache line to make writes to. So you get very speedy L1 cache access for the majority of the array elements.
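As a rough picture of that loop, here's a minimal sketch in plain Common Lisp. It is not how the ECS library actually stores its components; make-pairs and move-system are names I made up, and I'm assuming the simplest contiguous layout, with each x-y pair stored back to back in one specialized single-float array.

```lisp
(defun make-pairs (n)
  "N x-y pairs stored back to back: x0 y0 x1 y1 ..."
  (make-array (* 2 n) :element-type 'single-float :initial-element 0.0))

(defun move-system (positions velocities dt)
  "Add DT times each velocity component to the matching position component.
X and Y get the same treatment, so one pass over the flat arrays is enough."
  (declare (type (simple-array single-float (*)) positions velocities)
           (type single-float dt))
  (dotimes (i (length positions) positions)
    ;; Neighboring indices are neighboring memory: one read from
    ;; VELOCITIES, one read-modify-write on POSITIONS, then an index bump.
    (incf (aref positions i) (* dt (aref velocities i)))))

(let ((pos (make-pairs 2))
      (vel (make-pairs 2)))
  (setf (aref vel 0) 1.0 (aref vel 1) 2.0    ; first entity's velocity
        (aref vel 2) -4.0 (aref vel 3) 0.0)  ; second entity's velocity
  (move-system pos vel 0.5))
;; => #(0.5 1.0 -2.0 0.0)
```

There are no objects and no accessors in the hot loop, just two arrays walked in lockstep, which is the access pattern the cache prefetcher likes best.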
This is basically how it works for each system. Yeah it has to loop ~5000 times for each system, but the majority of loop iterations only do stuff with cached memory, so this ends up being faster than a single 5000-iteration loop that has to go chase pointers. The draw-images system is going to be the slowest, because while the position data is all nice and cache-friendly, the image data is less so: the bitmap is inherently a pointer, and in any case the system actually renders the image which is a slower operation in general.
Let's stress things a bit more. How does the FPS change with 10k asteroids? Remember we started with 5000 asteroids and the OOP version hit 270 FPS while the ECS version hit 700 FPS. Now we're down to about 140 FPS for the OOP version, 350 for the ECS version. In other words, doubling the asteroids roughly cut the performance of both in half. If we double again, to 20,000 asteroids? 66 for OOP, 180 for ECS, so about half again.
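Another way to read those numbers is as time per asteroid. The sketch below only does arithmetic on the FPS figures quoted above; the function names are made up, and the result lumps all per-frame overhead in with the asteroids, so treat it as an upper bound on the real per-asteroid cost.

```lisp
(defparameter *measurements*
  ;; (asteroids oop-fps ecs-fps), copied from the numbers quoted above.
  '((5000 270 700) (10000 140 350) (20000 66 180)))

(defun frame-ms (fps)
  "Milliseconds spent per frame at FPS frames per second."
  (/ 1000.0 fps))

(defun ns-per-asteroid (fps asteroids)
  "Whole-frame time divided by the asteroid count, in nanoseconds."
  (/ 1.0e9 (* fps asteroids)))

(defun cost-table (&optional (rows *measurements*))
  "Rows of (asteroids oop-ns ecs-ns), rounded to whole nanoseconds."
  (loop for (n oop ecs) in rows
        collect (list n
                      (round (ns-per-asteroid oop n))
                      (round (ns-per-asteroid ecs n)))))

(cost-table)
;; => ((5000 741 286) (10000 714 286) (20000 758 278))
```

So the cost per asteroid stays roughly flat as the count grows, somewhere around 710 to 760 ns for the OOP version and 280 to 290 ns for the ECS version, which is just what "doubling the asteroids halves the FPS" looks like from the other side.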
One way to look at this: re-architecting from the OOP version to the ECS version would let you have a bit over twice as many asteroids for the same frame rate. How many games need this tradeoff?
Of course, the ECS version could be optimized too (and perhaps already has been in newer versions).
But how about we look at just two obvious optimizations for the OOP side? The first optimization is to drop the acceleration vector from the object's storage. This is really minor, but it does get us up to 70 FPS at 20,000 asteroids.
The more significant change is to not have each asteroid be its own separate sprite. We'll instead make an asteroid-field "sprite" that will handle all of its own asteroids internally. Conceptually this is like designing a particle emitter class instead of having each particle be its own separate sprite. Also conceptually, I expect the initial benefit to be rather small: it saves however many thousands of update and draw method calls (which, to be fair, SBCL does a good job of keeping not much slower than normal function calls) as well as the foreign memory of the SDL rect that each asteroid carries around and updates.
Here is the asteroid-field class. You can toggle it by setting :use-field? to t when the main.lisp file calls object-version:init.
(defclass asteroid-field (sprite add-groups-mixin)
  ((asteroids :accessor .asteroids)
   (planet :accessor .planet :initarg :planet)))

(defstruct simple-asteroid
  texture
  (pos-x 0.0)
  (pos-y 0.0)
  (vel-x 0.0)
  (vel-y 0.0)
  scaled-w
  scaled-h)

(defmethod initialize-instance :after ((self asteroid-field) &key planet asteroids &aux textures)
  (let ((tmp-asteroid (make-instance 'asteroid :planet planet)))
    (setf textures (/textures tmp-asteroid))
    (kill tmp-asteroid))
  (setf (.asteroids self) (make-array asteroids))
  (dotimes (i asteroids)
    (let ((r (random 20.0))
          (angle (random (* 2 pi)))
          (planet-y (rect-coord (.rect (.planet self)) :centery))
          (image (get-texture (alexandria:random-elt textures)))
          (asteroid (make-simple-asteroid)))
      (setf (simple-asteroid-texture asteroid) image
            (simple-asteroid-pos-x asteroid) (+ 200 (* r (cos angle)))
            (simple-asteroid-pos-y asteroid) (+ planet-y (* r (sin angle)))
            (simple-asteroid-vel-x asteroid) (+ -5 (random 15.0))
            (simple-asteroid-vel-y asteroid) (+ 30 (random 30.0)))
      (let* ((scale (+ 0.1 (random 0.9)))
             (scaled-w (* scale (sdl2:texture-width image)))
             (scaled-h (* scale (sdl2:texture-height image))))
        (setf (simple-asteroid-scaled-w asteroid) scaled-w
              (simple-asteroid-scaled-h asteroid) scaled-h)
        (decf (simple-asteroid-pos-x asteroid) (* 0.5 scaled-w))
        (decf (simple-asteroid-pos-y asteroid) (* 0.5 scaled-h)))
      (setf (aref (.asteroids self) i) asteroid))))

(defmethod update ((self asteroid-field))
  (let* ((dt (dt))
         (planet (.planet self))
         (planet-x (rect-coord (.rect planet) :centerx))
         (planet-y (rect-coord (.rect planet) :centery))
         (planet-half-w (/ (sdl2:rect-width (.rect planet)) 2.0))
         (planet-half-h (/ (sdl2:rect-height (.rect planet)) 2.0)))
    (loop for asteroid across (.asteroids self)
          for i from 0
          when asteroid
            do
               (let* ((distance-x (- planet-x (simple-asteroid-pos-x asteroid)))
                      (distance-y (- planet-y (simple-asteroid-pos-y asteroid)))
                      (angle (atan distance-y distance-x))
                      (distance-squared (+ (expt distance-x 2) (expt distance-y 2)))
                      (accel (/ (.mass planet) distance-squared))
                      (accel-x (* accel (cos angle)))
                      (accel-y (* accel (sin angle))))
                 (incf (simple-asteroid-vel-x asteroid) (* dt accel-x))
                 (incf (simple-asteroid-vel-y asteroid) (* dt accel-y))
                 (incf (simple-asteroid-pos-x asteroid) (* dt (simple-asteroid-vel-x asteroid)))
                 (incf (simple-asteroid-pos-y asteroid) (* dt (simple-asteroid-vel-y asteroid)))
                 (when (<= (+ (expt (/ (- (simple-asteroid-pos-x asteroid) planet-x) planet-half-w) 2)
                              (expt (/ (- (simple-asteroid-pos-y asteroid) planet-y) planet-half-h) 2))
                           1.0)
                   ;; 'remove' it
                   (setf (aref (.asteroids self) i) nil))))))

(defmethod draw ((self asteroid-field))
  (loop for asteroid across (.asteroids self)
        when asteroid
          do
             (lgame.rect:with-rect (r (simple-asteroid-pos-x asteroid)
                                      (simple-asteroid-pos-y asteroid)
                                      (simple-asteroid-scaled-w asteroid)
                                      (simple-asteroid-scaled-h asteroid))
               (sdl2:render-copy lgame:*renderer* (simple-asteroid-texture asteroid) :dest-rect r))))
Here we again store all the data each asteroid needs together, but this time just in a simple struct. The field creates an array of structs for however many asteroids we're using, then creates their necessary data as before.
The update method is mostly the same, too, just applies the update to each asteroid in a loop.
The draw method has to be overridden because our asteroid-field "sprite" itself has no sprite. Here we just loop and call the appropriate sdl2 render function with a given rect. Note this rect is stack-allocated.
This takes us from 70 FPS to about 120 FPS with 20,000 asteroids. Not bad, more than I expected. While it could be 'better form' to actually remove the dead asteroids from the array instead of just setting them to nil, what happens when all of them are set to nil is our frame rate improves to over 3000 FPS, so I don't think it's really a problem in practice to skip the iteration, especially considering deleting them would involve shifting all the elements around if we continued to just use the normal array data structure. If it did start to matter, cleaning up at a more opportune time would be smarter.
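For what it's worth, removal doesn't have to mean shifting if you don't care about the order of the asteroids. Here's a minimal sketch, not from the repo, that uses a vector with a fill pointer and moves the last live element into the dead one's slot. The names make-field, swap-remove and remove-dead are made up, and I haven't measured whether this is any faster in the actual game.

```lisp
(defun make-field (items)
  "A vector of ITEMS whose fill pointer tracks how many are still live."
  (make-array (length items) :initial-contents items :fill-pointer t))

(defun swap-remove (vec i)
  "Remove element I in constant time by moving the last element into its
slot and shrinking the fill pointer. Element order is NOT preserved."
  (let ((last (1- (fill-pointer vec))))
    (setf (aref vec i) (aref vec last)
          (aref vec last) nil)          ; drop the stale reference
    (decf (fill-pointer vec))
    vec))

(defun remove-dead (vec dead-p)
  "Swap-remove every element satisfying DEAD-P.
Walking backwards means the element swapped into slot I has already
been checked, so nothing gets skipped."
  (loop for i from (1- (fill-pointer vec)) downto 0
        when (funcall dead-p (aref vec i))
          do (swap-remove vec i))
  vec)

;; Pretend the even numbers are asteroids that hit the planet.
(remove-dead (make-field '(1 2 3 4 5 6)) #'evenp)
;; => #(1 5 3)
```

The tradeoff is that a vector with a fill pointer is no longer a simple array, which can make element access a little slower, so this is only worth it once the nil checks actually show up in a profile.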
Oh and back in the 5000 asteroid case, we're at around 450 FPS, up from the original 270 and still slower than ECS's 700. Still, we've closed the gap quite a bit by just being less stupid about what entities get to be full sprites. We didn't need to change from an OOP style.
When I wrote "array of structs" earlier perhaps some hairs were tickled. The ECS design, and data oriented design in general, usually prefers instead to write things as "struct of arrays". What happens when we do the same? But first, let me explore another idea. The code in the update function's loop relies on zero foreign data. What if we executed this loop in parallel? In the ASD file uncomment the lparallel dependency and in the object version init uncomment (setf lparallel:*kernel* (lparallel:make-kernel 12)).
Or replace 12 with however many cores you have. Replace the loop form with
(lparallel:pmapc
 (lambda (i)
   (alexandria:when-let ((asteroid (aref asteroids i)))
     (let* ((distance-x (- planet-x (simple-asteroid-pos-x asteroid))) ...)
       ...)))
 (loop for i below (length asteroids) collect i))
(In the repo I've just left this commented out, so uncomment it and comment the loop.) The speedup here isn't that impressive as the computation per iteration is still really small. It takes it to about 170 FPS for the 20,000 asteroid case, very close to the 180 for single-threaded ECS but still slower.
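One thing that might help, though I haven't tried it against the game, is handing each worker a contiguous range of indices instead of one index at a time. That also avoids consing up a fresh 20,000-element index list every frame. Below is a minimal sketch of just the range splitting in plain Common Lisp; chunk-bounds, call-on-chunks and bump-all are made-up names, and the chunks run sequentially here so the block stands on its own.

```lisp
(defun chunk-bounds (n parts)
  "Split the index range [0, N) into at most PARTS contiguous
(START . END) pairs of near-equal size. END is exclusive."
  (loop with size = (max 1 (ceiling n parts))
        for start from 0 below n by size
        collect (cons start (min n (+ start size)))))

(defun call-on-chunks (function n parts)
  "Call FUNCTION with START and END for each chunk of [0, N).
Sequential here. With a thread pool, each chunk would become one task,
for example by mapping over (chunk-bounds n parts) with pmapc instead
of DOLIST."
  (dolist (chunk (chunk-bounds n parts))
    (funcall function (car chunk) (cdr chunk))))

(defun bump-all (vec parts)
  "Toy per-element work: add one to every element, chunk by chunk."
  (call-on-chunks (lambda (start end)
                    (loop for i from start below end
                          do (incf (aref vec i))))
                  (length vec)
                  parts)
  vec)

(chunk-bounds 10 3)           ; => ((0 . 4) (4 . 8) (8 . 10))
(bump-all (vector 0 1 2 3 4) 2) ; => #(1 2 3 4 5)
```

With 20,000 asteroids and 12 parts, each task would cover about 1,667 asteroids, so the per-task overhead gets spread over a lot more work than one gravity calculation.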
Now let's look at the final optimization: replacing the simple-asteroid struct, that we make an array of, with instead a struct of arrays. It's not too different from the asteroid field class, but we'll just include everything at once again:
(defstruct pair
  (x 0.0)
  (y 0.0))

(defstruct soa-asteroids
  textures
  positions
  velocities
  scales)

(defclass asteroid-field-soa (sprite add-groups-mixin)
  ((asteroids :accessor .asteroids)
   (planet :accessor .planet :initarg :planet)))

(defmethod initialize-instance :after ((self asteroid-field-soa) &key planet asteroids &aux textures all-asteroids)
  (let ((tmp-asteroid (make-instance 'asteroid :planet planet)))
    (setf textures (/textures tmp-asteroid))
    (kill tmp-asteroid))
  (setf all-asteroids (make-soa-asteroids
                       :textures (make-array asteroids)
                       :positions (make-array asteroids)
                       :velocities (make-array asteroids)
                       :scales (make-array asteroids)))
  (setf (.asteroids self) all-asteroids)
  (dotimes (i asteroids)
    (let* ((r (random 20.0))
           (angle (random (* 2 pi)))
           (planet-y (rect-coord (.rect (.planet self)) :centery))
           (image (get-texture (alexandria:random-elt textures)))
           (pos-x (+ 200 (* r (cos angle))))
           (pos-y (+ planet-y (* r (sin angle))))
           (vel-x (+ -5 (random 15.0)))
           (vel-y (+ 30 (random 30.0)))
           (scale (+ 0.1 (random 0.9)))
           (scaled-w (* scale (sdl2:texture-width image)))
           (scaled-h (* scale (sdl2:texture-height image))))
      (decf pos-x (* 0.5 scaled-w))
      (decf pos-y (* 0.5 scaled-h))
      (setf (aref (soa-asteroids-textures all-asteroids) i) image
            (aref (soa-asteroids-positions all-asteroids) i) (make-pair :x pos-x :y pos-y)
            (aref (soa-asteroids-velocities all-asteroids) i) (make-pair :x vel-x :y vel-y)
            (aref (soa-asteroids-scales all-asteroids) i) (make-pair :x scaled-w :y scaled-h)))))

(defmethod update ((self asteroid-field-soa))
  (let* ((dt (dt))
         (all-asteroids (.asteroids self))
         (planet (.planet self))
         (planet-x (rect-coord (.rect planet) :centerx))
         (planet-y (rect-coord (.rect planet) :centery))
         (planet-half-w (/ (sdl2:rect-width (.rect planet)) 2.0))
         (planet-half-h (/ (sdl2:rect-height (.rect planet)) 2.0)))
    (loop for pos across (soa-asteroids-positions all-asteroids)
          for vel across (soa-asteroids-velocities all-asteroids)
          for i from 0
          when pos
            do
               (let* ((distance-x (- planet-x (pair-x pos)))
                      (distance-y (- planet-y (pair-y pos)))
                      (angle (atan distance-y distance-x))
                      (distance-squared (+ (expt distance-x 2) (expt distance-y 2)))
                      (accel (/ (.mass planet) distance-squared))
                      (accel-x (* accel (cos angle)))
                      (accel-y (* accel (sin angle))))
                 (incf (pair-x vel) (* dt accel-x))
                 (incf (pair-y vel) (* dt accel-y))
                 (incf (pair-x pos) (* dt (pair-x vel)))
                 (incf (pair-y pos) (* dt (pair-y vel)))
                 (when (<= (+ (expt (/ (- (pair-x pos) planet-x) planet-half-w) 2)
                              (expt (/ (- (pair-y pos) planet-y) planet-half-h) 2))
                           1.0)
                   ;; 'remove' it, for laziness just from pos
                   (setf (aref (soa-asteroids-positions all-asteroids) i) nil))))))

(defmethod draw ((self asteroid-field-soa))
  (loop for image across (soa-asteroids-textures (.asteroids self))
        for pos across (soa-asteroids-positions (.asteroids self))
        for scaled across (soa-asteroids-scales (.asteroids self))
        when pos
          do
             (lgame.rect:with-rect (r (pair-x pos)
                                      (pair-y pos)
                                      (pair-x scaled)
                                      (pair-y scaled))
               (sdl2:render-copy lgame:*renderer* image :dest-rect r))))
If you're looking at the repo, just comment out the line in the init for making the asteroid-field instance and uncomment the line below that makes the asteroid-field-soa instance instead.
The speedup here is less impressive: 460 FPS for the 5k asteroids case (up from 450), and still only around 120 FPS with the 20k case.
Why is this? I think it's because I made a naive struct of arrays of structs (pairs) -- each element of the array is a pair struct, but I suspect this is a pointer and not contiguous in memory with the array itself. I didn't explicitly type things either, and SBCL's inference isn't perfect. In short, too much pointer chasing is still going on, so there's no real benefit. If you instead made each array an explicitly typed SBCL array of 64-bit "floats", and used the top 32 bits of each float for the x and the bottom 32 for the y of each pair, then you would probably see a good speedup. I'll leave that as an exercise to somebody else, or to myself in the future if I get bored. But this is an advantage of an ECS framework if it does most of this for you; I find all the explicit typing of data types in the ecs-tutorial outside of the library innards rather ugly myself, but obviously it helps to improve performance.
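For anyone who wants a starting point for that exercise, here's a minimal untested-in-the-game sketch. It is not the 64-bit packing trick described above; I'm swapping in a plainer layout with one specialized single-float array per component, plus a bit vector for liveness since a float array can't hold nil. All the names (f32-vector, flat-asteroids, step-gravity) are made up, the collision check is left out, and whether it actually beats the 120 FPS figure is something you'd have to measure.

```lisp
(deftype f32-vector () '(simple-array single-float (*)))

(defun make-f32 (n)
  "A specialized vector of N single-floats, stored unboxed on SBCL."
  (make-array n :element-type 'single-float :initial-element 0.0))

(defstruct (flat-asteroids (:constructor %make-flat-asteroids))
  (pos-x (make-f32 0) :type f32-vector)
  (pos-y (make-f32 0) :type f32-vector)
  (vel-x (make-f32 0) :type f32-vector)
  (vel-y (make-f32 0) :type f32-vector)
  ;; 1 = alive, 0 = dead. Replaces the "set the slot to nil" trick.
  (alive (make-array 0 :element-type 'bit) :type simple-bit-vector))

(defun make-flat-asteroids (n)
  (%make-flat-asteroids
   :pos-x (make-f32 n) :pos-y (make-f32 n)
   :vel-x (make-f32 n) :vel-y (make-f32 n)
   :alive (make-array n :element-type 'bit :initial-element 1)))

(defun step-gravity (field planet-x planet-y mass dt)
  "Same gravity math as the update methods above, minus the collision test."
  (declare (type single-float planet-x planet-y mass dt))
  (let ((px (flat-asteroids-pos-x field))
        (py (flat-asteroids-pos-y field))
        (vx (flat-asteroids-vel-x field))
        (vy (flat-asteroids-vel-y field))
        (alive (flat-asteroids-alive field)))
    (declare (type f32-vector px py vx vy)
             (type simple-bit-vector alive))
    (dotimes (i (length px) field)
      (when (= 1 (sbit alive i))
        (let* ((dx (- planet-x (aref px i)))
               (dy (- planet-y (aref py i)))
               (angle (atan dy dx))
               (accel (/ mass (+ (* dx dx) (* dy dy)))))
          (incf (aref vx i) (* dt accel (cos angle)))
          (incf (aref vy i) (* dt accel (sin angle)))
          (incf (aref px i) (* dt (aref vx i)))
          (incf (aref py i) (* dt (aref vy i))))))))

;; One asteroid at the origin, planet 10 units to the right, mass 100:
(let ((field (make-flat-asteroids 1)))
  (step-gravity field 10.0 0.0 100.0 1.0)
  (list (aref (flat-asteroids-pos-x field) 0)
        (aref (flat-asteroids-vel-x field) 0)))
;; => (1.0 1.0)
```

The point of the :element-type and the declarations is that the floats live directly inside the arrays, so the loop reads and writes them without following a pointer per asteroid. It's also a decent illustration of the "rather ugly" typing I was complaining about.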
At last, I've shown more agreement with point 2 from the ECS side: it's harder than you think (especially in Lisp when it comes to collections of lots of data) to refactor for better performance. I also think this structure of arrays is harder to read and reason about, and probably modify in the future for new features, despite being almost the same as the array of structs version. A lot is in that "almost".
Conclusion
After all that I still don't have much more to conclude than the title of this post: You might not need ECS. It's just another approach to doing things, not necessarily the best for your particular game.
If you've bothered to read all this, thank you but why lol. In any case, you're awesome, even if you want to fight me in the comments.
Posted on 2025-03-15 by Jach
Tags: games, lisp, programming
Permalink: https://www.thejach.com/view/id/443
Trackback URL: https://www.thejach.com/view/2025/3/you_might_not_need_ecs