# Grabby Aliens and Paperclips

In a recent bar-talk-style chat with a friend, we got on the topic of AI risk again. Somewhere in the conversation he expressed surprise at some of my arguments, given some of my older ones. Namely, I put forth arguments de-emphasizing the importance of recent AI advances when it comes to full-blown AGI, and generally I expressed skepticism that full-blown AGI was "near". At least, a lot more skepticism than I've ever expressed to him before, it seems.

Ultimately I think I've just been more and more influenced by Robin Hanson's viewpoints, and his recent work on grabby aliens may have been another push that I've only connected now.

But stepping back a bit, I want to describe the shape of my beliefs about AGI. Before I get to probabilities I have to first talk about possibilities. So first of all, it seems obviously possible that creatures of human intelligence could one day create machine intelligence that rivals and surpasses them. It does not at all seem that it's an impossible engineering challenge in the way that a perpetual motion machine is one**, nor does it seem like humans should just happen to be close to the limits of general intelligence. If nothing else, we could at least think faster, which alone would be a large advantage even without greater generality to thought.

To have an interesting conversation, I think these possibilities must at least be granted, even by those who don't believe them to be really possible. Some people, for whatever odd reasons, think such machine intelligence is impossible in the way perpetual motion machines are, and it's hard to have a conversation of any interest with them unless they can at least grant, for the sake of conversation, that it's possible, and then try to rigorously think through the details and implications. (Much like Hanson's other work on UFOs. I don't believe UFO sightings have actually been signs of aliens, but if they are, how could that be? He weaves a very nice story about that.)

(** Of course our models of reality may be so far off that perpetual motion machines are possible somehow, or AGI impossible somehow, and under strict rationality we must assign epsilon chance to such things so that we don't break Bayes' theorem et al. with division by 0, but unless we're actively looking to explore the consequences (e.g. trying to make a somewhat hard sci-fi story with one piece of magic) it's reasonable to filter things out of the space of possibilities so that we don't need to assign a probability to them in an analysis. It's the same thing as ignoring for simplicity the chance of a fair coin landing on its edge instead of one of heads or tails, and ignoring the more epsilon-level 'chance' of the Gods of the Simulation deciding that both faces are now heads, or a box of only red and black balls suddenly containing green balls.)
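To make the division-by-0 remark concrete, here is a minimal sketch of a Bayes update for the box-of-balls case. The function name `posterior` and all the numbers are my own illustrative assumptions: the hypothesis is "the box contains green balls", and the evidence is drawing a green ball, which is impossible if the hypothesis is false.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for one hypothesis H and one piece of evidence E.

    prior               -- P(H) before seeing E
    p_evidence_if_true  -- P(E | H)
    p_evidence_if_false -- P(E | not H)
    """
    # Total probability of the evidence under our model.
    evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    # If the model said E could never happen, this divides by zero.
    return p_evidence_if_true * prior / evidence


# Illustrative numbers only: a green ball is drawn.
# With a tiny but nonzero prior, the update works and H becomes certain.
print(posterior(1e-9, 0.3, 0.0))

# With a prior of exactly 0, the model assigned zero probability to what
# we just saw, and the update is undefined.
try:
    print(posterior(0.0, 0.3, 0.0))
except ZeroDivisionError:
    print("update undefined: the model ruled the observation out entirely")
```

The epsilon is only there to keep the arithmetic defined if the "impossible" ever shows up. For everyday analysis, dropping such hypotheses from the possibility space changes nothing in practice.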

Once AGI is accepted as possible and something humans may one day succeed in building (many have already stated that their goal is to build such a thing), the question of timelines comes into play. When might it come about? I still maintain that my possibility space includes a timeline as near as now, the moment you're reading these words, and I have maintained that possibility since shortly after being exposed to this set of ideas in my teens. That is, at any point since the late 2000s, I think it's been possible for AGI to have been developed. Why is that?

There are two questions I use to justify such a possibility. The first: is there enough hardware, enough computing silicon, around the world to support a general intelligence? In the 90s, probably not, but computers kept advancing, and even when they started to advance more slowly, they were still produced in greater numbers. If you added up the entire world's computation supply in the late 2000s, I estimated that yes, we probably had enough by then, based on complexity arguments for digitized models of the human brain. Now that it's 2022, the only thing that has changed is that we have vastly more computation. So much more, in fact, that certain single individuals have access to enough, and small groups or teams at many companies or universities can get enough too. (This should make the second question have a bit more pessimistic edge to it, but it doesn't.)

The second question is: assuming we have enough compute -- put another way in case doubters are unsure what this means, if we randomly cycled through all possible computer memory states and tried executing them, in infinite time we'd find at least one memory layout that executes to an AGI with our finite hardware -- what gates the creation of such AGI now? It's an engineering problem, so what are the engineering challenges beyond raw compute, which itself is probably sufficiently met?

Of course, if I knew that, I could make my own AGI. But I don't, and it seems no one else does either, for now (as I'm hitting the keys for this text). Maybe some people out there are on the right track, or maybe not. Ultimately the sub-question here that I go back to is: if/when it happens, will it have been the result of a brilliant insight (or a few)? Or will it be the result of decades of iteration on a golden path, with lots of side-branches that died off, as most other progress? Will the progress be discontinuous or a series of lumpy steps? (And I'll leave off from the rest of the post the similar question that applies to the AGI itself, does its own progress and capability go FOOM or is it more of a slow takeoff?)

Now, as before, I think both are possible. It could happen soon, at any time, or it could still be many decades (centuries?) away. But what has changed, and what my friend perceived, is that I now think it's less probable than I used to that AGI will be born from a discontinuous moment of insight, and more probable that it will come from lumpy progress over time. Time enough, even, for alternative radical technological transforms to play out, like advanced nanotech or Ems or biological immortality or cryonics success, or some combination of these.

I don't know why exactly I have updated this way. Many others have gone the other way, seeing the "big" improvements with deep neural nets over the last decade, over the last five years, over the last year... as evidence for discontinuous capability gain. To pick on one of those, though, I see it differently.

Take the case of AlphaGo, which "came out of nowhere" and soundly (especially its immediate successors) defeated the best humans had to offer in Go.

Except it didn't "come out of nowhere". Many years back, Monte Carlo tree search (MCTS) was developed as a way to dramatically cut down the search space for classic ply-by-ply AIs. In smaller games like chess, those ply-by-ply AIs can be quite successful as a dumb strategy that nevertheless beats humans given enough computing power. Go AIs with MCTS were made and improved over the years, eventually reaching around 6 dan level. Not quite able to consistently beat a master 9 dan player, but still better than almost all human Go players. Then, on a parallel track, people were experimenting with deep learning methods on these games. Surprisingly, they were very effective on their own. I would need to go check, but I believe they were very quickly capable of 4 dan level play, at least the ones that were announced and tested. (Notably AlphaGo wasn't announced until it was announced, and was only tested against one pro a number of times during its development.) Anyway, lots of people paying casual attention noticed these two methods providing strong levels of play, and wondered how strong an AI that combined the two would be. Might it finally topple human dominance? And that's what AlphaGo was, a combo of MCTS and deep learning. And yes, humans are now categorically worse at Go than machines, just like they became worse at chess years ago.

A lot of worry is being made now of pure deep learning tricks getting us to AGI. Maybe, I'm skeptical, but we'll see. But it's also possible that an existing trick is known that, when combined with deep learning, could produce AGI, or at least something even more impressive than anything so far. We'll see.

In essence, my confidence interval for AGI being realized, anywhere between now and centuries from now, hasn't changed, but the probability mass inside it has shifted to be heavier on the distant side. (It's not by some fixed amount either, it has actually shifted further away in time. Like, one might predict AGI "in the next 20 years", but then 20 years comes and goes. It's not that uncommon to still predict AGI "in the next 20 years" after. Essentially the prediction is really saying that there's some x% chance of some key insight being made, after which it's just a matter of routine effort. Since that insight could happen at any time, if it still hasn't arrived after 20 years, the following 20 years carry the same chance the first 20 did.)
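The "always 20 years away" reasoning can be sketched with a constant yearly chance of the key insight. The constant hazard, the 3% figure, and the function names below are my own illustrative assumptions, not estimates. Under that model, the chance of the insight landing in years 20 to 40, given that it hasn't arrived by year 20, matches the chance for the first 20 years. Viewed from today, the later window is somewhat less likely, since the insight may already have happened by then.

```python
def prob_within(hazard, years):
    """Chance the insight arrives within `years`, given a constant yearly hazard."""
    return 1.0 - (1.0 - hazard) ** years


def prob_in_window(hazard, start, end):
    """Chance, judged from today, that the insight arrives between `start` and `end`."""
    return (1.0 - hazard) ** start - (1.0 - hazard) ** end


def prob_in_window_given_none_yet(hazard, start, end):
    """Same window, but conditioned on no insight having arrived before `start`."""
    return prob_in_window(hazard, start, end) / (1.0 - hazard) ** start


hazard = 0.03  # hypothetical 3% chance per year, for illustration only

first_20 = prob_within(hazard, 20)
next_20_from_today = prob_in_window(hazard, 20, 40)
next_20_once_there = prob_in_window_given_none_yet(hazard, 20, 40)

print(f"next 20 years, judged today:        {first_20:.3f}")
print(f"years 20-40, judged today:          {next_20_from_today:.3f}")
print(f"years 20-40, judged at year 20:     {next_20_once_there:.3f}")
```

So a forecaster who keeps saying "in the next 20 years" isn't necessarily being inconsistent. That's just what a memoryless guess looks like each time you ask.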

On a related note, the talk of grabby aliens maybe is a comfort against unFriendly AGI's chances. Or maybe not. But if the grabby aliens model is right, and we can expect the universe to fill up shortly with grabby aliens colliding with each other, at least one of those aliens may be a bad AGI unyieldingly optimizing for something, no? Optimizing AGIs would naturally be inclined to go Grabby if they could. So even if we don't fail the alignment problem here on Earth, we'll still have to deal with the failure of some other species' attempt at AGI sooner or later. Unless, of course, unFriendly AGI is actually just really hard, or entirely the wrong model to think about these things with.

I haven't put much more thought into this, so I'll end with a repetition. I just had the connection and realization that if unFriendly AGI is at all likely, and humans/ems survive the next few hundred billion years, we'll have to deal with bad AGI made by others, along with other kinds of alien minds or their not-unFriendly AGI spawn. Really, perhaps the Friendly/unFriendly/somehow-neither model of AGI minds isn't the most useful to focus on? Or at least it needs to be applied more rigorously to alien minds as well as human minds. I guess calling it the 'alignment problem' is a better approach to that. An AGI that is flawless in every way and values all good human things, except for not valuing human humor, would, if unchecked, probably contribute to making a universe without much humor in it. This would be sad and a failure of sorts, but it's not really the same as an unFriendly AGI that just uses our atoms for something else. And in the case of an alien AGI that couldn't have been made to value human humor but only values alien humor, is it really too much of a stretch to imagine that it could learn to understand human humor, and maybe change its values to appreciate it too? Nothing says it has to, but what would be more likely?

#### Posted on 2022-06-29 by Jach
