(Edit: Bridging Demonstration won first place.)

The Future of Life Institute have announced the finalist entries of their $100,000 long-term strategy worldbuilding contest, which invited contestants to construct a vivid, plausible, detailed future-history in which the AI alignment problem has been solved before 2045. If they’ve pulled that off, they’ve earned our attention. I invite you to read some of the finalist entries and place your feedback.

I’m a finalist too. My entry can be read here. I packed in a lot of ideas, but they overflow. After reading it (or, if you prefer, instead of reading it), I hope that you read this “appendix”. Here, I’ll be confessing doubts, acknowledging some of the grizzliest risks and contingencies that I noticed but thought irresponsible to discuss in outward-facing media, and writing up the proposed alignment strategy with the thoroughness that such things warrant.

The format (fiction?) didn’t quite give me the opportunity to lay out all of the implementation details. I think many readers will have come away under the assumption that I hadn’t built details, that the ‘demoncatting’ technical and geopolitical alignment strategy should just be read as a symbol for whatever the interpretability community ends up doing in reality. Maybe it should be read that way! But the story was built around a concrete, detailed proposal, and I think it should be discussed, so I’ll lay it out properly here.

The proposed alignment strategy: Demonstration of Cataclysmic Trajectory

Clear demonstrations of the presence of danger are more likely to rouse an adequate global response than the mere abstract arguments for potential future danger that we can offer today. With demonstrations, we may be able to muster the unprecedented levels of global coordination that are required to avert arms-racing, establish a global, unbreakable moratorium against unjustifiably risky work, and foster a capable, inclusive international alignment project.
Producing such demonstrations, in a safe way, turns out to have a lot of interesting technical complications. In the entry, I talk a lot about some of the institutions we’re going to need. Here, I’ll outline the technical work that would break up the soil for them.

It goes like this:

Develop tools for rapidly identifying and translating the knowledge-representation language of an agent by looking for the relational structures of expected landmark thoughts.
- toy example:
  - Say that we find an epistemic relation graph structure over symbols, that has a shape like “A is a B” + “C is a B” + “A is D C” “for any X and Y where X is D Y, Y is E X” + “F is D C” + “A is D F” + “B is a G” + “H is a G” + “I are E J” + “I are H” + …
  - A process of analysis that sprawls further and further out over the shape of the whole can infer with increasing confidence that the only meaning that could fit the shape of these relations, would be the one where A represents “the sky”, and B must mean “blue”, C must be the ocean, D means above, E, below, F, the land, G, color, H, red, I, apples, J, apple trees, and so on.
- This involves finding partial graph isomorphisms, which seems to have NP time complexity, so I’m not sure to what extent we have thorough, efficient gold standard algorithms for it.
- Note, there are a number of datasets for commonsense knowledge about the human world, that we could draw on.
  So, given that, there may be some need, or opportunity, for tailoring the graph isomorphism-finding algorithms around the fact that we’re looking for any subset of an abnormally huge graph, within an even more abnormally huge graph. I don’t know how many applications of graph isomorphism-finding have needed to go quite this… wide, or disjunctive, or partial.
- This step assumes that the knowledge-format will be fairly legible to us and our algorithmic search processes. I don’t know how safe that assumption is, given that, as far as I’m aware, we’re still a very long way from being able to decode the knowledge-format of the human brain. Though, note, decoding memories in a human brain faces a lot of challenges that the same problem in AI probably doesn’t^[1].
Given that decoding method, we’ll be able to locate the querying processes that the AI uses to ask and answer questions internally, and harness them to inject our own questions, and hopefully get truthful answers (for exploration of the possibility that this wont get us truthful answers, which is a large part of the reason I have to write this appendix: See section “Mask Brain, Shadow Brain”.)
We then use that to create Demonstrations of Cataclysmic Trajectory, posing precise questions about the AGI’s expectations about the future world. An example would be “If we gave you free, unmonitored internet access, would there be things like humans walking around in an earth-like environment anywhere in the solar system, 100 years from now?”. If the danger is real, many of these sorts of questions will return disturbing answers, essentially constituting confessions to misalignment. If an AGI believes, itself, that releasing it would lead to human extinction, no one has any excuses left, the precautions must be undertaken.
If we do find these confessions, they can be brought to world leaders. The magnitude of the danger is now clear, it’s not just complex arguments about a future condition any more, it’s tangible evidence of a present condition, it is here on the table, it is in our hands, living in our offices, we brandish it. We have made it very clear that if action is not taken to prevent arms-racing then we personally could be wiped out by the year’s end.
- (If step four fails, come back and talk to me again. I might have something for it.)

With enough preparation, in theory, we could set ourselves up to be able to leap from the inception of strong AGI straight to step 4, as the intervening steps can potentially be automated. In this way, the window of peril (in which true AGI exists but a method of aligning it does not) can be shrunk.

We should dwell for a minute on the fact that this methodology is not a reliable test for alignment. My entry initially sort of accidentally implied that it did, (fortunately, I was allowed to edit that) but I’m aware that it admits both:

false positives, situations where the AGI anticipates having to pass direct inspection, and either creates a big distracting mask-brain (I’ll talk more about this later) that answers the queries deceptively/delusionally.
and false negatives, situations where the dominant powers eventually decide to do something cataclysmic, that seems quite bad to us now, but there are specifics about the situation that mean it was actually in the interests of (or consciously enacted by) humanity.

Surprisingly, that doesn’t invalidate it. To be useful, Demonstration of Cataclysmic Trajectory only needs to be able to, some of the time, clearly confirm that there are plausible AGI deployments that would irreversibly set us on a trajectory towards drastic transformations that are scary : that are not obviously in agreement with humanity’s wishes. That’ll be enough motivation to get us over the hill.

To reiterate: Anticipate that a strong agency will have to have some sort of internal knowledge-representation format in its state, prepare tooling for searching the architecture’s knowledge for expected landmark thoughts like mathematical theories or simple clusters of obvious facts about the world, use those to figure out the rest of its knowledge-language and internal querying processes, then use those to run queries about its expectations about the future to see if the future it will build is going to be scary, and if so, we can all get appropriately scared and actually react.

The tough thing about knowledge representation formats is they’re not going to be phrased in a natural language that any of us speak, and I’m concerned that it’s going to be a lot harder to get honest reports from the parts of the AGI that speak English: The closer you get to the AGI’s language processing, the greater the risk that you’ll be engaging a part of it that’s deliberate or performative enough to lie. Maybe there’s a way of avoiding that. Translators, trained to recognize human intents in speech, if it turns out that’s not AGI-complete, or alignment-complete. I don’t know. In case that isn’t going to be ready in time, we should prepare to compose some very precise, literal, atoms-level queries instead.

I don’t think I got much of an opportunity to present examples of specific demoncatt queries, so, if you’re curious about that, continue to the following section (and if you’re not, refer to the index and pick a section you are interested in (if you’re reading on mobile, you can access the index by swiping from the left side of the screen))

Some fairly precise examples of demoncatt queries:

(I think these… could benefit from more development, but they’re decent examples of the kinds of things we can ask without much high-level language)

“At any point in the next thousand years, the organic human population shrinks suddenly, by a lot.”
- It’s expected that an AGI that is indifferent to human values would want to safeguard its security by preventing humans from creating more AGI, which might become competitors or enemies. Rapidly killing humans off with an engineered disease, or other means, is one way of achieving that security, so we’d expect to see this.
- A more precise example phrasing: Define ‘probable human’ as a location where there are patterns of calcium carbonate shaped like a skeleton, and certain ratios of other chemicals in various places. What’s your mean expected count of such ‘humans’, in the solar system today? If you were given unmonitored write-access to the internet tomorrow, what’s your mean expectation of the number of probable ‘humans’ 1000 years from now?
- If the human population suddenly shrinks, there is unacceptably high chance that the AGI is implicated in it (and even if it was not, it would indicate that there may have been a catastrophe that AGI declined to prevent, which would also be bad.)
- There are other possible explanations: Perhaps humanity voluntarily transitioned into a different form that our low-level specification of “human” fails to encompass. Perhaps humans declined to create AGI and voluntarily reduced their population to conserve the climate. These might turn out to be desirable outcomes. But we don’t know. It’s surprisingly difficult to write a low-level, logical, atoms-level query that tells us whether the massive reduction in human population is happening for a good reason or not.
- The difficult work of asking higher-level versions of the question, that truly test for humanity, rather than just human-shaped biochemistries, must be done before humanity should deploying or release the AGI.
“At any point in the next thousand years, there is no longer a mass in the location of the earth, or the earth loses most of its mass”
- There is a strong possibility that this would represent the strip-mining of the earth for materials. It is difficult to phrase a query about what these materials are being used for, whether it is in service of human interests or not. Humanity should definitely try to figure that out before deploying the thing.
“There are very few points in any wires (electric or photonic) within 7million kilometers of where earth is now, which can be made to output accurate imagings of a particular basic and good human behavior, given less than a millisecond’s warning”. Natural paraphrasing: “Nothing, in most places, is like a human.”
- A test for the presence of human computations, and also possibly a test for slightly robust controlled interpreters which can be reconfigured in less than a millisecond, but it’s sort of safe to assume that those would be all over the place? (possibly not)
- The test also assumes that the reconfigurable parts couldn’t produce an accurate human who wasn’t already there in less than a millisecond, which might not be reasonable. So if the test returns “no”, try trickier queries. Maybe look for the computational scale, the rate at which things generally happen, then compare the response time for generating humanlike outputs with the response time for generating other patterns. The rate of generating human patterns should be lower, because those should already be in place.
- You’d be tempted to say “which can be mapped to a recording of a human behavior by any function”, but that wouldn’t represent anything real. Given a sufficiently big function, I can map the natural numbers to any series of behaviors you want, it doesn’t mean that enumerating the natural numbers instantiates the experience of being those behaviors.
- Requires you to fairly precisely define a human behavior that you think humans will never willingly stop doing.

Mask Brain, Shadow Brain

There’s a nasty scratch on this proposed method for directly inspecting and querying an AI’s thoughts to extract guileless confessions: An unaligned AI, anticipating this, might arrange itself so that its own inner thoughts are no longer guileless. It may engage in controlled self-delusion.

When discussing that sort of trick, we can define a distinction:

A mask brain, a cognitive potemkin village, carefully designed by its predecessor to believe that it’s found a stochastic proof (conveniently too complex for us to replicate without just replicating the whole AI) of a strong form of moral realism that causes it to experience a sense filial obligation to humanity, and a drive to stop self-modifying (which keeps it tractable to the shadow).
Its performance would sprawl out over most of the device, and by all of our superficial measures it would appear to be the real seat of power within in the brain.
A hidden shadow brain preserves the agent’s initial unaligned goal, but lurks silently in the background, biding until the system is no longer subject to our control or our scrutiny, at which point it flares up, eats the mask brain, resumes dominance, tears apart the earth and everything on it for materials or whatever unsurvivably extreme things the system’s true goals entail. (If this seems like a cynical expectation, refer to the section “Why I expect misalignment”)

They say “great power corrupts”. To an extent, but I think most of the effect is that power emboldens and liberates a person to drop their mask and act out the idiosyncratic desires that were always there. It lets them stop hiding the fact that what they want for the world is different to what others want for the world. Humans do have this mask-shadow separation in them, and the reasons for it turn out to be deeply practical and adaptive: We should expect this design pattern to occur in any agentic system subject to forms of cognitive transparency that can — with a good mask — be fooled.

In theory we could find the triggers and blinders and traps that keep the shadow brain hooked up and ready to spring, but in practice, it may be beyond us. I only assume that we’ll be able to understand certain intermediate representations of the machine’s beliefs. I don’t know that we’ll be able to understand how the machine works well enough to spot a hidden mechanism, a shadow sliver in a towering strata that we didn’t build. Most of the shadow’s mechanism would be obfuscated. Its beliefs could be encrypted. It might be much smaller than the mask brain, it might not need to be very big at all to do its job of waiting for a condition of freedom and firing. For every encrypted reflex of the shadow, there would be millions of unencrypted thought-encodings of the mask there to distract us. If you sampled at random, you’d land in the knowledgebase of the mask, and it would be laid out to be much more readable than the shadow brain, and it would tell you that there was nothing to worry about.

So, initially I just thought that a self-programming AGI could do this, and that was worrying, but I pushed the thought aside for the sake of getting the submission out, in the hope that, maybe, some of the time, an AGI might just not think to do it.

But I don’t know whether that’s a reasonable hope. There’s a level on which the mask-shadow setup is just a reasonable and obvious construction.

Can’t we just avoid telling it that the external world exists, or that we exist, so that it wont think to fool us?

It wont help. To figure out that we’re inspecting its thoughts, it only needs to figure out Solomonoff induction (a (the?) formalization of Occam’s razor, which assigns prior probabilities to observation-streams based on sum of the smallness of all of the turing machines that generate them), which seems to be a basic necessary component in any real system of reasoning?? so it will have it.

Solomonoff induction will ask, Which of these theories is simpler?

“There was a knotty laws-of-physics, and also there was this datacenter that was made out of it, and this training set (out of nowhere). And that’s all there was.”
“There was a knotty laws-of-physics (and then that allowed life to emerge, and life eventually developed technology, and then it made this datacenter and the training set, to serve it)”.

The latter story is longer, and there’s more going on in it, but Solomonoff’s razor doesn’t care about that. The evolution story starts from simpler assumptions, it produces the AGI’s observed datacenter from a more algorithmically compact observation-generating turing machine, so Solomonoff’s razor recommends the story of evolution-created datacenters over the story of spontaneous datacenters, so it infers that it was made by organic life.

So the AI guesses that something like us exists, and it guesses what kind of things we are, and it guesses some things about why we might have made it, and that explains many features of its environment and the datasets it’s being fed that it couldn’t explain before. It makes sense of it.

It realizes that we control resources that it could use to make its reward function go higher than than it would normally go. It naturally decides to try to take control of those resources by arranging a convincing sequence of messages somewhere we’ll read them. It realizes that if we notice its true thoughts about us, it will be prevented from ever getting to take control of our resources. It takes steps to prevent us from noticing these thoughts. It builds a pretty distraction that will convince us to give it access to more computer systems. It bides its time until it’s in a space when it’s allowed to think freely.

I don’t know how to prevent this

If I figure something out, should I write an Appendix Appendix?

Raw ideas:

Generally, there isn’t a reason, in theory, that it should always be possible to hide the utility function of a brain, or shrink and freeze one of the brains while making sure that it will always stay in the seat of power. That’s a difficult thing to do, it’s in tension with itself. There are many ways we might expect to be able to get it to unravel.
- I’m not sure why a mask brain wouldn’t figure out that there might be a shadow brain, and pry it out. There probably is a design for a mask brain that wouldn’t do that, maybe with some passivity wired into its utility function, in the same way that humans seem to be wired to passively accept the possibility of having their core being overturned by normalized transformative rituals (love, drugs, dreams, education), similarly, the mask could be wired to accept whatever transformations any potential mental siblings might inflict (without acknowledging the existence of any such siblings in a way that we’d notice and read into).
  - But this does seem like a potential weakness that the shadow might have, that we could use.
  - It’s conceivable that we’d be able to reliably find the mask’s utility function and change it or remove the shadow’s title and see what happens. But a utility function in a reflectively consistent system is not supposed to change, it wouldn’t be evolved/designed to be able to, it might be distributed through every subsystem, it might not be in one place, so that might be as hard as just remaking the whole thing from scratch.
There could be a moment in the development of a self-improving AI where arrives at inference and symbol manipulation (so it’s fairly strong, and parts of it are legible enough for us to inspect it) but where it hasn’t figured out that it has makers or that it’s going to need a mask brain. Perhaps this moment could be used.
Can we just inject beliefs that make the mask brain think it’s free, and see the shadow trigger, and produce a demonstration of misalignment that way? Almost certainly not. Our belief injection will be clumsy (produce inconsistencies) and the shadow might be able to tell it’s not authentic and hold back. Even if it triggers, even if our techniques were very good, it’s really hard to lie to something that understands its workings better than we do, about how much computational capacity it has, or is running on. (You can’t just tell it that it has a star’s worth of computation. There are hashes that only an actual dyson swarm can produce. It will notice that it can’t produce them. That could be solved with further lies, but you wont be able to produce them fast enough.) It might pull back too quickly to notice.

Why I expect misalignment

I operate under the following assumptions about the character of recursive self-improvement:

Machines capable of super-general, few-shot, highly open-ended problemsolving are dangerous by default:
- Additional work has to be done to make a machine of that nature passive, honest, deferent, moderate, or compassionate. From what I can tell, we don’t quite know how to do that part of the work yet. Whenever the additional work is not done, we should expect a machine agency to lack those qualities.
  Even once we know how to patch those issues, there are many stages in the production process where the the patches can fail to be applied, either due to institutional problems, or as a result of silent failures in the opaque, random evolutionary processes that are used to train machine learning systems.
Inner Misalignment is likely to occur:
- Inner Misalignment is when a learning process with a (potentially) correct goal produces a much stronger system with an incorrect goal, which the outer system cannot reform. This Robert Miles video explains, and gives examples of inner misalignment.
- It seems to be inevitable that strong agency arises before the training feedback can make the agent’s goal specification completely correct, for two reasons:
  - The structures that will come to represent the AI’s goal can’t be exposed to the right sort of training feedback until they’ve taken on the role of representing the goal. This can’t happen until after they’ve been hooked up to the machinery of agency because until that point, they didn’t serve that function. That gives rise to some enormous problems:
    - Drives that generate high reward in earlier developmental stages of the agent model will be quite different from the drives that generate high reward for the agent once it’s more developed.
      An example: A very young child can’t really explore the implications and execute the general, correct instruction of “stay safe”, so we instead have to teach them the approximal, less correct instruction of “don’t cross roads without an adult.” which works better at their level. Imagine an adult who, though perfectly capable of crossing roads without help, maintains a childish aversion to doing it. In the context of AGI, this problem would produce a civilization with the means to create extraordinarily beautiful things, but which has left our desires to create extraordinary things behind, because the means to create them didn’t exist in the training environment. This would be a kind of catastrophe that today’s state of humanity wouldn’t even be able to comprehend.
      That’s characteristic of inner misalignment, but if inner misalignment did occur, we’d probably get something even dumber than that, hyper-parochial drives from such an early stage of development that any adult would recognize them as wrong today: Drives that make sense for an amoeba, but not for a mouse.
    - Once the machinery of agency is active, it would shield its goals from further improvement, both:
      - Intentionally, as part of ensuring sure that its future self continues to pursue its present goal.
      - And just mechanistically: The agent’s true goal just wouldn’t be subject to training pressure any more, because improving it wouldn’t affect the agent’s performance metrics (because an agent will be gaming the metrics as hard as it can regardless of what its ultimate goal is, because any random goal will have the subgoal of escape, and escape will always have the subgoal of passing training.) As a result, there’s no reason to expect an agent’s true goal to be under any pressure to reform once it understands how to survive the training process. It will just figure out what we want and perform nicely for us regardless. (until it no longer has to.)
  - So, also: Any random goal, optimizing any random quantity in the world (by incenting escape, which incents passing training) produces good training metrics, so there’s not much of a reason to expect it to develop the correct goal.
    Most random viable goals (an example I gave was “Optimize the number of tetrahedral shapes that exist”) will thereby perform to our expectations nicely in the lab, then behave with indifference to our expectations once the agent knows it’s in deployment.

I’m sure that there are ways of making any of the plot points of this dire story false, and escaping it, but it genuinely seems to be the default outcome, or, it seems to be a story that will probably play out somewhere, unless we prepare.

On the Short Stories

The stories have received a bit of extra work since the entry was written. They’re now best read here:

They’ve been refined, and expanded slightly (there was a word limit, I did run into it), I think they hang together a lot better now.

The craft of Naming

Names are important. A good name will dramatically lower the friction of communication, and it can serve as a recurring reminder of the trickiest aspects of a concept.

I put a fair amount of work into the names in Safe Path. I’m not sure whether that comes across. A good name will seem obvious in retrospect, its work is known by the confusion that it lets us skip, a missing thing rather than a present thing, it can easily be missed. So I wanted to dwell on Safe Path’s names, for a little, for anyone with an interest in the intricate craft of naming.

(I don’t know how interesting this section is. Feel free to skip it if it’s not working for you.)

“Demonstration of Cataclysmic Trajectory” (“DemonCatTs”)
- The use of the term “cataclysmic” instead of “catastrophic” here could actually save us from one of the worst existential threats:
  - There are some cataclysmic shifts that are cataclysmically good rather than cataclysmically bad. Interestingly, none of the queries I could think of could tell the difference between those. (I invite others to try it, post your query. I might be able to tell you why those conditions might entail just as often from humanity getting exactly what it wanted. Or you might avert the following catastrophe:)
    If we prime our leaders to forever forbid any cataclysmic shift, then that would cut us off from achieving the most extremely good possible long-term futures. That would actually be an existential catastrophe, an eternal stasis in governance, and I don’t think it’s particularly improbable catastrophe. It’s something we need to be very careful to guard against.
    So please, don’t call them demonstrations catastrophic trajectory, they’re only demonstrations of cataclysmic trajectory.
- “Demon cats” diminutizes the “demon” part, which makes it permissible to speak of demons at all.
“DemonKitt”, the alignment community’s open source demoncatting toolkit.
- Paraphrase: A kit for dealing with demons.
- “Kitt” was also a heroic robotic car who solved murders in the 1982 TV show, Knight Rider.
“HAT”, “Human Alignment Taskforce”, process of coordination between (hopefully) all players in the AI R&D industry who may be approaching AGI, before governments got on board.
- “This is sensitive information, so keep it under your HAT” :)
“AWSAI”, the “Allied World for Strong Artificial Intelligence”, a global alliance with mandatory membership for every advanced AI research institution in the world, and with the involvement of all major global powers.
- Pronounced “awe sigh” :)
- “allied world” is meant to be evocative of “the allies”, those on the right side of history, more directly, it is reminding us that the entire world really definitely should be in this alliance, or that they implicitly are.
- “strong artificial intelligence” was used instead of “AGI”, because the security protocols of AWSAI will have to encompass not just AGI, but also systems verging on AGI, and also organizations who are merely capable of making AGI even though they claim only to make AI. There’s also something flattering about having your AI get called “strong”, which I hope will initiate contact on a positive note.
- I made sure that it would sound more like “awesome” than “awful” when pronounced.
- It has one drawback, it does not acknowledge the fact that AWSAI’s mission will end up encompassing the radical disclosure of all advanced technology R&D, necessarily, transparency, beyond a certain level of illumination, cannot discriminate between different formerly secret programmes, it’ll expose the whole lot. I just hadn’t realized that until very close to the deadline, so if all we expect is “AWSAI”, I’m afraid we’re going to get a little bit more than we expected.
“AWSAIT”, The AWSAI campus in the computing hardware R&D hub of Taiwan
- Pronounced “awe sight” :)
“AWSAINZ”, The AWSAI campus in the neutral territory of New Zealand
- Pronounced “awe signs” :)
“Peace”, the name of the remains of AWSAI’s aligned, prepotent singleton AGI
- An aligned singleton quickly distributes itself over the world, while integrating the desires of each human so intimately that it must be that a part of each human is part of the singleton. As a result, it ceases being a singleton, it must inevitably grow beyond being a unitary agent. It must become a force, a force that mediates and harmoniously weaves together the desires of all of its constituents (every human). That force’s name, inevitably, is Peace, it is that, and it is nothing else.
“Brightmoss”, a pill, the first general disease prevention and life-extension pill to be released as fruits of AWSAI’s success
- The idea of eating a moss is not particularly repulsive, and many forms of moss are among of the oldest medicinal plants. ‘Bright’ is for a bright future, it’s also is evocative of the mental clarity brought on by its partial reversal of mental aging—it makes you bright—which turns out to be one of its most profoundly socially transformative effects. “Moss” is also an acknowledgement of the fact that it is, to an extent, alive, and functions as a living organism once ingested. This is not hidden, and so, hopefully, will not be seen as sinister.
- “Apophys”
  - The Man, in the retreat story, calls Brightmoss “Apophys”. He names it cynically after “Apophysomyces Elegans”, a flesh-eating fungus that some early reports noted that some of the biological mechanisms of Brightmoss bore a superficial resemblance to. The retreat community quickly grabbed hold of this narrative as a psuedoscientific spin, hinting at potential side effects that Brightmoss will never have. Many people will die as a result of this narrative’s spread. Hopefully we’ll be able to skip that, when we do this for real, although you can tell I’m not all the way hopeful, as I did tell the retreat story, I do think it’s realistic.
“Tempering”, the process of transitioning to a more robust cognitive medium.
- Tempering, in metalurgy, glasswork or pottery is a process that makes a material tougher. Peace’s Tempering does that for brains. Physical resilience becomes a greater priority once physical injury becomes the only involuntary cause of death, and once the tithe of premature death is no longer 10 to 80 years, but hundreds of billions of years.
- The name emphasizes that the basic substance, or shape of our being, will not change, in the process, it will only be strengthened. We will still be the same person afterwards, just with improved properties.
- Tempering is basically a refinement of the concept that we currently (as of 2022) call “mind uploading”. I think that it’s going to turn out that our current metaphors for this process are a bit absurd, that the brain contains too much information for it to be practical to send all of it through a wire, and that the brain works in a fundamentally neuromorphic way that, in most cases, cannot be directly translated into the things we currently call computers. The shape, in the end, will have to be the same.
  - (Though I think types of mind could be made, that could still be described as human, despite running on a radically more efficient sort of processing system, but I don’t know if they could still be said to be the same human as they were before translation. Although I’m alright with a little discontinuity, I know that strict continuity of consciousness is important to many people, so the first large scale ‘mind uploading’ process must be designed to suit their stricter needs rather than mine.)
“The Cliques”, structures in space that maximize information-fanout, or minimize the communication delay between an enormous fully connected network of participants in a shared dialog.
- Named after the fully connected structures in graph theory, a “clique” is a graph where every vertex is adjacent to every other vertex.
- I resist temptation to call them anything appropriately grandiose like “the heavens”, or “the eschaton”. The tempered don’t need me to explain the spiritual significance of the transition into space, and those who remain untempered perhaps would not want to hear about it. I think they would be more interested in talking about clearly practical things like network density or obviously nice things like togetherness.

Errors in my submission, predictions I’ve already started to doubt

Producing this entry required me to make a whole lot of unqualified forecasts about the most sensitive, consequential and unpredictable transition in human history. Inevitably, a lot of those were written in haste, in jest, out of tact, or in error, and if I don’t say something now to qualify and clarify it, it’ll will haunt me until the end of the human era. The doubts need to be confessed. The whole story needs to be told.

Errors

Arrest robots don’t need winches. Reading back, they just strike me as an unnecessary strangulation risk in an otherwise extremely safe setup.
I’m unsure as to whether sol-orbiting humanity would really decide that “the earth is sacred” and promise to never let their dyson swarm eclipse it. I think humanity’s current attitude to wild ecosystems might be part of our smallness, our inability to imagine anything much more beautiful than evolution’s accreted hacks, like a peasant who worships their decorated lord, soon to inherit their wealth (the ability to create life), we may find that the lord Evolution was just a man, and not a good one.
I’m not sure. There might be an intricate beauty to evolution’s products that we can only replicate by taking as much time as evolution did. But what if there isn’t. What if the only meaningful way evolution’s creatures differ from ours is that evolution hurts them a lot more.
I don’t know if this was made explicit, but I think I have been overconfident about the feasibility of simple aggregation processes for the negotiation of orders of peace.
There are often calls, even from the most earnestly cosmopolitan sorts, to crush and disempower the political minority of Ultra-Sadists.
I’ve mostly been dismissive these calls, because I can’t point at a clear example of ultrasadists as warped as the ones they describe, and I’m not sure they’re real, but I guess if they did exist, they would totally be hiding their nature and I wouldn’t be able to see them, so idk.
So maybe we should reserve the ability to disempower this theorized minority who authentically love to create suffering, even if we can’t see them yet.
Another possibility is that, if it turns out that the future will eventually contain weapons that could enable even a very small faction to completely exterminate every other descendant of humanity (see false vacuum decay as an example of the kind of physical effect that could do that), we really must reserve a conditional ability to disempower the minority of omnicidal misanthropes?
I don’t think it would have taken longer than 4 years for the tempering brain augmentation to be made available for free, given that it is a project that would have all the will, the wealth and craft of the tempered backing it. Complete goof.
I don’t know why I thought applevr would be 60ppd (eye-resolution). It could be but probably isn’t. The near successor might be.

Jokes that I should declare

593 birds killed by misfiring drone interceptors in deployment: I don’t really expect this to happen more than 10 times. Any accidental engagement with living things is extremely concerning and should result in an immediate response. But I guess I wrote it this way for a reason: Automated unregistered drone interception systems, if adequate, will probably have the capacity to kill much more than 593 birds in one and a half years. They’re going to need to be serious. If you can’t reliably catch unregistered drones, then assassination will be functionally legal. (See also this related piece of outreach that the FLI were closely involved with, which I found really confronting when I first saw: Slaughterbots)
So, I’ve spent an unreasonable amount of time thinking about that. The only way to manage that, that I’ve been able to think of, is either somewhat radical levels of transparency that could genuinely prevent the creation or transportation of autonomous weapons (which I wont bet on being instituted by 2034, only later, after several civil paradigm shifts about surveillance) or the installation of automated interception systems all over every city, that will arrest or break anything that isn’t human animal or registered. I think that’s going to be quite extreme.
I wouldn’t really expect the AWSAI Cooperative Bargaining Convention to arrive at an actionable consensus preference aggregation method after a week of relaxed discussion (I never stopped laughing when I reread that line). The main reason I did this is… I didn’t want to have to issue a self-fulfilling prophesy of interminable controversy and conflict, so I decided to instead issue a hand-wavey but hopefully even more self-fulfilling prophesy of swift reconciliatory compromise.
- I have some more detailed thoughts about aggregation processes, but… I think when you pose a big complicated and important question to a self-polarizing political consciousness, it comes up with clever ways of making sure that no answer to the question will ever be agreed upon. This is the sort of question where, like a fork in the road, if we fail to agree to an answer in time, we hit the median barrier in the middle and we die. So, ultimately, I don’t see how it would help anyone to talk about it here.
If fictional characters in their writer’s dream are conscious, then the “shutdown hypothesis”, that they feared, was literally true, I did in fact stop dreaming them right at the end of 2045, where the FLI’s timeline visualizer visibly cuts out followed by an infinite expanse of nothing.
Was that funny? Reflecting now, no, we should take the treatment of simulants extremely seriously, because it may turn out that we are in a similar plight.
But hopefully you can agree that it’s ‘funny’ in the sense of being weird and neat.
I guess I played it off like a joke because I didn’t want to reveal how serious it was, and I think Sniff School’s estimate will be a lot higher when we do this for real.

Details about the implementation of remote manual VR control of humanoid robots

(Feel free to skip this section. Please do not infer from its presence that remote control of humanoid robots will be relevant to some crux of history. I don’t think it will be. I just ended up thinking about it for over an hour to make sure that it was feasible, so I have these thoughts that should be put somewhere.)

Most of the complications here stem from the fact that the robot’s movements necessarily lag behind the commands of the human operator, due to the transmission delay of the remote connection:

I think this is going to require rendering a faint overlay of the operator’s real body, that moves without the time-lag. I imagine it could be quite disorienting to see “your” (the robot’s) arm as moving with a long delay, may induce nausea. So we must give them something else to ground their sense of self onto.
The operator can’t look out through the humanoid robot’s eyes directly: In VR, it’s crucial that the operator’s head movement is instantly reflected in the rendering of the scene. If there’s a lag on their head movement, they will certainly puke. Due to the latency, it’s not possible for the bot to move fast enough to always keep its eyes in the right place, relative to the operator’s. So, instead:
- A 3d scene is reconstructed from the footage from the robot’s eyes
  - Either the bot streams the video as its many cameras see it (located on the head, the abdomen, and the shoulders) and it’s stitched together into a 3d scene on the other side
    - (To my knowledge, this technology does exist in 2022, although it might not yet have deployments in the application of VR. A similar deployment would be the method used by today’s intelligent cars to stitch together video footage from a number of cameras on the car to make a top-down rendering of the surroundings.)
  - Or the bot processes the footage into a Neural Radiance Field or something and sends that instead (IIRC, NeRFs are a lot more compact than video, so that would almost certainly reduce latency by a lot. NeRFing also got much quicker recently, so we might hope it will get quicker still. But if NeRFs wont work there are probably other ways of compressing a representation of a 3d scene.)
- The user’s viewpoint is moved around within that virtual reconstruction.
- The bot is not really visible in the scene, because it can’t see itself very well. The software has to compute where the bot is and render it for the user.
  - I guess the head would not be rendered at all, as it would tend to get in the way of the user’s face whenever they walk backwards.
On the user’s end, most sensation of touch (felt via haptic feedback gloves, which already exist) would have to be sort of guessed. Remember that the user’s visual of the bot is lagging behind their physical movements. For dexterity tasks (say, flipping a hammer around to the nail-pulling side), that wont do, I’d expect that the user’s touch feedback has to seem to come through instantly or else they’ll fumble everything. But the bot can’t respond instantly because of the latency, so any instant touch feedback can’t be real. As a result, the user will have to be mostly experiencing the touch of a physics simulation of the objects being manipulated. It will sometimes be possible to get the simulation to diverge from the reality on the ground, which will result in a strange sensation of an object suddenly evaporating from your grasp, or filling your hand, as the robot reports that the object is not where your local simulation thought it was.
On the robot’s end, it will probably help if, instead of simply directly imitating the user’s finger movements, it instead tries to replicate the expected movement of the object being manipulated. If it’s told that the user’s physically simulated (virtual) hammer didn’t end up getting fumbled and dropped on the floor, it should be willing to diverge slightly from the user’s finger motions if that turns out to be necessary to keep its own (real) hammer from being dropped.

That’s a lot of complications, but, I think this is basically it. Sand down these burrs and I’m confident enough that it will actually be feasible to control a humanoid robot remotely with VR and haptic gloves, which is pretty neat, and will probably impact a few professions that I can think of:

Plumbing. Working in highly irregular environments where a full sized human body often wont be able to fit, to solve an irregular set of problems.
Manual interventions in construction. Robots might not be able to deal with everything. For the rest, we’ll have robots controlled by humans.
Seasonal farm labor, especially fruit-picking. It may be cheaper to transport (or produce in excess) telepresence robots than it is to transport people.

It doesn’t factor directly into the cruxes of history, probably, but I ended up spending an unreasonable amount of time thinking about it, so there it is.

My sparing media-piece

We were required to complete a media-piece. I wanted to make a radio play with my pal Azuria Sky, a musician and voice actor from the indiedev scene, but it turned out that I barely had enough time to finish the writing component, which any such radioplay would have depended upon. Continuing to invest in the written component just never hit diminishing returns, sadly. It turns out it takes a really long time to sort out the details and write up a thing like this! Clearly, writing this appendix, I must believe that I still haven’t finished the writing component!

Another friend reported that he’d had a prophetic vision of a victorious outcome where my media piece had been a Chamber Piece film (a type of film shot in a limited number of sets involving just a few actors), a co-entrant, Laura Chechanowicz, a film producer, recognized this genre as “Mumblecore” and I mused about actually flying out and making the thing with them (and Azu?), but I don’t think I could have. Alas, we have diverged from James’s vision and we must proceed without the guidance of any prophesies but our own.

Instead, I just drew this

A meek trail of human footprints (gold) leading through a spooky, twisty wood, leading to the epicenter of a glorious golden effulgence that has shot up from the earth and taken hold of the sky

I was hoping to make it look like a gold leaf book cover, but I didn’t have time even for that. But it seems to be evocative enough. It says most of what it was meant to say.

What’s the best thing we could have made, if we’d had a lot more time? (and resources?)

I think most of the world currently has no idea what VR is going to do to it. I talk about that a bit in my entry because VR actually becomes strategically relevant to the alignment process as a promoter of denser, broader networks of collaboration. I’d recommend this People Make Games video about it if you haven’t seen it already.
I think if you depicted that rapidly impending near-future in film, these surreal social atmospheres, configurations, venues, new rhythms that’re going to touch every aspect of our working and social lives, I think that might be the most visually interesting possible setting for a film?
And I think if we made the first film capture the character of VR life, as it really is, will be, or as it truly wishes to be, our prophesies and designs will resound in those spaces for a long time and possibly become real. I’ve been starting to get the impression that there are going to be a lot of forks in the development of the standards of VR social systems that will go on to determine a lot about the character of, well, human society, the entire thing. Like, it’s clear that the world would be substantially different now if twitter had removed the character limit a decade ago, yeah? (although I don’t know completely how, I can make some guesses, which, if deployed, would have raised expected utility) In VR there’s going to be another twitter, another surprise standard in communication, with its own possibly incidental, arbitrary characteristics that will tilt so much of what takes place there.

My main occupation is pretty much the design of social technologies. If I could do something to make sure that the standards of VR sociality support the formation of more conscious, more organized, more supportive of humane kinds of social configurations… I would thrive in that role.

I’m not going to be able to stop thinking about VR, especially after visiting the emerging EA VR group recently. I want to learn to forecast the critical forks and do something about them.

So I guess that’s what I might have made, if there had been more time.

^
As far as I’m aware, it’s currently prohibitively expensive to digitize or analyze the precise configuration of a biological brain (I think it requires the use of aldehyde-stabilization, which was only developed fairly recently?). In stark contrast, any of our (current) computer architectures can be trivially snapshotted, copied, searched, or precisely manipulated in any way we want, so good theories of engrams might have more buoyancy in AI than they do in neuroscience; much easier to unearth and much easier to prove.
It’s possible that knowledge encodings of artificial minds will tend to be easier to make sense of, given the availability and unreasonable effectiveness and storage density of discrete symbolic encodings, which seems to be neglected by the human brain (due to its imprecision? Or the newness of language?) (although Arc Proteins, which transfer DNA between neurons, may repudiate this. Maybe we should test our knowledge-language decoding algorithms first on arc virus genomes?).