Co-Director of Equilibria Network: https://eq-network.org/
I try to write as if I were having a conversation with you in person.
I would claim that my current safety beliefs are a mix of Paul Christiano’s, Andrew Critch’s and Def/Acc’s.
Jonas Hallgren 🔸
Are you building these things on ATProtocol (Bluesky), or where are you building right now? I feel like there’s quite a nice movement happening there, with some specific tools for this sort of thing. (I’m curious because I’m also trying to build some stuff at the deeper programming level; I’m currently focusing on open-source bridging and recommendation algorithms, like pol.is but for science, and it would be interesting to know where other people are building things.)
If you don’t know about the ATProtocol gang, some things I enjoy here are:
- https://semble.so/
- Paper Skygest (a feed on Bluesky): https://bsky.app/profile/paper-feed.bsky.social/feed/preprintdigest
- AT Protocol: https://docs.bsky.app/docs/advanced-guides/atproto
A Loving Kindness Practice for EAs
Firstly, that’s only if you think that it isn’t inevitable and that it is possible to stop or slow down; if nuclear was going to be developed anyway, that changes the calculus. Even if that is the case, there’s also this weird thing within human psychology where, if you can point out a positive vision of something, it is often easier for people to kind of get it?
“Don’t do this thing” often works a lot worse for convincing people than saying something like “could you do this specific thing instead”. This is also true for specific therapeutic techniques like the perfect day exercise, and from a predictive processing perspective this is because you’re anchoring your expectations around something better, which makes it easier to visualise, and then actually see, the actions you have to take?
Finally, this is likely not the underlying reasoning for why Will is doing something like the positive vision, as that is more likely to be about the estimated value from improving the future versus reducing existential risk (see the following post).
I would be very curious about mechanisms for the first point you mentioned!
For 11, I would give a little bit of pushback on your building-as-a-sports-team metaphor, as I find the two a bit incongruent with each other?
Or rather, the degree of growth mindset implied in point 11 seems quite bad based on best practices within things like sport psychology and general psychology? The existing frame is like “you’re either elite or you’re not gonna make it”. I would want the frame to be more like “it’s really hard to become a great football player, but if you put your full effort into it, give it your all and consistently show up, then you might make it”.
I work within a specific sub-part of Cooperative AI that is quite demanding in various ways, and only like 1 in 10 or 20 people really get it, even from an existing pool of people who already understand related areas. Yet I’ve really got no clue who it will be, and the best way to figure it out is to give my time and effort to anyone who wants to try. Of course there is an underlying resource prioritization going on, but it is a lot more like a growing sports team than not?
Cancer; A Crime Story (and other tales of optimization gone wrong)
Firstly, great post thanks for writing it!
Secondly, with regards to the quantification section:
Putting numbers on the qualities people have feels pretty gross, which is probably why using quantification in hiring is rather polarising. On the one hand, there’s some line of thinking that the different ways in which people are well and ill suited to particular roles isn’t quantifiable and if you try to quantify it you’ll just be introducing bias. On the other hand, people in favour of quantification tend to strongly recommend that you stick exactly to the ranking your weightings produced.
I just wanted to mention something that I’ve been experimenting with a bit lately that I think has worked reasonably well for this? One of the problems here is overindexing on the numbers you assign to people and taking those numbers too seriously. A way around taking things too seriously is play, and so we ran an experiment where we took play seriously.
When we took mentees into our latest research program, we divided people up into different D&D classes such as “wizard”, “paladin” and “engineer” based on their profiles. You’re not going to make a decision based solely on someone’s experience level as a “paladin”, yet you’re not going to feel bad using the information either.
I imagine it can be a bit hard to implement in an existing organisation but I do think this degree of playfulness opens up a safety in talking about hiring decisions that wasn’t there before. So I’ll likely continue to use this system.
I’ll post the list of classes below, as well as how to evaluate someone’s level from 1-10, if anyone is interested (you can also multi-class; experience is tracked within each class):
Tank - Can take a bunch of work and get things done
Healer - Helps keep the team on track with excellent people management
Paladin - A leader who can heal but also take on a bunch of the operational work; a generalist
Sorcerer - Communicator & creative who can magic things out into the real world intuitively
Bard - A communicator with experience talking to external stakeholders & writing beautiful prose about the work
Engineer - Technical person who can make all the technical stuff happen
Wizard - Organised researcher with deep knowledge of their fields who can create foundational work
Diplomat - Understands institutional design and governance structures; crafts policies and frameworks that enable coordination
Levels:
1 - Hasn’t slain rats yet: no experience
3 - Finished the sewer level: finished undergrad + an initial project in AI Safety
5 - Can fight wolves relatively well: done with a PhD + initial knowledge in AI Safety
7 - When you’re slaying an epic monster, you want this person on your team: experience with taking responsibility in difficult domains
9 - Could probably slay a dragon if they try: wooow, this person is like so cool, god damn.
10 - Legendary expert: possibly one of the best people in their field
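If anyone wants to tinker with this, here’s a minimal sketch of what the bookkeeping could look like in code. To be clear, this is my own illustration rather than the actual tooling we used, and the `fit_for` weighting scheme is a made-up assumption:

```python
from dataclasses import dataclass, field

# The class names come from the list above; the weighting scheme in
# fit_for is my own made-up assumption for illustration.

CLASSES = {
    "tank", "healer", "paladin", "sorcerer",
    "bard", "engineer", "wizard", "diplomat",
}

@dataclass
class Profile:
    name: str
    levels: dict = field(default_factory=dict)  # multi-classing: class -> level

    def add_class(self, cls: str, level: int) -> None:
        if cls not in CLASSES:
            raise ValueError(f"unknown class: {cls}")
        if not 1 <= level <= 10:
            raise ValueError("levels run from 1 (hasn't slain rats) to 10 (legendary)")
        self.levels[cls] = level

    def fit_for(self, needed: dict) -> float:
        """Playful fit score: levels in the classes a role needs,
        weighted by how much each class matters for the role."""
        return sum(self.levels.get(c, 0) * w for c, w in needed.items())

# Usage: a role that mostly needs a wizard, with some paladin support.
alice = Profile("Alice")
alice.add_class("wizard", 5)    # done with PhD + initial AI Safety knowledge
alice.add_class("paladin", 3)
print(alice.fit_for({"wizard": 1.0, "paladin": 0.5}))  # -> 6.5
```

The point is less the score itself and more that a playful label plus a coarse level gives you enough structure to compare people without pretending to more precision than you actually have.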
Very very well put.
I became quite emotional when reading this because I resonated with it quite strongly. I’ve been in some longer retreats practicing the teachings in Seeing That Frees and I’ve noticed the connections between EA and Rob Burbea’s way of seeing things but I haven’t been able to express it well.
I think that there’s a very beautiful deepening of a seeing of non-self when acting impartially. One of the things that I really like about applying this to EA is that you often don’t see the outcomes of your actions. This is often seen as a bad thing, but from a vipassana perspective it also somehow gets rid of the near enemy of loving kindness: doing it for the purpose of getting something back. So it is almost like loving kindness based on EA principles is somehow less clinging than existing loving kindness practices?
I love the focus on the cultivation of positive mental states as a foundation for doing effective work as well. Beautifully put; maybe one of my favourite forum posts of all time, thank you for writing this.
The question that is on every single EA’s mind is, of course: what about Huel or meal replacements? I’ve been doing Huel + supplements for a while now instead of meat, and I want to know if you believe this to be suboptimal, and if so, to what extent? Nutrition is annoyingly complex, so all I know for sure is roughly protein = good, calories in = calories out, and minimise sugar (as well as some other things), and Huel seems to tick all the boxes? I’m probably missing something, but I don’t know what, so if you have an answer, please enlighten me!
This one hit close to home (pun not intended).
I’ve been thinking about this choice for a while now. There are the obvious network and work benefits of living in an EA Hub, yet in my experience there’s also the benefit of a slower pace: more time to think, reflect and develop my own writing and opinions on things, which is easier to get when not in a hub.
Yet in AI safety (where I work) all of the stuff is happening in the Bay and London and mostly the Bay. For the last 3 years people have constantly been telling me “Come to the Bay, bro. It will be worth it, everything is happening here”. So there’s a lot of FOMO and also literal missing out involved in this decision.
I had been thinking that I would delay this decision until later, but like 6 of your 9 criteria are fulfilled for me, and I find that it feels more value-aligned, and that it might also be smart to plan with this in mind from an earlier age. (I’m 23, from Sweden.)
So I’m leaning towards Sweden as a home base, visiting the other places for conferences and work, maybe some longer work stints, but generally living in Sweden and having it as a base.
It feels a bit drastic (and we’ll see if this holds) but it kind of feels like you helped me resolve one of my larger questions in life so thanks? :D
Uncertain risk. AI infrastructure seems really expensive. I need to actually do the math here (and I haven’t! hence this is uncertain) but do we really expect growth on trend given the cost of this buildout in both chips and energy? Can someone really careful please look at this?
https://www.lesswrong.com/users/vladimir_nesov ← has a bunch of stuff on the energy calculations and similar that AI companies require, especially the 2028 post; some very good analysis of these things imo.
I think it is a bit like the studies on what makes people able to handle adversity well: it’s partly about preparation, ensuring that the priors people bring into these systems are equipped to handle the new attack vectors that this transition opens up against our collective epistemics.
So I think we need to create some shared sources of trust that everyone can agree on and establish those before the TAI transition if we want things to go well.
A Phylogeny of Agents
I’m curious about the link that goes to AI-enabled coups, but it isn’t working. Could you perhaps relink it?
Besides the point that “shoddy toy models” might be emotionally charged, I just want to point out that accelerating progress majorly increases variance and unknown unknowns? The higher-energy a system is, and the more variables it has, the more chaotic it becomes. So maybe the answer is that an agile, short-range model is best? Use the outside view in moderation and plan for the next few years being quite difficult to predict?
You don’t really need another model to disprove an existing one, you might as well point out that we don’t know and that is okay too.
Yeah, I think you’re right, and I also believe that it can be a both-and?
You can have a general non-profit board and at the same time have a form of representative democracy going on, which seems like the best we can currently do for this?
I think it is fundamentally a more timeless trade-off between hierarchical organisations, which are generally able to act with more “commander’s intent”, and democratic models, which are more of a flat voting model. The democratic models suffer when there is a lot of single-person linear thinking involved, but do well at providing direct information about what people care about, whilst the inverse is true for the hierarchical ones; the project of good governance is, to some extent, somewhere in between.
Yeah for sure, I think the devil might be in the details here, around how things are run and what the purpose of the national organisation is. Since Sweden and Norway each have roughly an eighth of Germany’s population, I think the effect of a “nation-wide group” might be different?
In my experience, EA Sweden focuses on and provides a lot of the things that you listed, so I would be very curious to hear what the difference between a local and a national organisation would be? Is there a difference in the dynamics of them being motivated to sustain themselves because of the scale?
You probably have a lot more experience than me in this so it would be very interesting to hear!
I like that decomposition.
There’s something like a prior towards having democratic decision making as part of this, because it usually allows for better community engagement? Representation often leads to feelings of inclusion, and whilst I’ve only dabbled in the sociology here, it seems like the option of saying no is quite important for members to feel heard?
My guess would be that the main pros of democratic deliberation don’t come when the going is normal, but rather as a resilience mechanism? Democracies tend to react late to major changes and don’t change path often, but when they do, they do it properly? (I think this statement is true, but it might as well be a cultural myth that I’ve heard in the social-choice-adjacent community.)
I think I went through it in Spring 2021? I remember discussing it then as one of the advanced optional topics, maybe around steering versus rowing and that the discussion went into that? I can’t remember it more clearly than that though.
First and foremost, I think the thoughts expressed here make sense and this comment is more just expressing a different perspective, not necessarily disagreeing.
I wanted to bring up an existing framework for thinking about this from Raghuram Rajan’s “The Third Pillar,” which provides economic arguments for why local communities matter even when they’re less “efficient” than centralized alternatives.
The core economic benefits of local community structures include:
Information advantages: Local groups understand context that centralized organizations miss
Adaptation capacity: They can respond quickly to local opportunities and constraints
Social capital generation: They create trust networks that enable coordination
Motivation infrastructure: They provide ongoing support that sustains long-term engagement
So when you bring up the question of efficiency and adherence to optimal reflective practices I start thinking about it from a more systemic perspective.
Here’s a question that comes to mind: if local EA communities make people 3x more motivated to pursue high-impact careers, or make it much easier for newcomers to engage with EA ideas, then even if these local groups only operate at 75% efficiency compared to some theoretical global optimum, you still get a significant net benefit.
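To make the arithmetic explicit (with those deliberately made-up numbers): 3x motivation at 75% efficiency still nets out to 3 × 0.75 = 2.25x the output of the centralized baseline. The local option only breaks even once its relative efficiency falls below 1/3, i.e. when the efficiency penalty finally outweighs the motivational multiplier.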
I think this becomes a governance design problem rather than a simple efficiency question. The real challenge is building local communities that capture these motivational benefits while maintaining mechanisms for critical self-evaluation. (Which I think happens through impact evaluations and similar at least in EA Sweden.)
I disagree with the pure globalization solution here. From a broader macroeconomic perspective, we’ve seen repeatedly that dismantling local institutions in favor of “more efficient” centralized alternatives often destroys valuable social infrastructure that’s hard to rebuild. The national EA model might be preserving something important that pure optimization would eliminate.
I enjoyed reading this and yet I find that in the practice of higher ambition there are some specific pitfalls that I still haven’t figured out my way around.
If you’ve ever worked a 60-70 hour work week, and done it for a longer period of time, you can notice a narrowing characteristic of experience; it is as if you have blinders on to whatever is not within your stated goals or the project you’re working on (I like to call this “compression”). With some of my more ambitious friends who do this more often, I find that they sometimes get lost in what they’re working on. It is as if the precision of their cause-area targeting goes down, and suddenly they’re working on something that is a lot less effective in absolute terms (due to the difficulty of finding a good effective target), carried along by the local incentive gradients.
So they end up in lab automation instead of aging, developing specific medicines instead of working on whole brain emulation, or starting a SaaS multi-agent project instead of working on the safety of multi-agent systems.
I can’t help but feel that in the search of ambition, it is very easy to let go of your grounded foundation and that the pull of ambition once you’ve started can easily carry you away from where you want to be, power changes people and all that.
I was thinking about how to round off this comment, and I think my own theory of how to solve it might be interesting. I think this generally has to do with sympathetic activation of your nervous system: when you’re compressed, it shows up in cortisol levels, HRV and elevated heart rate during the night. This is at least my mapping of when I’ve become “compressed” in the past. The theory, and my n = 1 experiment, then says that if you get really good at active recovery, you can get around some of the cognitive narrowing. This might mean that if you want to remain reflective while ambitious, you should get good at active recovery; things such as meditation, exercise, walks, supplements and more can really help with it. You can also pretty easily track this for yourself via your recovery metrics using some sort of tracker.
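If you want to do the tracking yourself, here’s a minimal sketch of what that check could look like, assuming a wearable that exports nightly HRV and resting heart rate. The thresholds (a 15% HRV drop, a 5% resting-HR rise) and the 14-day baseline are assumptions I made up for illustration, not a validated protocol:

```python
import statistics

def compressed_nights(hrv_ms, resting_hr, baseline_days=14,
                      hrv_drop=0.85, hr_rise=1.05):
    """Flag nights where HRV falls well below, and resting heart rate
    rises well above, a rolling personal baseline.

    hrv_ms and resting_hr are parallel lists of nightly readings
    (HRV in milliseconds, resting heart rate in bpm).
    """
    flags = []
    for i in range(baseline_days, len(hrv_ms)):
        # Rolling personal baseline over the previous baseline_days nights.
        hrv_base = statistics.mean(hrv_ms[i - baseline_days:i])
        hr_base = statistics.mean(resting_hr[i - baseline_days:i])
        # "Compressed" heuristic: both markers move against you at once.
        flags.append(hrv_ms[i] < hrv_drop * hrv_base
                     and resting_hr[i] > hr_rise * hr_base)
    return flags
```

A run of flagged nights in a row would be the signal to lean harder on the active recovery side, at least under this n = 1 heuristic.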