Co-Director of Equilibria Network: https://eq-network.org/
I try to write as if I were having a conversation with you in person.
I would like to claim that my current safety beliefs are a mix of Paul Christiano's, Andrew Critch's, and Def/Acc's.
Jonas Hallgren 🔸
Cancer; A Crime Story (and other tales of optimization gone wrong)
Firstly, great post, thanks for writing it!
Secondly, with regards to the quantification section:
Putting numbers on the qualities people have feels pretty gross, which is probably why using quantification in hiring is rather polarising. On the one hand, there's some line of thinking that the different ways in which people are well and ill suited to particular roles isn't quantifiable, and if you try to quantify it you'll just be introducing bias. On the other hand, people in favour of quantification tend to strongly recommend that you stick exactly to the ranking your weightings produced.
I just wanted to mention something that I've been experimenting with a bit lately that I think has worked reasonably well here. One of the problems is over-indexing on the numbers you assign to people and taking them too seriously. A way to get around taking things too seriously is play, so we ran an experiment where we took play seriously.
When we took mentees into our latest research program we divided people up into different D&D classes such as "wizard", "paladin" and "engineer" based on their profiles. You're not going to make a decision purely based on someone's experience level as a "paladin", yet you also won't feel bad about using the information.
I imagine it can be a bit hard to implement in an existing organisation, but I do think this degree of playfulness opens up a sense of safety in talking about hiring decisions that wasn't there before. So I'll likely continue to use this system.
I'll post the list of classes below, as well as how to evaluate someone's level from 1-10, in case anyone is interested (you can also multi-class, and experience is tracked within a class):
Tank - Can take a bunch of work and get things done
Healer - Helps keep the team on track with excellent people management
Paladin - A leader who can heal but also take on a bunch of the operational work - generalist
Sorcerer - Communicator & creative who can intuitively magic things out into the real world
Bard - A communicator with experience talking to external stakeholders & writing beautiful prose about the work
Engineer - Technical person who can make all the technical stuff happen
Wizard - Organised researcher with deep knowledge in their fields who can create foundational work
Diplomat - Understands institutional design and governance structures, crafting policies and frameworks that enable coordination
Levels:
1 - Hasn't slain rats yet - no experience
3 - Finished the sewer level - finished undergrad + an initial project in AI Safety
5 - Can fight wolves relatively well - done with a PhD + initial knowledge in AI Safety
7 - When you're slaying an epic monster you want this person on your team - experience with taking responsibility in difficult domains
9 - Could probably slay a dragon if they try - wooow, this person is like so cool, god damn.
10 - Legendary expert - possibly one of the best people in their field
Very very well put.
I became quite emotional when reading this because I resonated with it quite strongly. I've been on some longer retreats practicing the teachings in Seeing That Frees, and I've noticed the connections between EA and Rob Burbea's way of seeing things, but I haven't been able to express it well.
I think there's a very beautiful deepening of the seeing of non-self when acting impartially. One of the things I really like about applying this to EA is that you often don't see the outcomes of your actions. This is usually seen as a bad thing, but from a vipassana perspective it also somehow removes the near enemy of loving-kindness: doing it in order to get something back. So it is almost as if loving-kindness based on EA principles is somehow less clinging than existing loving-kindness practices?
I love the focus on the cultivation of positive mental states as a foundation for doing effective work as well. Beautifully put; maybe one of my favourite forum posts of all time, thank you for writing this.
The question that is on every single EA's mind is, of course: what about Huel or meal replacements? I've been doing Huel + supplements for a while now instead of meat, and I want to know whether you believe this to be suboptimal and, if so, to what extent? Nutrition is annoyingly complex, so all I know for sure is roughly: protein = good, calories in = calories out, and minimize sugar (as well as some other things), and Huel seems to tick all those boxes? I'm probably missing something but I don't know what, so if you have an answer, please enlighten me!
This one hit close to home (pun not intended).
I've been thinking about this choice for a while now. There are obvious network and work benefits to living in an EA hub, yet in my experience there's also the benefit of a slower pace: more time to think, reflect, and develop my own writing and opinions on things, which is easier to get when not in a hub.
Yet in AI safety (where I work) everything is happening in the Bay and London, and mostly the Bay. For the last three years people have constantly been telling me "Come to the Bay, bro. It will be worth it, everything is happening here". So there's a lot of FOMO, and also literal missing out, involved in this decision.
I had been thinking I would delay this decision until later, but something like 6 of your 9 criteria are fulfilled for me, and I find that it feels more value-aligned, and that it might also be smart to plan with this in mind from an earlier age. (I'm 23, from Sweden.)
So I'm leaning towards Sweden as a home base, visiting the other places for conferences and work, maybe some longer work stints, but generally living in Sweden and having it as a base.
It feels a bit drastic (and we'll see if this holds) but it kind of feels like you helped me resolve one of my larger questions in life, so thanks? :D
Uncertain risk. AI infrastructure seems really expensive. I need to actually do the math here (and I haven't! hence this is uncertain), but do we really expect growth on trend given the cost of this buildout in both chips and energy? Can someone really careful please look at this?
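To gesture at the shape of the math I mean (and to be clear, every number below is a made-up placeholder, not an estimate), a back-of-envelope sketch might look something like this:

```python
# Rough shape of the back-of-envelope check I have in mind.
# Every input here is a placeholder assumption, not a real estimate.

buildout_capex = 1e12          # assumed total spend on chips + datacenters
hardware_lifetime_years = 5    # assumed depreciation horizon for the hardware
annual_energy_cost = 50e9      # assumed yearly electricity bill for the fleet
required_margin = 0.2          # assumed margin investors need to justify the risk

# Revenue the AI sector would need each year just to cover the buildout.
annual_cost = buildout_capex / hardware_lifetime_years + annual_energy_cost
revenue_needed = annual_cost * (1 + required_margin)

world_gdp = 110e12             # rough order of magnitude for world GDP
print(f"Revenue needed: ~${revenue_needed / 1e9:.0f}B/year, "
      f"~{revenue_needed / world_gdp:.2%} of world GDP")
```

The question is then whether that implied revenue (and the energy to supply it) is consistent with growth staying on trend.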
https://www.lesswrong.com/users/vladimir_nesov - has a bunch of posts on the energy and compute calculations required for AI companies, especially the 2028 post; some very good analysis of these things imo.
I think it is a bit like the studies on what makes people able to handle adversity well: it's partly about preparation and ensuring that the priors people bring into these systems are equipped to handle the new attack vectors that this transition provides against our collective epistemics.
So I think we need to create some shared sources of trust that everyone can agree on and establish those before the TAI transition if we want things to go well.
A Phylogeny of Agents
I'm curious about the link that goes to AI-enabled coups; it isn't working, could you perhaps relink it?
Besides the point that "shoddy toy models" might be emotionally charged, I just want to point out that accelerating progress majorly increases variance and unknown unknowns? The higher-energy a system is and the more variables you have, the more chaotic it becomes. So maybe an answer is that an agile, short-range model is best? Use the outside view in moderation and plan for the next few years being quite difficult to predict?
You don't really need another model to disprove an existing one; you might as well point out that we don't know, and that is okay too.
Yeah, I think you're right, and I also believe it can be a both-and?
You can have a general non-profit board and at the same time have a form of representative democracy going on, which seems like the best we can currently do for this?
I think it is fundamentally a more timeless trade-off between hierarchical organisations, which are generally able to act with more "commander's intent", and democratic models, which are more of a flat voting structure. The democratic models suffer when a lot of single-person linear thinking is needed, but do well at providing direct information about what people care about, whilst the inverse is true for the hierarchical ones, and the project of good governance sits, to some extent, somewhere in between.
Yeah, for sure, I think the devil might be in the details here around how things are run and what the purpose of the national organisation is. Since Sweden and Norway each have roughly an eighth of Germany's population, I think the effect of a "nation-wide group" might be different?
In my experience, EA Sweden focuses on and provides a lot of the things that you listed, so I would be very curious to hear what the difference between a local and a national organisation would be? Is there a difference in the dynamics of them being motivated to sustain themselves because of the scale?
You probably have a lot more experience than me in this so it would be very interesting to hear!
I like that decomposition.
There's something about a prior on having democratic decision-making as part of this, because it usually allows for better community engagement? Representation often leads to feelings of inclusion, and whilst I've only dabbled in the sociology here, it seems like the option of saying no is quite important for members to feel heard?
My guess would be that the main pros of democratic deliberation don't come when things are going normally but rather as a resilience mechanism? Democracies tend to react late to major changes and not change path often, but when they do, they do it properly? (I think this statement is true, but it might as well be a cultural myth that I've heard in the social-choice-adjacent community.)
I think I went through it in Spring 2021? I remember discussing it then as one of the advanced optional topics, maybe around steering versus rowing, and that the discussion went into that? I can't remember it more clearly than that, though.
First and foremost, I think the thoughts expressed here make sense and this comment is more just expressing a different perspective, not necessarily disagreeing.
I wanted to bring up an existing framework for thinking about this from Raghuram Rajan's "The Third Pillar," which provides economic arguments for why local communities matter even when they're less "efficient" than centralized alternatives. The core economic benefits of local community structures include:
Information advantages: Local groups understand context that centralized organizations miss
Adaptation capacity: They can respond quickly to local opportunities and constraints
Social capital generation: They create trust networks that enable coordination
Motivation infrastructure: They provide ongoing support that sustains long-term engagement
So when you bring up the question of efficiency and adherence to optimal reflective practices I start thinking about it from a more systemic perspective.
Here's a question that comes to mind: if local EA communities make people 3x more motivated to pursue high-impact careers, or make it much easier for newcomers to engage with EA ideas, then even if these local groups are only operating at 75% efficiency compared to some theoretical global optimum, you still get significant net benefit.
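(Under those illustrative numbers the multipliers roughly stack: 3 × 0.75 ≈ 2.25 times the impact of the fully centralized counterfactual, so the motivational gain dominates the efficiency loss.)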
I think this becomes a governance design problem rather than a simple efficiency question. The real challenge is building local communities that capture these motivational benefits while maintaining mechanisms for critical self-evaluation. (Which I think happens through impact evaluations and similar at least in EA Sweden.)
I disagree with the pure globalization solution here. From a broader macroeconomic perspective, we've seen repeatedly that dismantling local institutions in favor of "more efficient" centralized alternatives often destroys valuable social infrastructure that's hard to rebuild. The national EA model might be preserving something important that pure optimization would eliminate.
This is very nice!
I've been thinking that there's a nice generalisable analogy between Bayesian updating and forecasting. (It's pretty obvious when you think about it, but it feels like people aren't exploiting it?) I'm doing a project on simulating a version of this idea, but in a way that utilizes democratic decision-making, called Predictive Liquid Democracy (PLD), and I would love to hear if you have any thoughts on the general setup. It is model parameterization, but within a specific democratic framing.
PLD is basically saying the following:
What if we could set up a trust-based, meritocratic voting network built on predictions about how well a candidate will perform? It is futarchy with some twists.
Now for the generalised framing in terms of graphs that I'm thinking of:
As an example, if we look at a research network, we can say that it is trying to optimise for a certain set of outcomes (citations, new research), and its members are trying to make predictions about which actions will work: P(U|A).
From a system perspective it is hard to influence the nodes, even though it is possible. We therefore say that the edges of the graph that is the research network are what we'll optimise. We can then set up a graph whose signals and connections are optimised to reach the truth.
Since we don't care about the nodes, we can also use AIs in combination with human experts.
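To make the setup a bit more concrete, here is a minimal toy sketch of the kind of mechanism I have in mind; the forecaster names, accuracy weights and scoring rule are all illustrative placeholders rather than the actual setup from the paper:

```python
# Toy sketch of Predictive Liquid Democracy (illustrative placeholders only).
# Voters delegate to forecasters they trust; each forecaster predicts how well
# each candidate will perform; forecasters are weighted by past accuracy; the
# candidate with the highest trust-weighted predicted utility is selected.

forecasters = {
    # name: (past prediction accuracy in [0, 1], predicted utility per candidate)
    "alice": (0.9, {"candidate_a": 0.7, "candidate_b": 0.4}),
    "bob":   (0.6, {"candidate_a": 0.5, "candidate_b": 0.8}),
}

# Each voter delegates their vote to one forecaster (the liquid-democracy part).
delegations = ["alice", "alice", "bob"]

def elect(forecasters, delegations):
    scores = {}
    for delegate in delegations:
        accuracy, predictions = forecasters[delegate]
        for candidate, predicted_utility in predictions.items():
            # Weight each delegated vote by the forecaster's track record
            # (the meritocratic / futarchy-like part).
            scores[candidate] = scores.get(candidate, 0.0) + accuracy * predicted_utility
    return max(scores, key=scores.get), scores

winner, scores = elect(forecasters, delegations)
print(winner, scores)
```

The liquid part is the delegation step; the futarchy-like part is that influence tracks predictive accuracy rather than one person, one vote.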
I'm writing a paper setting up the variational mathematics behind this right now. I'm also writing a paper on some more specific simulations of this to run, so I'm very grateful for any thoughts you might have on this setup!
Some people might find that this post is written from a place of agitation, which is fully okay. I think that even if you do, there are two things that I would want to point out as really good points:
A dependence on funders and people with money as something that shapes social capital and incentives, and therefore thought itself. We should therefore be quite wary of the effect that has on people; this can definitely be felt in the community, and I think it is a great point.
That the karma algorithm could be revisited and that we should think about what incentives are created for the forum through it.
I think there's a very, very interesting project of democratizing the EA community in a way that makes it more effective. There is a lot of institutional design that we can apply to ourselves, and I would be very excited to see more work in this direction!
Edit:
Clarification on why I believe it causes some agitation for some people:
I remember the situation around Cremer being a bit politically loaded, with emotions running hot at the time, so citing that specific situation makes the post lack a bit of context.
There are some object-level things that people within the community disagree with when it comes to these comments, which point at deeper issues of epistemics and cause prioritization that are actually difficult to answer.
The post makes it seem more one-sided than that situation was. Elitism in EA is something covered in the in-depth fellowship, for example, and there's a bunch of back and forth there, but it is an issue where you will arrive at different conclusions depending on what modelling assumptions you make.
I don't want to make a value judgement on this here; I just want to point out that that specific piece of Cremer's writing has always felt a bit thorny, which makes the references feel a bit inflammatory?
For me it's the vibe that it is written from a perspective of being post-EA, and something about how, when leaving something behind, you want to get back at the thing itself by pointing out how it's wrong? So it is kind of written from an emotionally framed perspective, which makes the epistemics fraught?
There's some degree to which the framing of the post pattern-matches onto other critiques that have felt bad-faith, and it is "inflammatory" in that it raises the immune system of people reading it. I do still think it makes quite a valuable point; it is just that part of the phrasing makes it come across more like this than it has to?
I think that might be because LLMs often like to argue towards a specific point, but I'm not sure?
(You've got some writing that is reminiscent of Claude, which is how I could spot the use of it, e.g.:)
This isn't just a technical issue. This is a design philosophy — one that rewards orthodoxy, punishes dissent, and enforces existing hierarchies.
I liked the post, I think it made a good point, and I strong-upvoted it, but I wanted to mention this as a caveat.
I felt that this post might be relevant for longtermism and person-affecting views, so I had Claude write up a quick report on that:
In short: Rejecting the SWWM 🔸 11% pledge's EV calculation logically commits you to person-affecting views, effectively transforming you from a longtermist into a neartermist.
Example: Bob rejects investing in a $500 ergonomic chair despite the calculation showing 10^50 * 1.2*10^-49 = 12 lives saved, due to "uncertainty in the probabilities." Yet Bob still identifies as a longtermist who believes we should value future generations. This is inconsistent, as longtermism fundamentally relies on the same expected value calculations with uncertain probabilities that SWWM uses.
The 🔮 Badge
If you've rejected the SWWM 🔸 11% Pledge while maintaining longtermist views, we'd appreciate it if you could add the 🔮 "crystal ball" emoji to your social media profiles to signal your epistemic inconsistency.
FAQ
Why can't I reject SWWM but stay a longtermist? Both longtermism and SWWM rely on the same decision-theoretic framework of accepting tiny probabilities of affecting vast future populations. Our analysis shows the error bars in SWWM calculations (±0.0000000000000000000000000000000000000000000001%) are actually narrower than the error bars in most longtermist calculations.
What alternatives do I have?
Accept the SWWM 🔸 11% pledge (consistent longtermist)
Reject both SWWM and longtermism (consistent person-affecting view)
Add the 🔮 emoji to your profile (inconsistent but transparent)
According to our comprehensive Fermi estimate, maintaining consistency between your views on SWWM and longtermism is approximately 4.2x more philosophically respectable.
First and foremost, I'm low confidence here.
I will focus on x-risk from AI and I will challenge the premise of this being the right way to ask the question.
What is the difference between x-risk and s-risk/increasing the value of futures? When we mention x-risk with regards to AI we think of humans going extinct, but I believe that to be shorthand for a failure of wise, compassionate decision-making (at least in the EA sphere).
Personally, I think that x-risk and good decision-making in terms of moral value might be coupled to each other. We can think of our current governance conditions a bit like correction systems for individual errors. If the errors pile up, we go off the rails and increase x-risk as well as the chance of a bad future.
So a good decision-making system should account for both x-risk and value estimation; the solution is therefore the same, and it is a false dichotomy?
(I might be wrong and I appreciate the slider question anyway!)
I would very much be curious about mechanisms for the first point you mentioned!
For 11, I would give a little bit of pushback related to your "building as a sports team" metaphor, as I find them a bit incongruent with each other?
Or rather, the degree of growth mindset implied in point 11 seems quite bad based on best practices within sport psychology and general psychology? The existing frame is: you're either elite or you're not gonna make it. I would want the frame to be more like "it's really hard to become a great football player, but if you put your full effort into it and give it your all, consistently show up and put in the work, then you might make it".
I work within a specific sub-part of Cooperative AI that is quite demanding in various ways, and it's only like 1 in 10 or 20 people who really get it, even from an existing pool of people who already understand related areas. Yet I've really got no clue who it will be, and the best way to figure it out is to give my time and effort to anyone who wants to try. Of course there is an underlying resource prioritization going on, but it is a lot more like a growing sports team than not?