Pause AI / Veganish
Let's do a bunch of good stuff and have fun, gang!
There was a lot in here that felt insightful and well considered.
I agree that thinking about the end state and humanity in the limit is a fruitful area of philosophy with potentially quite important implications. I wrestle with this sort of thing a lot.
One perspective I would note here (I associate this line of thinking with Will MacAskill) is that we ought to be immediately aiming for a wiser, more stable sort of middle ground and then aim for the "end state" from there. I think that can make sense for a lot of practical reasons. I think there is enough of a complex truth to what is and isn't morally good that I am inclined to believe the "moral error as an x-risk" framing and, as such, I tend to place a high premium on option value. Given the practical uncertainties of the situation, I feel pretty comfortable aiming for / punting to some more general "process of wise deliberation" over directly locking my current best guess into the cosmos.
That said, y'know, we make decisions every day, and it is still definitely worth tracking my current best guess for what ought actually to be done with the physical matter and energy extant in the cosmos. I am partial to much of the substance you put forward here.
"ensuring the ongoing existence of sentience"
"sentience" is a bit tricky for me to parse, but I will put in for positively valenced subjective experience :)
"gaining total knowledge except that knowledge which requires inducing suffering"
I mean, sure, why not? I think that sort of thing is cool and inspiring for the most part. There are probably things that would count as "knowledge" to me but which are so trivial that I wouldn't necessarily care about them much. But, y'know, I will put in for the practical necessity of learning more about the universe, as well as the aesthetic / profound beauty of discovering the rules of the universe and the nature of nature.
"ending all suffering"
Fuck yeah, dude! I'm against evil, and suffering seems like a central example of that. There may even be more aesthetic or injustice-like things that I would consider evil even in the absence of negatively valenced experience per se, which I might also entertain abolishing.
There is a lot to be said about the "end state" which you don't really mention here. Like, for example, I think it is good for people to be really, exceptionally happy if we can swing it. Honestly, I don't know how to think about population ethics.
One issue that really bites for me when I try to picture the end of the struggle and the steady end state is:
People often intrinsically value reproducing
I want immortality
Each person may require a minimum subsistence amount of stuff to live happily (even if we shrink everyone or make provably morally relevant simulations or something)
Finite materials / scarcity
I have no reasonable way out of this conundrum, and I hate biting the "population control" bullet. That reeks of, like, "one-child policy" and overpopulation-motivated genocides (cf. The Legacy of India's Quest to Sterilize Millions of Men / the Uttawar forced sterilizations). I think concerns in this general vein about the resources people use and the limits to growth are also pretty closely tied to the not-uncommon concerns people have around overpopulation / climate heads not wanting to have kids.
Also, to make it less abstract, I will admit that my morals / impulses are fundamentally quite natalist and I would quite like to be a dad someday. Even if we grant that resource growth exceeds population growth for now, it seems hard to escape the Malthusian trap forever, and I think this is a very fundamental tension in the limit.
Wow, I love that you ended your post in questions. I found your thesis compelling; it reminded me of how much value I used to get from more actively networking with and reaching out to people in online EA spaces. Also, I loved that it was short and salient.
What helps you ask for help when it feels uncomfortable?
Knowing relevant people who have signaled they are okay being asked for help on a given topic. Having a personalish connection to people. A lack of fear of stigma or social consequence for asking a dumb question that I shouldn't have needed help with. A sense of worthiness that I am even allowed to ask things of other people in this context.
When was the last time you asked for help, and what happened?
I ask for help multiple times every day. I am a working stiff, and my day job is bench work as a technician in a clinical diagnostics lab (microbiology department). I ask the more senior technicians and medical directors for advice constantly, multiple times a day. That usually goes well: people either give me some kind of answer or at least tell me who to ask. The main downside is that it can take up my time, and tbh sometimes they don't give me great advice.
Also, I ask my wife for help all the time, and that goes great because they are an amazing partner that I am lucky to have! :) I love my wife!
Hey nice! AGI and improvements to representative democracy systems are both right up my alley!
That said, I think the AGI tie-in might seem kind of superficial, in that having more functional governance and societal coordination mechanisms would help with all sorts of stuff, so I think it makes sense to frame this in a reasonably AGI-timeline-agnostic sort of way. Still, yeah, I see your point that this sort of thing is made all the more dire when thrown into relief by our "time of troubles" and "longtermists on the precipice" style thinking. Your call here, but I am sure it is not necessary to believe random "LLMs will change the world" predictions to believe that certain democratic reforms make sense.
In my experience, a lot of people in online EA spaces are pretty willing to talk to you if you reach out, so I think you'll have decent luck there if that's what you're after. Not as confident about how to find more serious collaborators for a project like this.
A few ideas I would throw out there for the sake of brainstorming (many or all of which you may already be familiar with):
independent redistricting / anti-gerrymandering schemes
merging voting districts and proportionally allocating positions rather than requiring majorities (i.e. mixed-member proportional representation) to negate "winner take all" / minority underrepresentation
ranked-choice / transferable voting to diminish the spoiler effect
open primaries might be a good idea to disincentivize the party system from filtering so hard for radical candidates
liquid democracy to let people vote directly on issues that matter to them instead of going through their rep at all (eg. imagine being able to disagree with your senator whenever you want and cast your individual .0000002% of a vote directly on whatever issue)
People talk about quadratic voting too, which is probably worth knowing something about from a mechanism design standpoint, but in my opinion it doesn't really stand out as a solution to anything on its own without a better way of defining what each actor's budget of voting credits would actually need to be applied to / split between in any given round.
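For what it's worth, here is a toy sketch of the core quadratic voting rule I have in mind (purely illustrative; the budget size and issue names below are made up):

```python
# Toy illustration of the quadratic voting cost rule (not from any specific
# proposal): buying v votes on a single issue costs v**2 credits, and each
# voter splits a fixed per-round budget of credits across issues.

def vote_cost(votes: int) -> int:
    """Credits required to cast `votes` votes on one issue."""
    return votes ** 2

def allocate(budget: int, desired_votes: dict[str, int]) -> dict[str, int]:
    """Return the requested allocation if it fits within the budget."""
    total = sum(vote_cost(v) for v in desired_votes.values())
    if total > budget:
        raise ValueError(f"Allocation costs {total} credits; budget is {budget}")
    return desired_votes

# With 100 credits, 9 votes on one issue (81 credits) leaves room for only
# 4 votes (16 credits) on a second issue -- expressing intensity is costly.
print(allocate(100, {"zoning reform": 9, "school budget": 4}))
```

The quadratic cost is what makes expressing a strong preference on one issue eat disproportionately into the budget left for everything else; the open question above is what that budget should actually range over.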
Also, I definitely second the idea of using a citizens' assembly. In my opinion, the combination of random sampling plus time to learn about and focus on an issue is really OP and really underutilized by representative democracies; the statistics of approximating large populations with small random samples are working in our favor here. Honestly, there is tons of adverse selection in the electoral process (eg. this book deals with some elements of that).
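To gesture at the sampling math I mean (standard survey statistics, nothing specific to any particular assembly design): for a simple random sample of size n estimating a population proportion p, the 95% margin of error is roughly

\[
1.96\sqrt{\frac{p(1-p)}{n}} \;\le\; \frac{1.96}{2\sqrt{n}} \;\approx\; \frac{1}{\sqrt{n}},
\qquad n = 1000 \;\Rightarrow\; \text{about } \pm 3\%,
\]

so a randomly selected panel of around a thousand people already mirrors the whole population to within a few percentage points on any given question.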
If you haven't seen CGP Grey's "Politics in the Animal Kingdom" series, you might love it! Also the Forward Party in the US tends to push for similar ideas / platforms, so they might be worth checking out.
I think this kind of work is very valuable! Nation-states might yet be the death of us. It has been terrible watching the democratic backsliding and corruption in my own US of A (in fact, I will be one of the protestors this 10/18 No Kings Day). Plus, I agree with your sentiment that there is a lot of headroom. Personally, I think this has less to do with the rise of cyberspace and more to do with the fact that existing polities were just never particularly optimized around the sorts of ideals we are aspiring towards here. Classical Greece and the revolutionary United States were both slave states with a lot of backwards ideas, after all.
In the case of a government locking in its own power, it seems like you are holding the motivations constant and just saying "power lets you accumulate more power" or something, right?
The obvious disanalogy here, which I am sure you are aware of on some level but which I didn't really see you foreground, is that in the case of either the pause bootstrap or the constitutional deliberation bootstrap, the motivations of the actors are themselves in flux during this period. There isn't as clear a story you can tell here about why acceleration should occur at all, but I take it the implied accelerant to our explosion is something like "additional deliberation / pausing is factually correct and good" and "additional deliberation / pausing will improve epistemic conditions".
Also, let me just flag that the "constitutional conventions of ever greater length" example you gave illustrates a world that is gradually locked in for larger and larger stretches of time, not merely one where there is an ever-increasing amount of deliberation. Like, plausibly, that is an account of gradually sliding into lock-in, first for a one-month interval, then for a one-year interval, etc.
Good stuff, though. I've been wrestling with this kind of morality-laden futurology and "what victory looks like" a lot lately, not only in the context of AI but also just against Malthusian traps and the wild state of nature. I tend to agree that viatopia, the great reflection, and really any "scenario where wise deliberation will occur and be acted upon" are beautiful and desirable waystations.
Ambitious stuff indeed! There's a lot going on here.
I really appreciate discussions about "big picture strategy about avoiding misalignment".
For starters, in my opinion, solving technical alignment and control such that one could elicit the main benefits of having a "superintelligent servant" addresses merely one threat model / AGI-driven challenge. That said, ofc, getting that sort of thing right also basically means the rest of the planning is better left to someone else, and if you are willing to additionally postulate a strong "decisive strategic advantage", it is basically also a win condition for whatever else you could want.
I would point to, eg.:
robo-powered ultra-tyranny
gradual disempowerment / full unemployment / the intelligence curse
misinformation slop / mass psychosis
terrorism, scammers, and a flood of competent robo-psychopaths
machines feeling pain and robot rights
accelerated R&D and needing to adapt at machine speeds
as issues that can all still bite more or less even in worlds where you get some level of "alignment", esp. if you operationalize alignment as more ~"robust instruction tuning++" rather than ~"optimizing for the true moral law itself".
That said, takeover by rogue models or systems of models is a super salient threat model in any world where machines are being made to "think" more and better.
I found your list of competing framings that cut against AI Safety quite compelling. Safety-washing is indeed all over the place. One thing I didn't see noted specifically is that a pretty significant contingent within EA / AI Safety works pretty actively on apologetics for hyperscalers, because they directly financially benefit and/or have kind of groomed themselves into being the kind of person who can work at a top AI Safety lab.
To draw a contrast with how this might have been: you don't, for example, see many EAs working at and founding hot new synthetic virology companies in order to "do biocontainment better than the competitors". Ostensibly, there could be a similar grim logic of inevitability and a sense that "we ought to do all of the virology experiments first and more responsibly". Then, idk, once we've built really powerful AIs or learned everything about virology, we can use this to exit the time of troubles. I don't actually know what, eg., Anthropic's plan is for this, but in the case of synthetic virology / gain-of-function research you might imagine that once you've learned the right stuff about all the potential pathogens, you would be super duper prepared to stop them with all your new medical interventions.
Like, I guess I am just noting my surprise at not seeing good old "keeping safety at the frontier" / "racing through a minefield" Anthropic show up more in a screed about safety-washing. The EA/rat space in general is one of the few places where catastrophic risk from AI is a priority, and the conflicts of interest here literally could not run deeper. This whole place is largely funded by one of the Meta cofounders, and there are a lot of very influential EAs with deep personal connections to, and complete financial exposure to, existing AI companies. This place was a safety-adjacent trade show before it was cool lol.
Lots of loving people here who really care on some level, I'm sure, but if we are talking about mixed signals, then I would reconsider the mote in our team's eye lol.
***
Beyond that, I guess there is the matter of timelines.
I do not share your confidence in short timelines and think interventions that take a while to pay off can be super worthwhile.
Also, idk, I feel like the assumption that it is all right around the corner and that any day now the singularity is about to happen is really central to the views of a lot of people into x-safety, in a way that might explain part of why the worldview kind of struggles to spread outside the relatively limited pool of people who are open to that.
I don't know what you'd call marginal or just fiddling around the edges, because I would agree that it is bad if we don't do enough soon enough, someone builds a lethally intelligent super-mind, and it rises up and it's game over.
Maybe the only way to really push for x-safety is with If Anyone Builds It-style "you too should believe in and seek to stop the impending singularity" outreach. That just feels like such a tough sell, even if people would believe in the x-safety conditional on believing in the singularity. Agh. I'm conflicted here. No idea.
I would love it if we could do more to ally with people who do not see the singularity as being particularly near, without things descending into idle "safety-washing" or "trust and safety"-level corporate bullshit.
Like, the "AI is insane hype" contingent has some real stuff going for it too. I don't think they are all just blind. In my humble opinion, Sam Altman looks like an asshole when he calls ChatGPT "PhD level" and talks about it doing "new science". You know, in some sense, if we're just being cute, Wikipedia has been PhD level for a while now, and it makes less shit up. There is a lot of hype. These people are marketing, and sometimes they get excited.
Plus, it gives me bad vibes when I am trying to push for x-safety and I encounter (often quite justified) skepticism about the power levels of current LLMs and end up basically just having to do marketing work or whatever for model providers. Idk.
I'm pretty sure LLM providers aren't even profitable at this point, and general robotics isn't obviously much more "right around the corner" than it would've seemed to a disinterested layperson over the past few decades. I'm conflicted on this stuff; idk how much effort should go into "the singularity is near" vs "if singularity, then doom by default".
Red lines and RSPs are actually probably a pretty good way of unifying "singularity near" x-safety people with "singularity far" or even "singularity who?" x-safety allies.
***
As far as strategic takeaways:
I do think it is good sense to "be ready" and have good ideas "sitting around" for when they are needed. I believe there was a recent UN General Assembly where world leaders were literally asking around for, like, ideas for AI red lines. If this is a world where intelligent machines are rising, then there is a good chance we continue to see signs of that (until we don't). The natural tide of "oh shit, guys" and "wow, this is real" may be attenuated somewhat by frog-boiling effects, but still. Also, the weirdness of the AI Safety regulation and such under consideration will benefit from frog boiling.
Preparedness seems like a great idle-time activity when the space isn't receiving the love/attention it deserves :)
"I dont think its undemocratic for Trump to be elected for a 3rd term, so long as proper procedures are followed here and he wins the election fairly."
I can kind of see where you are coming from. I would invite you to consider that sometimes even that sort of thing could be bullshit / tyranny; cf. the Enabling Act of 1933.
Also, for resolution criteria:
"Other markets i would suggest would be on imprisonment/murder of political opponents and judges. I would suggest markets like 'will at least 4 of the following 10 people be imprisoned or murdered by Dec 31 2028', etc."
Do you think specific targets would generally have been easy enough to call in advance for other autocracies / self-coups? That seems non-obvious to me.
Ya, I think that's right. I think making bad stuff more salient can make it more likely in certain contexts.
For example, I can imagine it being naive to constantly transmit all sorts of detailed information, media, and discussion about specific weapons platforms, raising awareness of things you really hope the bad guys don't develop because they might make them too strong. I just read "Power to the People: How Open Technological Innovation Is Arming Tomorrow's Terrorists" by Audrey Kurth Cronin, and I think it has a really relevant vibe here. Sometimes I worry about EAs doing unintentional advertisement for, eg., bioweapons and superintelligence.
On the other hand, I think topics like s-risk are already salient enough for other reasons. Like, extreme cruelty and torture have arisen independently many times throughout history and nature. And there are already ages' worth of pretty unhinged torture-porn stuff that people write on a lot of other parts of the internet; for example, the Christian conception of hell, or horror fiction.
This seems sufficient to say we are unlikely to significantly increase the likelihood of "blind grabs from the memeplex" leading to mass suffering. Even cruel torture is already pretty salient. And suffering is in some sense simple if it is just "the opposite of pleasure" or whatever. Utilitarians commonly talk in these terms already.
I will agree that it's sometimes not good to carelessly spread memes about specific bad stuff. I don't always know how to navigate the trade-offs here; probably there is at least some stuff broadly related to GCRs and s-risks which is better left unsaid. But also, a lot of stuff related to s-risk is there whether you acknowledge it or not. I submit to you that surely some level of "raise awareness so that more people and resources can be used on mitigation" is necessary/good?
What dynamics do you have in mind specifically?
There's always a strong unilateralist's curse with infohazard stuff, haha.
I think it is reasonably based, and there is a lot to be said about hype, infohazards, and the strange futurist-x-risk-warning-to-product-company pipeline. It may even be especially potent or likely to bite in exactly the EA milieu.
I find the idea of Waluigi a bit of a stretch, given that "what if the robot became evil" is a trope. And so is the Christian devil, for example. "Evil" seems at least adjacent to "strong value pessimization".
Maybe the thought is that a literal bit-flip utility minimizer is rare (outside of, eg., extortion), and that talking about it would spread the memes and some cultist or confused billionaire would try to build it, sort of thing?
Thanks for sharing, good to read. I got most excited about 3, 6, 7, and 8.
As far as 6 goes, I would add that it would probably be good if AI Safety had a more mature academic publishing scene in general and some more legit journals. There is a place for the Alignment Forum, arXiv, conference papers, and such, but where is Nature AI Safety or its equivalent?
I think there is a lot to be said for basically raising the waterline there. I know there is plenty of AI Safety stuff that has been published for decades in perfectly respectable academic journals and such. I personally like the part in "Computing Machinery and Intelligence" where Turing says that we may need to rise up against the machines to prevent them from taking control.
Still, it is a space I want to see grow and flourish big time. In general, big ups to more and better journals, forums, and conferences within such fields as AI Safety / Robustly Beneficial AI Research, Emerging Technologies Studies, Pandemic Prevention, and Existential Security.
EA forum, LW, and the Alignment Forum have their place, but these ideas ofc need to germinate out past this particular clique/bubble/subculture. I think more and better venues for publishing are probably very net good in that sense as well.
7 is hard to think about but sounds potentially very high-impact. If any billionaire ever has a scary ChatGPT interaction or a similar come-to-Jesus moment and googles "how to spend 10 billion dollars to make AI safe" (or even asks Deep Research), then you could bias/frame the whole discussion/investigation heavily from the outset. I am sure there is plenty of equivalent googling by staffers and congresspeople in the process of making legislation now.
8 is right there with AI tools for existential security. I mostly agree that an AI product which didn't push forward AGI but did increase fact-checking would be good. This stuff is so hard to think about. There is so much moral hazard in the water, and I feel like I am "vibe-captured" by all the Silicon Valley money in the AI x-risk subculture.
Like, for example, I am pretty sure I don't think it is ethical to be an AGI scaling/racing company, even if Anthropic has better PR and vibes than Meta. Is it okay to be a fast follower, though? Competing in terms of fact-checking, sure, but is making agents more reliable or teaching Claude to run a vending machine "safety", or is that merely equivocation?
Should I found a synthetic virology unicorn, but we will be way chiller than the other synthetic virology companies? And it's not completely disanalogous, because there are medical uses for synthetic virology, and pharma companies are also huge, capital-intensive, high-tech operations that spend hundreds of millions on a single product. Still, that sounds awful.
Maybe you think an armed balance of power with nuclear weapons is a legitimate use case. It would still be bad to run a nuclear bomb research company that lets you scale and reduce costs, etc., for nuclear weapons. But idk. What if you really could put in a better control system than the other guy? Should hippies start military tech startups now?
Should I start a competing plantation that, in order to stay profitable and competitive with other slave plantations, uses slave labor and does a lot of bad stuff? And if I assume that the demands of the market are fixed and this is pretty much the only profitable way to farm at scale, then so long as I grow my wares at a lower cruelty-per-bushel than the average of my competitors, am I racing to the top? It gets bad. The same thing could apply to factory farming.
(edit: I reread this comment and wanted to go more out of my way to say that I don't think this represents a real argument made presently or historically for chattel slavery. It was merely an offhand, insensitive example of a horrific tension b/w deontology and simple goodness on the one hand and a slice of galaxy-brained utilitarian reasoning on the other.)
Like I said, there is so much moral hazard in the idea of an "AGI company for good stuff", but I am very much in favor of "AI for AI Safety" and "AI tools for existential security". I like "fact-checking" as a paradigm example of a prosocial use case.
Hey, cool stuff! I have ideated and read a lot on similar topics and proposals. Love to see it!
Is the "Thinking Tools" concept worth exploring further as a direction for building a more trustworthy AI core?
I am agnostic about whether you will hit technical paydirt. I don't really understand what you are proposing on a "gears level", I guess, and I'm not sure I could make a good guess even if I did. But I will say that the vibe of your approach sounded pleasant and empowering. It was a little abstract to me, I guess I'm saying, but that need not be a bad thing; maybe you're just visionary.
It reminds me of the idea of using RAG or Toolformer to get LLMs to "show their work" and "cite their sources" and stuff. There is surely a lot of room for improvement there, bc Claude bullshits me with links on the regular.
This also reminds me of Conjecture's Cognitive Emulation work, and even just Max Tegmark and Steve Omohundro's emphasis on making inscrutable LLMs use deterministic proof checkers heavily to win back certain guarantees.
Is the "LED Layer" a potentially feasible and effective approach to maintain transparency within a hybrid AI system, or are there inherent limitations?
I don't have a clear enough sense of what you're even talking about, but there are definitely at least some additional interventions you could run in addition to the thinking tools... eg. monitoring, faithful-CoT techniques for marginally truer reasoning traces, probes, the kind of classifier Anthropic runs to help with robustness against jailbreaking for misuse, etc.
I think something like "defense in depth" is the current slogan of AI Safety. So, sure, I can imagine all sorts of stuff you could try to run for more transparency beyond deterministic tool use, but w/o a clearer conception of the finer points, it feels like I should say that there are quite an awful lot of inherent limitations, but plenty of options / things to try as well.
Like, "robustly managing interpretability" is more like a holy grail than a design spec in some ways lol.
What are the biggest practical hurdles in considering the implementation of CCACS, and what potential avenues might exist to overcome them?
I think that a lot of what it is shooting for is aspirational and ambitious and correctly points out limitations in the current approaches and designs of AI. All of that is spot on and there is a lot to like here.
However, I think the problem of interpreting, and building appropriate trust in, complex learned algorithmic systems like LLMs is a tall order. "Transparency by design" is truly one of the great technological mandates of our era, but without more context it can feel like a buzzword, like "security by design".
I think the biggest "barrier" I can see is just that this framing isn't sticky enough to survive memetically, and people keep trying to do transparency, tool use, control, reasoning, etc. under different frames.
But still, I think there is a lot of value in this space, and you would get paid big bucks if you could even marginally improve our current ability to get trustworthy, interpretable work out of LLMs. So, y'know, keep up the good work!
Thanks, it's not that original. I am sure I have heard them talk about AIs negotiating and forgetting stuff on the 80,000 Hours Podcast and David Brin has a book that touches on this a lot called "The Transparent Society". I haven't actually read it, but I heard a talk he gave.
Maybe technological surveillance and enforcement requirements will actually be really intense at technological maturity, and you will need to be really powerful, really local, and have a lot of context for what's going on. In that case, some value like privacy or "being alone" might be really hard to save.
Hopefully, even in that case, you could have other forms of restraint. Like, I can still imagine that if something like the orthogonality thesis is true, then you could maybe have a really, really elegant, light-touch, special-focus anti-superweapons system that feels fundamentally limited to that goal in a reliable sense. If we understood the cognitive elements well enough that it felt like physics or programming, then we could even say that the system meaningfully COULD NOT do certain things (violate the prime directive or whatever), and then it wouldn't feel as much like an omnipotent overlord as a special-purpose tool deployed by local LE (because this place would be bombed or invaded if it could not prove it had established such a system).
If you are a world of poor peasant farmers, then maybe nobody needs to know what your people are writing in their diaries. But if you are the head of fast prototyping and automated research at some relevant dual-use technology firm, then maybe there should be much more oversight. Idk, there feels like lots of room for gradation, nuance, and context awareness here, so I guess I agree with you that the "problem of liberty" is interesting.
There was a lot to this that was worth responding to. Great work.
I think making God would actually be a bad way to handle this. You could probably stop this with superior forms of limited-knowledge surveillance, and there are likely socio-technical remedies to dampen some of the harsher liberty-related tradeoffs here considerably.
Imagine, for example, a more distributed machine intelligence system. Perhaps it's really not all that invasive to monitor that you're not making a false vacuum or whatever, and it uses futuristic auto-secure hyper-delete technology to instantly delete everything it sees that isn't relevant.
Also, the system itself isn't all that powerful, but rather can alert others / draw attention to important things. And the system's implementation, as well as the actual violent/forceful enforcement that goes along with it, probably can and should be done in a generally more cool, chill, and fair way than I associate with Christian-God-style centralized surveillance and control systems.
Also, a lot of these problems are already extremely salient for the "how to stop civilization-ending superweapons from being created"-style problems we are already in the midst of here on 2025 Earth. It seems basically true that, if you want to stay alive indefinitely, you do ~need to maintain some level of coordination with / dominance over anything that could/might make a superweapon that could kill you.
Ya, idk, I am just saying that the tradeoff framing feels unnatural. Or, like, maybe that's one lens, but I don't actually generally think in terms of tradeoffs b/w my moral efforts.
Like, I get tired of various things ofc, but it's not usually just cleanly fungible b/w different ethical actions I might plausibly take like that. To the extent it really does work this way for you or people you know on this particular tradeoff, then yep; I would say power to ya for the scope sensitivity.
I agree that the quantitative aspect of donation pushes towards even marginal internal tradeoffs mattering here, and I don't think I was really thinking about it as necessarily binary.
I agree with 1, but I think the framing feels forced for point #2.
I don't think it's obvious that these actions would be strongly in tension with each other. Donating to effective animal charities would correlate quite strongly with being vegan.
Homo economicus deciding what to eat for dinner or something lol.
I actually totally agree that donations are an important part of personal ethics! Also, I am all aboard for the social ripple effects theory of change for effective donation. Hell yes to both of those points. I might have missed it, but I don't know that OP really argues against those contentions? I guess they don't frame it like that though.
I appreciate this survey, and I found many of your questions to be charming probes. I would like to register that I object to the "is elitism good, actually?" framing here. There is a very common way of defining the term "elitism" that is just straightforwardly negative. Like, "elitism" implies classist, inegalitarian stuff that goes beyond just using it as an edgelord libertarian way of saying "meritocracy".
I think there is a lot of conceptual tension between EA as a literal mass movement and EA as an unusually talent-dense clique / professional network. Probably there is room in the world for both high-skill professional networks and broad ethical movements, but y'know...
I think the real-life scenarios where AI kills the most people today are governance stuff and military stuff.
I feel like I have heard of the most unhinged, haunted uses of LLMs in government and policy spaces. I think certain people have just "learned to stop worrying and love the hallucination". They are living like it is the future already, getting people killed with their ignorance, and spreading/using AI bs in bad faith.
Plus, there is already a lot of slaughterbot stuff going on, eg. the "Robots First" war in Ukraine.
Maybe job automation is worth mentioning too. I believe Andrew Yang's stance, for example, is that it is already largely here and most people just do have less labor power already, but I could be mischaracterizing this. I think "jobs stuff" plausibly shades right into doom via "industrial dehumanization" / gradual disempowerment. In the meantime, it hurts people too.
Thanks for everything, Holly! Really cool to have people like you actively calling for an international pause on ASI!
Hot take: even if most people hear a really loud-ass warning shot, it is just going to fuck with them a lot, not drive change. What are you even expecting typical poor and middle-class nobodies to do?
March in the street and become activists themselves? Donate somewhere? Post on social media? Call representatives? Buy ads (likely from Google or Meta)? Divest from risky AI projects? Boycott LLMs/companies?
Ya, okay, I feel like the pathway from "worry" to any of that is generally very windy, but sure. I still feel like that is just a long way from the kind of galvanized political will and real change you would need for, eg., major AI companies with huge market caps to get nationalized or wiped off the market or whatever.
I don't even know how to picture a transition to an intelligence-explosion-resistant world, and I am pretty knee-deep in this stuff. I think the road from here to a good outcome is just too blurry to do much with a lot of the time. It is easy to feel, and to be, disempowered here.
The Distillation for Robust Unlearning paper (https://arxiv.org/abs/2506.06278) makes me re-interested in the idea of using distillation to absorb the benefits of a control protocol (https://arxiv.org/abs/2312.06942).
I thought that was a natural "Distillation and Amplification" next step for control anyway, but the empirical results for unlearning make me excited about how this might work for control again.
Like, I guess I am just saying that if you are actually in a regime where you are using the trusted model some nontrivial fraction of the time, you might be able to distill off of that.
I relate it to the idea of iterated amplification and distillation; the control protocol is the scaffold/amplification. Plus, it seems natural that your most troubling outputs would receive special attention from bot/human/cyborg overseers and thus high-quality training feedback.
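To make the shape of the idea concrete, here is a toy sketch assuming a simple trusted-monitoring-with-deferral protocol; all the function names are made-up stand-ins, not anything from the linked papers:

```python
# Toy sketch of "distill off the control protocol" (purely illustrative).

def untrusted(task: str) -> str:
    # stand-in for the capable-but-untrusted model
    return f"untrusted answer to: {task}"

def trusted(task: str) -> str:
    # stand-in for the weaker but trusted model
    return f"trusted answer to: {task}"

def monitor_suspicion(task: str, output: str) -> float:
    # stand-in for a trusted monitor scoring how sketchy an output looks
    return 0.0

SUSPICION_THRESHOLD = 0.8  # defer to the trusted model above this score

def control_protocol(task: str) -> str:
    """Keep the untrusted model's output unless the monitor flags it,
    in which case defer to the trusted model."""
    candidate = untrusted(task)
    if monitor_suspicion(task, candidate) > SUSPICION_THRESHOLD:
        return trusted(task)
    return candidate

# The distillation set is just the (task, vetted output) pairs gathered while
# the protocol runs; a student fine-tuned on these pairs would be the
# "control protocol distill" I am gesturing at.
distillation_set = [(t, control_protocol(t)) for t in ["task 1", "task 2"]]
```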
Training off of control might make no sense at all if you then think of that model as just one brain playing a game with itself that it can always rig/fake easily. And since a lot of the concern is scheming, this might basically make the "control protocol distill" dead on arrival, because any worthwhile distill would still need to be smart enough that it might be sneak-attacking us for roughly the same reasons the original model was, and even extremely harmless training data doesn't help us with that.
Still, it seems good to make the model tend to be more cool and less sketchy, even if it would only be ~"trusted-model-level good" at some stuff. Idk though, I am divided here.
Here's a question that comes to mind: if local EA communities make people 3x more motivated to pursue high-impact careers, or make it much easier for newcomers to engage with EA ideas, then even if these local groups are only operating at 75% efficiency compared to some theoretical global optimum, you still get significant net benefit.
I am sympathetic to this argument vibes-wise, and I thought this was an elegant, numerate utilitarian case for it. Part of my motivation is that I think it would be good if a lot of EA-ish values were a lot more mainstream. Like, I would even say that you probably get non-linear returns to scale in some important ways. You kind of need a critical mass of people to do certain things.
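Spelling out the arithmetic in the quoted scenario, on the crude multiplicative reading of those two numbers:

\[
3 \ (\text{motivation multiplier}) \times 0.75 \ (\text{relative efficiency}) = 2.25 \times \text{the baseline per-person impact},
\]

so the efficiency discount would have to be a lot steeper before it flipped the sign of the argument.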
It feels like, necessarily, these organizations would also be about providing value to the members themselves. That is a good thing.
I think there is something like a "but what if we get watered down too much" concern latent here. I can kind of see how this would happen, but I am also not that worried about it. The tent is already pretty big in some ways. Stuff like numerate utilitarianism, empiricism, broad moral circles, thoughtfulness, and tough trade-offs doesn't seem in danger of going away soon. Probably EA growing would spread these ideas rather than shrink them.
Also, I just think that societies/people all over the world could significantly benefit from stronger third pillars, and that the ideal versions of these sorts of community spaces would tend to share a lot of things in common with EA.
Picture it: the year is 2035 (9 years after the RSI near-miss event triggered the first Great Revolt). You ride your bitchin' electric scooter to the EA-adjacent community center, where you and your friends co-work on a local voter-awareness campaign, a startup idea, or just a fun painting or whatever. An intentional community.
That sounds like a step towards the glorious transhumanist future to me, but maybe the margins on that are bad in practice and the community centers of my daydreams will remain merely EA-adjacent. Perhaps I just need to move to a town with cooler libraries. I am really not sure what the Dao is here or where the official EA brand really fits into any of this.
Ya, maybe. This concern/way of thinking just seems kind of niche; probably only a very small demographic overlaps with me here. So I guess I wouldn't expect it to be a consequential amount of money to, eg., Anthropic or OpenAI.
That checkbox would be really cool, though. It might ease friction/dissonance for people who buy into high p(doom) or relatively non-accelerationist perspectives. My views are not representative of anyone but me, but a checkbox like that would be a killer feature for me and would certainly win my $20/mo :). And maybe, y'know, that of all 100 people or whatever who would care and see it that way.
What could be more topical on the EAF than the theory of change and ethics of Anthropic PBC?
You were very thorough, and I think the listicle format worked well. I largely agree with what you laid out here, and I appreciate you doing the footwork of making so much of this more explicit/legible.
A lot of this stuff feels shady and even cuts against certain justifications for Anthropic that I have heard recently from, eg., Holden Karnofsky on 80k and Joe Carlsmith in his blog post. It definitely seems worth being clear-headed about.