Pause AI / Veganish
Let's do a bunch of good stuff and have fun gang!
What dynamics do you have in mind specifically?
There is always a strong unilateralist's curse with infohazard stuff haha.
I think it is reasonably based, and there is a lot to be said about hype, infohazards, and the strange futurist-x-risk-warning-to-product-company pipeline. It may even be especially potent or likely to bite in exactly the EA milieu.
I find the idea of Waluigi a bit of a stretch given that "what if the robot became evil" is a trope. And so is the Christian devil, for example. "Evil" seems at least adjacent to "strong value pessimization".
Maybe a literal bit-flip utility minimizer is rare (outside of, e.g., extortion) and talking about it would spread the memes and some cultist or confused billionaire would try to build it, sort of thing?
Thanks for sharing, good to read. I got most excited about 3, 6, 7, and 8.
As far as 6 goes, I would add that I think it would probably be good if AI Safety had a more mature academic publishing scene in general and some more legit journals. There is a place for the Alignment Forum, arXiv, conference papers, and such, but where is Nature AI Safety or its equivalent?
I think there is a lot to be said for basically raising the waterline there. I know there is plenty of AI Safety stuff that has been published for decades in perfectly respectable academic journals and such. I personally like the part in "Computing Machinery and Intelligence" where Turing says that we may need to rise up against the machines to prevent them from taking control.
Still, it is a space I want to see grow and flourish big time. In general, big ups to more and better journals, forums, and conferences within such fields as AI Safety / Robustly Beneficial AI Research, Emerging Technologies Studies, Pandemic Prevention, and Existential Security.
The EA Forum, LW, and the Alignment Forum have their place, but these ideas ofc need to germinate out past this particular clique/bubble/subculture. I think more and better venues for publishing are probably very net good in that sense as well.
7 is hard to think about but sounds potentially very high impact. If any billionaires ever have a scary ChatGPT interaction or a similar come-to-Jesus moment and google "how to spend 10 billion dollars to make AI safe" (or even ask Deep Research), then you could bias/frame the whole discussion/investigation heavily from the outset. I am sure there is plenty of equivalent googling by staffers and congresspeople in the process of making legislation now.
8 is right there with AI tools for existential security. I mostly agree that an AI product which didn't push forward AGI, but did increase fact checking, would be good. This stuff is so hard to think about. There is so much moral hazard in the water, and I feel like I am "vibe captured" by all the Silicon Valley money in the AI x-risk subculture.
Like, for example, I am pretty sure I don't think it is ethical to be an AGI scaling/racing company even if Anthropic has better PR and vibes than Meta. Is it okay to be a fast follower though? Competing on fact checking, sure, but is making agents more reliable or teaching Claude to run a vending machine "safety", or is that merely equivocation?
Should I found a synthetic virology unicorn on the promise that we will be way chiller than the other synthetic virology companies? And it's not completely disanalogous, because there are medical uses for synthetic virology, and pharma companies are also huge, capital-intensive, high-tech operations that spend hundreds of millions on a single product. Still, that sounds awful.
Maybe you think armed balance of power with nuclear weapons is a legitimate use case. It would still be bad to run a nuclear bomb research company that scales up and reduces the costs of nuclear weapons. But idk. What if you really could put in a better control system than the other guy? Should hippies start military tech startups now?
Should I start a competing plantation that, in order to stay profitable and competitive with other slave plantations, uses slave labor and does a lot of bad stuff? And if I assume that the demands of the market are fixed and this is pretty much the only profitable way to farm at scale, then as long as I grow my wares at a lower cruelty-per-bushel than the average of my competitors, am I racing to the top? It gets bad. The same thing could apply to factory farming.
(edit: I reread this comment and wanted to go more out of my way to say that I don't think this represents a real argument made presently or historically for chattel slavery. It was merely an offhand, insensitive example of a horrific tension b/w deontology and simple goodness on the one hand and a slice of galaxy-brained utilitarian reasoning on the other.)
Like I said, there is so much moral hazard in the idea of an "AGI company for good stuff", but I think I am very much in favor of "AI for AI Safety" and "AI tools for existential security". I like "fact checking" as a paradigm example of a prosocial use case.
Hey, cool stuff! I have ideated and read a lot on similar topics and proposals. Love to see it!
Is the "Thinking Tools" concept worth exploring further as a direction for building a more trustworthy AI core?
I am agnostic about whether you will hit technical paydirt. I don't really understand what you are proposing on a "gears level", I guess, and I'm not sure I could make a good guess even if I did. But I will say that I think the vibe of your approach sounded pleasant and empowering. It was a little abstract to me, I guess I'm saying, but that need not be a bad thing; maybe you're just visionary.
It reminds me of the idea of using RAG or Toolformer to get LLMs to "show their work" and "cite their sources" and stuff. There is surely a lot of room for improvement there bc Claude bullshits me with links on the regular.
This also reminds me of Conjecture's Cognitive Emulation work, and even just Max Tegmark and Steve Omohundro's emphasis on making inscrutable LLMs lean heavily on deterministic proof checkers to win back certain guarantees.
Is the "LED Layer" a potentially feasible and effective approach to maintain transparency within a hybrid AI system, or are there inherent limitations?
I don't have a clear enough sense of what you're even talking about, but there are definitely at least some additional interventions you could run in addition to the thinking tools: e.g. monitoring, faithful-CoT techniques for marginally truer reasoning traces, probes, classifiers (Anthropic runs classifiers to help robustly prevent jailbreak-based misuse), etc.
I think that something like "defense in depth" is something like the current slogan of AI Safety. So sure, I can imagine all sorts of stuff you could try to run for more transparency beyond deterministic tool use, but w/o a clearer conception of the finer points, it feels like I should say that there are quite a lot of inherent limitations, but plenty of options / things to try as well.
Like, "robustly managing interpretability" is more like a holy grail than a design spec in some ways lol.
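To make the "probes" bit slightly more concrete, here is a minimal sketch of the kind of thing people usually mean: train a simple linear classifier on a model's internal activations to flag some property of interest. Everything below (the synthetic activations, the labels, the "signal") is made up purely for illustration, not any particular lab's method.

```python
# Minimal sketch of a "linear probe": a simple classifier trained on a model's
# internal activations to flag some property (e.g. "this looks deceptive").
# The activations and labels here are synthetic stand-ins, purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are hidden-state vectors captured from a model (n_samples x d_model)
# and binary labels for whatever property the probe is supposed to detect.
activations = rng.normal(size=(1000, 64))
labels = (activations[:, :8].sum(axis=1) > 0).astype(int)  # fake "signal" in a few dims

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.2f}")
```

The appeal is that the probe itself is cheap and transparent; whether the property it detects is the property you care about is the hard part.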
What are the biggest practical hurdles in considering the implementation of CCACS, and what potential avenues might exist to overcome them?
I think that a lot of what it is shooting for is aspirational and ambitious and correctly points out limitations in the current approaches and designs of AI. All of that is spot on and there is a lot to like here.
However, I think the problem of interpreting and building appropriate trust in complex learned algorithmic systems like LLMs is a tall order. "Transparency by design" is truly one of the great technological mandates of our era, but without more context it can feel like a buzzword, like "security by design".
I think the biggest "barrier" I can see is just that this framing isn't sticky enough to survive memetically, and people keep trying to do transparency, tool use, control, reasoning, etc. under different frames.
But still, I think there is a lot of value in this space, and you would get paid big bucks if you could even marginally improve the current ability to get trustworthy, interpretable work out of LLMs. So, y'know, keep up the good work!
Thanks, it's not that original. I am sure I have heard them talk about AIs negotiating and forgetting stuff on the 80,000 Hours Podcast, and David Brin has a book that touches on this a lot called "The Transparent Society". I haven't actually read it, but I heard a talk he gave.
Maybe technological surveillance and enforcement requirements will actually be really intense at technological maturity, and the enforcing system will need to be really powerful, really local, and have a lot of context for what's going on. In that case, some value like privacy or "being alone" might be really hard to save.
Hopefully, even in that case, you could have other forms of restraint. Like, I can still imagine that if something like the orthogonality thesis is true, then you could maybe have a really, really elegant, light-touch, special-focus anti-superweapons system that feels fundamentally limited to that goal in a reliable sense. If we understood the cognitive elements enough that it felt like physics or programming, then we could even say that the system meaningfully COULD NOT do certain things (violate the prime directive or whatever), and then it wouldn't feel as much like an omnipotent overlord as a special-purpose tool deployed by local LE (because this place would be bombed or invaded if it could not prove it had established such a system).
If you are a world of poor peasant farmers, then maybe nobody needs to know what your people are writing in their diaries. But if you are the head of fast prototyping and automated research at some relevant dual-use technology firm, then maybe there should be much more oversight. Idk, there feels like lots of room for gradation, nuance, and context awareness here, so I guess I agree with you that the "problem of liberty" is interesting.
There was a lot to this that was worth responding to. Great work.
I think making God would actually be a bad way to handle this. I think you could probably stop this with superior forms of limited knowledge surveillance. I think there are likely socio-technical remedies to dampen some of the harsher liberty-related tradeoffs here considerably.
Imagine, for example, a more distributed machine intelligence system. Perhaps it's really not all that invasive to monitor that you're not making a false vacuum or whatever. And it uses futuristic auto-secure hyper-delete technology to instantly delete everything it sees that isn't relevant.
Also, the system itself isn't all that powerful, but rather can alert others / draw attention to important things. And system implementation, as well as the actual violent/forceful enforcement that goes along with the system, probably can and should also be implemented in a generally more cool, chill, and fair way than I associate with Christian-God-style centralized surveillance and control systems.
Also, a lot of these problems are already extremely salient for the "how to stop civilization-ending superweapons from being created"-style problems we are already in the midst of here on 2025 Earth. It seems basically true that you do ~need to maintain some level of coordination with / dominance over anything that could/might make a superweapon that could kill you, if you want to stay alive indefinitely.
Ya, idk, I am just saying that the tradeoff framing feels unnatural. Or, like, maybe that's one lens, but I don't actually generally think in terms of tradeoffs b/w my moral efforts.
Like, I get tired of various things ofc, but it's not usually just cleanly fungible b/w different ethical actions I might plausibly take like that. To the extent it really does work this way for you or people you know on this particular tradeoff, then yep; I would say power to ya for the scope sensitivity.
I agree that the quantitative aspect of donation pushes towards even marginal internal tradeoffs here mattering, and I don't think I was really thinking about it as necessarily binary.
I agree with 1, but I think the framing feels forced for point #2.
I don't think it's obvious that these actions would be strongly in tension with each other. Donating to effective animal charities would correlate quite strongly with being vegan.
Homo economicus deciding what to eat for dinner or something lol.
I actually totally agree that donations are an important part of personal ethics! Also, I am all aboard for the social-ripple-effects theory of change for effective donation. Hell yes to both of those points. I might have missed it, but I don't know that OP really argues against those contentions? I guess they don't frame it like that though.
I appreciate this survey and I found many of your questions to be charming probes. I would like to register that I object to the "is elitism good actually?" framing here. There is a very common way to define the term "elitism" that is just straightforwardly negative. Like, "elitism" implies classist, inegalitarian stuff that goes beyond just using it as an edgelord libertarian way of saying "meritocracy".
I think there is a lot of conceptual tension between EA as a literal mass movement and EA as an unusually talent-dense clique / professional network. Probably there is room in the world for both high-skill professional networks and broad ethical movements, but y'know …
I think the real-life scenarios where AI kills the most people today are governance stuff and military stuff.
I feel like I have heard the most unhinged, haunted uses of LLMs in government and policy spaces. I think that certain people have just "learned to stop worrying and love the hallucination". They are living like it is the future already, getting people killed with their ignorance, and spreading/using AI bs in bad faith.
Plus, there is already a lot of slaughterbot stuff going on, e.g. the "robots first" war in Ukraine.
Maybe job automation is worth mentioning too. I believe Andrew Yang's stance, for example, is that it is already largely here and most people just do have less labor power already, but I could be mischaracterizing this. I think "jobs stuff" plausibly shades right into doom via "industrial dehumanization" / gradual disempowerment. In the meantime it hurts people too.
Thanks for everything Holly! Really cool to have people like you actively calling for an international pause on ASI!
Hot take: even if most people hear a really loud-ass warning shot, it is just going to fuck with them a lot, but not drive change. What are you even expecting typical poor and middle class nobodies to do?
March in the street and become activists themselves? Donate somewhere? Post on social media? Call representatives? Buy ads (likely from Google or Meta)? Divest from risky AI projects? Boycott LLMs/companies?
Ya, okay, I feel like the pathway from "worry" to any of that is generally very windy, but sure. I still feel like that is just a long way from the kind of galvanized political will and real change you would need for, e.g., major AI companies with huge market caps to get nationalized or wiped off the market or whatever.
I don't even know how to picture a transition to an intelligence-explosion-resistant world, and I am pretty knee deep in this stuff. I think the road from here to a good outcome is just too blurry to act on a lot of the time. It is easy to feel and be disempowered here.
The Distillation for Robust Unlearning paper (https://arxiv.org/abs/2506.06278) makes me re-interested in the idea of using distillation to absorb the benefits of a control protocol (https://arxiv.org/abs/2312.06942).
I thought that was a natural "distillation and amplification" next step for control anyways, but the empirical results for unlearning make me excited about how this might work for control again.
Like, I guess I am just saying that if you are actually in a regime where you are using the trusted model some nontrivial fraction of the time, you might be able to distill off of that.
I relate it to the idea of iterated amplification and distillation; the control protocol is the scaffold/amplification. Plus, it seems natural that your most troubling outputs would receive special attention from bot/human/cyborg overseers and receive high-quality training feedback.
Training off of control might make no sense at all if you then think of that model as just one brain playing a game with itself that it can always rig/fake easily. And since a lot of the concern is scheming, this might basically make the "control protocol distill" dead on arrival, because any worthwhile distill would still need to be smart enough that it might be sneak-attacking us for roughly the same reasons the original model was, and even extremely harmless training data doesn't help us with that.
Seems good to make the model tend to be more cool and less sketchy even if it would only be ~"trusted model level good" at some stuff. Idk though, I am divided here.
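For what it's worth, here is a rough sketch of the loop I have in mind, with every function being a made-up stand-in rather than anything from either paper: run tasks through a trusted-monitoring-style control protocol, collect the resulting transcripts, and treat those as the distillation dataset for a student model.

```python
# Rough sketch: collecting transcripts generated under a control protocol
# (untrusted model monitored, and sometimes overridden, by a trusted model)
# as distillation data. All functions here are made-up stand-ins.

def untrusted_generate(task: str) -> str:
    return f"untrusted answer to: {task}"          # stand-in for the capable, untrusted model

def trusted_generate(task: str) -> str:
    return f"trusted answer to: {task}"            # stand-in for the weaker, trusted model

def trusted_monitor_suspicion(task: str, answer: str) -> float:
    return 0.1 if "untrusted" in answer else 0.9   # stand-in suspicion score in [0, 1]

SUSPICION_THRESHOLD = 0.5

def control_protocol(task: str) -> str:
    """Trusted-monitoring-style protocol: defer to the trusted model when suspicious."""
    answer = untrusted_generate(task)
    if trusted_monitor_suspicion(task, answer) > SUSPICION_THRESHOLD:
        answer = trusted_generate(task)
    return answer

# Collect (task, answer) transcripts produced under the protocol...
tasks = ["summarize this report", "write a unit test", "draft an email"]
distillation_data = [(task, control_protocol(task)) for task in tasks]

# ...and then fine-tune a fresh student model on them (not shown here).
for task, answer in distillation_data:
    print(task, "->", answer)
```

The open question flagged above still applies: if the student ends up smart enough to be useful, harmless-looking transcripts alone may not buy you much against scheming.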
Here's a question that comes to mind: if local EA communities make people 3x more motivated to pursue high-impact careers, or make it much easier for newcomers to engage with EA ideas, then even if these local groups are only operating at 75% efficiency compared to some theoretical global optimum, you still get significant net benefit.
I am sympathetic to this argument vibes-wise, and I thought this was an elegant, numerate utilitarian case for it. Part of my motivation is that I think it would be good if a lot of EA-ish values were a lot more mainstream. Like, I would even say that you probably get non-linear returns to scale in some important ways. You kind of need a critical mass of people to do certain things.
It feels like, necessarily, these organizations would also be about providing value to the members as well. That is a good thing.
I think there is something like a "but what if we get watered down too much" concern latent here. I can kind of see how this would happen, but I am also not that worried about it. The tent is already pretty big in some ways. Stuff like numerate utilitarianism, empiricism, broad moral circles, thoughtfulness, and tough trade-offs doesn't seem in danger of going away soon. Probably EA growing would spread these ideas rather than shrink them.
Also, I just think that societies/people all over the world could significantly benefit from stronger third pillars, and that the ideal versions of these sorts of community spaces would tend to share a lot of things in common with EA.
Picture it. The year is 2035 (9 years after the RSI near-miss event triggered the first Great Revolt). You ride your bitchin' electric scooter to the EA-adjacent community center where you and your friends co-work on a local voter awareness campaign, a startup idea, or just a fun painting or whatever. An intentional community.
That sounds like a step towards the glorious transhumanist future to me, but maybe the margins on that are bad in practice and the community centers of my daydreams will remain merely EA-adjacent. Perhaps I just need to move to a town with cooler libraries. I am really not sure what the Dao here is or where the official EA brand really fits into any of this.
Ya, maybe. This concern/way of thinking just seems kind of niche. Probably only a very small demographic overlaps with me here. So I guess I wouldn't expect it to be a consequential amount of money to, e.g., Anthropic or OpenAI.
That checkbox would be really cool though. It might ease friction/dissonance for people who buy into high p(doom) or relatively non-accelerationist perspectives. My views are not representative of anyone but me, but a checkbox like that would be a killer feature for me and certainly win my $20/mo :) . And maybe, y'know, all 100 people or whatever who would care and see it that way.
Ya, really sad to hear that!
I mean, if they were going to fire him for that, maybe just don't hire him. It feels kind of mercurial that they were shamed into canning him so easily/soon.
Ofc, I can understand nonviolence being, like, an important institutional messaging north star lol. But the vibes are kind of off when you are going to fire a recently hired podcaster bc of a cringe compilation of their show. Seriously, what the hell?
I do not think listening to that podcast made me more violent. In fact, the thought experiments and ideation like what he was touching on are, like, perfectly reasonable given the stakes he is laying out and the urgency he is advocating. Like, it's not crazy ground to cover; he talks for hours. Whatever lol, at least I think it was understandable in context.
Part of it feels like the equivalent of having to lobby against "civilian nuclear warheads". You say "I wish that only a small nuclear detonation would occur when an accidental detonation does happen, but the fact is there's likely to be a massive nuclear chain reaction," and then you get absolutely BLASTED after someone clips you saying that you actually WISH a "small nuclear detonation" would occur. What a sadist you must be!
This feels like such stupid politics. I really want John Sherman to land on his feet. I was quite excited to see him take the role. Maybe that org was just not the right fit though...
I think it might be cool if an AI Safety research organization ran a copy of an open model or something and I could pay them a subscription to use it. That way I know my LLM subscription money is going to good AI stuff and not towards AI companies that I don't think I like or want more of on net.
Idk, existing independent orgs might not be the best place to do this bc it might "damn them" or "corrupt them" over time. Like, this could lead to them "selling out" in a variety of ways, however you conceive of that.
Still, I guess I am saying that to the extent anyone is going to actually "make money" off of my LLM usage subscriptions, it would be awesome if it were just a cool independent AIS lab I personally liked, or similar. (I don't really know the margins and unit economics, which seems like an important part of this pitch lol.)
Like, if "GoodGuy AIS Lab" sets up a little website and inference server (running Qwen or Llama or whatever), then I could pay them the $15-25 a month I may have otherwise paid to an AI company. The selling point would be that less "moral hazard" is better vibes, but probably only some people would care about this at all and it would be a small thing. But also, it's hardly like a felt sense of moral hazard around AI is a terribly niche issue.
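Concretely, the consumer side of this could be pretty boring, which is sort of the point: if the hypothetical "GoodGuy AIS Lab" exposed an OpenAI-compatible endpoint for whatever open model it hosts, switching over is basically just changing a base URL. The URL, key, and model name below are all placeholders I made up; this is a sketch of the usage pattern, not a real service.

```python
# Hypothetical: pointing a standard OpenAI-compatible client at an independent
# AIS lab's inference server instead of a frontier lab's API. The base_url,
# api_key, and model name are placeholders, not a real service.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.goodguy-ais-lab.example/v1",  # made-up endpoint
    api_key="my-subscription-key",                            # made-up key
)

response = client.chat.completions.create(
    model="qwen2.5-72b-instruct",  # whatever open model the lab happens to host
    messages=[{"role": "user", "content": "Summarize the latest AI control paper for me."}],
)
print(response.choices[0].message.content)
```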
This isn't the "final form" of this I have in mind necessarily; I enjoy picking at ideas in the space of "what would a good guy AGI project do" or "how can you do neglected AIS / 'AI go well' research in a for-profit way".
I also like the idea of an explicitly fast-follower project for AI capabilities. Like, accelerate safety/security-relevant stuff and stay comfortably middle of the pack on everything else. I think improving GUIs is probably fair game too, but not once it starts to shade into scaffolding, I think? I wouldn't know all of the right lines to draw here, but I really like this vibe.
This might not work well if you expect gaps to widen as RSI becomes a more important input. I would argue that seems too galaxy brained given that, as of writing, we do live in a world with a lot of mediocre AI companies that I believe can all provide products of ~comparable quality.
It is also just kind of a bet that in practice it is probably going to remain a lot less expensive to stay a little behind the frontier than to be at the frontier. And that, in practice, it may continue to not matter in a lot of cases.
Thanks for linking "Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models". Also, I agree with:
MacAskill and Moorhouse argue that increases in training compute, inference compute and algorithmic efficiency have been increasing at a rate of 25 times per year, compared to the number of human researchers which increases 0.04 times per year, hence the 500x faster rate of growth. This is an inapt comparison, because in the calculation the capabilities of "AI researchers" are based on their access to compute and other performance improvements, while no such adjustment is made for human researchers, who also have access to more compute and other productivity enhancements each year.
That comparison seems simplistic and inapt for at least a few reasons. That does seem like a pretty "trust me bro" justification for the intelligence explosion lol. Granted, I only listened to the accompanying podcast, so I can't speak too much to the paper.
Still, I am of two minds. I still buy into a lot of the premise of "Preparing for the Intelligence Explosion". I find the idea of getting collectively blindsided by rapid, uneven AI progress ~eminently plausible. There didn't even need to be that much of a fig leaf.
Don't get me wrong, I am not personally very confident in "expert-level AI researcher for arbitrary domains" w/i the next few decades. Even so, it does seem like the sort of thing worth thinking about and preparing for.
From one perspective, AI coding tools are just recursive self improvement gradually coming online. I think I understand some of the urgency, but I appreciate the skepticism a lot too.
Preparing for an intelligence explosion is a worthwhile thought experiment at least. It seems probably good to know what we would do in a world with "a lot of powerful AI" given that we are in a world where all sorts of people are trying to research/make/sell ~"a lot of powerful AI". Like, just in case, at least.
I think I see multiple sides. Lots to think about.
I think the focus is generally placed on the cognitive capacities of AIs because it is expected that it will just be a bigger deal overall.
There is at least one 80,000 Hours podcast episode on robotics. It tries to explain why robotics is hard to do ML on, but I didn't understand it.
Also, I think Max Tegmark wrote some stuff on slaughterbots in Life 3.0. Yikes!
You could try looking for other differential development stuff too if you want. I recently liked "AI Tools for Existential Security". I think it's a good conceptual framework for emerging tech / applied ethics stuff. Of course, it still leaves you with a lot of questions :)
I love to see stuff like this!
It has been a pleasure reading this, listening to your podcast episode, and trying to really think it through.
This reminds me of a few other things I have seen lately, like Superalignment, Joe Carlsmith's recent "AI for AI Safety", and the recent 80,000 Hours Podcast with Will MacAskill.
I really appreciate the "Tools for Existential Security" framing. Your example applications were on point, and many of them brought up things I hadn't even considered. I enjoy the idea of rapidly solving lots of coordination failures.
This sort of DAID approach feels like an interesting continuation on other ideas about differential acceleration and the vulnerable world hypothesis. Trying to get this right can feel like some combination of applied ethics and technology forecasting.
Probably one of the weirdest or most exciting applications you suggest is AI for philosophy. You put it under the "Epistemics" category. I usually think of epistemics as a sub-branch of philosophy, but I think I get what you mean. AI for this sort of thing remains exciting, but very abstract to me.
What a heady thing to think about; really exciting stuff! There is something very cosmic about the idea of using AI research and cognition for ethics, philosophy, and automated wisdom. (I have been meaning to read "Winners of the Essay competition on the Automation of Wisdom and Philosophy"). I strongly agree that since AI comes with many new philosophically difficult and ethically complex questions, it would be amazing if we could use AI to face these.
The section on how to accelerate helpful AI tools was nice too.
Appendix 4 was gold. The DPD framing is really complementary to the rest of the essay. I can totally appreciate the distinction you are making, but I also see DPD as bleeding into AI for Existential Safety a lot as well. Such mixed feelings. Like, for one thing, you certainly wouldn't want to be deploying whack AI in your "save the world" cutting-edge AI startup.
And it seems like there is a good case for thinking about doing better pre-training and finding better paradigms if you are going to be thinking about safer AI development and deployment a lot anyways. Maybe I am missing something about the sheer economics of not wanting to actually do pre-training ever.
In any case, I thought your suggestions around aiming for interpretable, robust, safe paradigms were solid. Paradigm-shaping and application-shaping are both interesting.
***
I really appreciate that this proposal is talking about building stuff! And that it can be done ~unilaterally. I think that's just an important vibe and an important type of project to have going.
I also appreciate that you said in the podcast that this was only one possible framing / clustering. Although you also say "we guess that the highest priority applications will fall into the categories listed above", which seems like a potentially strong claim.
I have also spent some time thinking about which forms of ~research / cognitive labor would be broadly good to accelerate for similar existential security reasons, and I kind of tried to retrospectively categorize some notes I had made using your framing. I had some ideas that were hard to categorize cleanly into epistemics, coordination, or direct risk targeting.
I included a few more ideas for areas where AI tools, marginal automated research, and cognitive abundance might be well applied. I was going for a similar vibe, so I'm sorry if I overlap a lot. I will try to only mention things you didn't explicitly suggest:
Epistemics:
you mention benchmarking as a strategy for accelerating specific AI applications, but it also deserves mention as an epistemic tool. METR-style elicitation tests too
I should say up front that I don't know if, due to the acceleration and iteration effects you mention, e.g. FrontierMath and lastexam.ai are race-dynamic-accelerating in a way that overshadows their epistemic usefulness; even METR's task horizon metric could be accused here.
From a certain perspective, I would consider benchmarks and METR elicitation tests natural complements to mech interp and AI capabilities forecasting
this would include capabilities and threats assessment (hopefully we can actively iterate downwards on risk assessment scores)
broad economics and societal impact research
the effects of having AI more or less "do the economy" seem vast, and differentially accelerating understanding and strategy there seems like a non-trivial application relevant to the long-term future of humanity
wealth inequality and the looming threat of mass unemployment (at minimum, this seems important for instability and coordination reasons, even if one were too utilitarian / longtermist to care for normal reasons)
I think it would be good to accelerate "risk evaluation" in a sense that I think was defined really elegantly by Joe Carlsmith in "Paths and waystations in AI safety" [1]
building naturally from there, forecasting systems could be specifically applied to DAID and DPD; I know this is a little "ouroboros" to suggest, but I think it works
Coordination-enabling:
movement building research and macro-strategy, AI-fueled activism, political coalition building, AI research into and tools for strengthening democracy
auto research into deliberative mini-publics, improved voting systems (e.g. ranked choice, liquid democracy, quadratic voting, anti-gerrymandering solutions), secure digital voting platforms, improved checks and balances (e.g. strong multi-stakeholder oversight, whistleblower protections, human rights), and non-censorship-oriented solutions to misinformation
Risk-targeting:
I know it is not the main thrust of "existential security", but I think it is worth considering the potential of "abundant cognition" applied to welfare / sentience research (e.g. bio and AI). This seems really important from a lot of perspectives, for a lot of reasons:
AI Safety might be worse if the AIs are "discontent"
we could lock in a future where most people are suffering terribly, which would not count as existential security
it seems worthwhile to know ASAP if the AI workers are suffering, for normal "avoid doing moral catastrophes" reasons
we could unlock huge amounts of welfare or learn to avoid huge amounts of pain (cf. "hedonium" or the Far Out Initiative)
That said, I have not really considered the offense/defense balance here. We may discover how to simulate suffering for much cheaper than pleasure, or something horrendous like that. Or there might be info hazards. This space seems so high stakes and hard to chart.
Some mix:
Certain forms of monitoring and openly researching other people's actions seem like a mix of epistemics and coordination. For example, I had listed some stuff about, e.g., AI for broadly OSINT-based investigative journalism, AI Lab Watch, legislator scorecards, and similar. These are kind of information for the sake of coordination.
I know I included some moonshots. This all depends on what AI systems we are talking about and what they are actually helpful with, I guess. I would hate for EA to bet too hard on any of this stuff and accidentally flood the zone of key areas with LLM "slop" or whatever.
Also, to state the obvious, there may be some risk of correlated exposure if you pin too much of your existential security on the crucial aid of unreliable, untrustworthy AIs. Maybe HAL 9000 isn't always the entity to trust with your most critical security.
Lots to think about here! Thanks!
Joe Carlsmith: "Risk evaluation tracks the safety range and the capability frontier, and it forecasts where a given form of AI development/deployment will put them.
Paradigm examples include:
evals for dangerous capabilities and motivations;
forecasts about where a given sort of development/deployment will lead (e.g., via scaling laws, expert assessments, attempts to apply human and/or AI forecasting to relevant questions, etc);
general improvements to our scientific understanding of AI
structured safety cases and/or cost-benefit analyses that draw on this information."
I really enjoyed this post! In general, I am a big fan of efforts to improve our collective decision making in a way that builds off of existing democracy/voting schemes. I'm a big fan of approval voting, for example.
This was the first I've heard of election by jury. I checked out your website a bit too. Cool stuff; I really like the idea of using sampling to help mitigate other issues, including the overhead costs of having a well-informed citizenry on every issue.
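Tangentially, since approval voting and random sampling are both such simple mechanisms, here is a toy sketch of the sortition-plus-approval flavor of the idea as I understood it: draw a random jury from the electorate, then tally approval votes among just the jurors. The electorate and ballots are entirely fabricated, and I am sure the actual proposal is richer than this.

```python
# Toy sketch of a sortition + approval-voting mechanism: sample a random "jury"
# from the electorate and elect whichever candidate the most jurors approve of.
# The electorate and approval sets are fabricated for illustration.
import random
from collections import Counter

random.seed(42)

candidates = ["Alice", "Bob", "Carol"]

# Each voter approves of a (possibly empty) subset of candidates.
electorate = [
    set(random.sample(candidates, k=random.randint(0, len(candidates))))
    for _ in range(10_000)
]

jury = random.sample(electorate, k=500)  # the random sample stands in for the jury

tally = Counter(candidate for ballot in jury for candidate in ballot)
winner, approvals = tally.most_common(1)[0]
print(f"winner: {winner} with {approvals} approvals out of {len(jury)} jurors")
```

The appeal, as I read it, is that a well-resourced random jury can be informed cheaply, while the approval tally keeps the aggregation rule dead simple.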
Thanks!
Ya, I think that's right. I think making bad stuff more salient can make it more likely in certain contexts.
For example, I can imagine it being naive to constantly transmit all sorts of detailed information, media, and discussion about specific weapons platforms, raising awareness of exactly the things you really hope the bad guys don't develop because they might make them too strong. I just read "Power to the People: How Open Technological Innovation Is Arming Tomorrow's Terrorists" by Audrey Kurth Cronin, and I think it has a really relevant vibe here. Sometimes I worry about EAs doing unintentional advertisement for, e.g., bioweapons and superintelligence.
On the other hand, I think that topics like s-risk are already salient enough for other reasons. Like, I think extreme cruelty and torture have arisen independently at a lot of points throughout history and nature. And there are already ages' worth of pretty unhinged torture-porn stuff that people write on a lot of other parts of the internet: for example, the Christian conception of hell, or horror fiction.
This seems sufficient to say we are unlikely to significantly increase the likelihood of "blind grabs from the memeplex" leading to mass suffering. Even cruel torture is already pretty salient. And suffering is in some sense simple if it is just "the opposite of pleasure" or whatever. Utilitarians commonly talk in these terms already.
I will agree that I don't think it's good to carelessly spread memes about specific bad stuff sometimes. I don't always know how to navigate the trade-offs here; probably there is at least some stuff broadly related to GCRs and s-risks which is better left unsaid. But also, a lot of stuff related to s-risk is there whether you acknowledge it or not. I submit to you that surely some level of "raise awareness so that more people and resources can be used on mitigation" is necessary/good?