Pause AI / Veganish
Let's do a bunch of good stuff and have fun, gang!
I think the focus is generally placed on the cognitive capacities of AIs because cognition is expected to just be a bigger deal overall.
There is at least one 80,000 Hours podcast episode on robotics. It tries to explain why robotics is hard to do ML on, but I didn't fully understand it.
Also, I think Max Tegmark wrote some stuff on slaughterbots in Life 3.0. Yikes!
You could try looking for other differential development stuff too if you want. I recently liked: AI Tools for Existential Security. I think it's a good conceptual framework for emerging tech / applied ethics stuff. Of course, it still leaves you with a lot of questions :)
I love to see stuff like this!
It has been a pleasure reading this, listening to your podcast episode, and trying to really think it through.
This reminds me of a few other things I have seen lately, like Superalignment, Joe Carlsmith's recent "AI for AI Safety", and the recent 80,000 Hours Podcast with Will MacAskill.
I really appreciate the "Tools for Existential Security" framing. Your example applications were on point, and many of them brought up things I hadn't even considered. I enjoy the idea of rapidly solving lots of coordination failures.
This sort of DAID approach feels like an interesting continuation of other ideas about differential acceleration and the vulnerable world hypothesis. Trying to get this right can feel like some combination of applied ethics and technology forecasting.
Probably one of the weirdest and most exciting applications you suggest is AI for philosophy. You put it under the "Epistemics" category. I usually think of epistemics as a sub-branch of philosophy, but I think I get what you mean. AI for this sort of thing remains exciting, but very abstract to me.
What a heady thing to think about; really exciting stuff! There is something very cosmic about the idea of using AI research and cognition for ethics, philosophy, and automated wisdom. (I have been meaning to read "Winners of the Essay competition on the Automation of Wisdom and Philosophy".) I strongly agree that, since AI comes with many new philosophically difficult and ethically complex questions, it would be amazing if we could use AI to face them.
The section on how to accelerate helpful AI tools was nice too.
Appendix 4 was gold. The DPD framing is really complementary to the rest of the essay. I can totally appreciate the distinction you are making, but I also see DPD as bleeding into AI for Existential Safety a lot as well. Such mixed feelings. Like, for one thing, you certainly wouldn't want to be deploying whack AI in your "save the world" cutting-edge AI startup.
And it seems like there is a good case for thinking about doing better pre-training and finding better paradigms if you are going to be thinking about safer AI development and deployment a lot anyways. Maybe I am missing something about the sheer economics of not wanting to actually do pre-training ever.
In any case, I thought your suggestions around aiming for interpretable, robust, safe paradigms were solid. Paradigm-shaping and application-shaping are both interesting.
***
I really appreciate that this proposal is talking about building stuff! And that it can be done ~unilaterally. I think that's just an important vibe and an important type of project to have going.
I also appreciate that you said in the podcast that this was only one possible framing / clustering. Although you also say "we guess that the highest priority applications will fall into the categories listed above", which seems like a potentially strong claim.
I have also spent some time thinking about which forms of ~research / cognitive labor would be broadly good to accelerate for similar existential-security reasons, and I tried to retrospectively categorize some notes I had made using your framing. I had some ideas that were hard to categorize cleanly into epistemics, coordination, or direct risk targeting.
I included a few more ideas for areas where AI tools, marginal automated research, and cognitive abundance might be well applied. I was going for a similar vibe, so I'm sorry if I overlap a lot. I will try to only mention things you didn't explicitly suggest:
Epistemics:
you mention benchmarking as a strategy for accelerating specific AI applications, but it also deserves mention as an epistemic tool. METR-style elicitation tests too
I should say up front that I don't know whether, due to the acceleration and iteration effects you mention, benchmarks like FrontierMath and lastexam.ai are race-dynamic-accelerating in a way that overshadows their epistemic usefulness; even METR's task-horizon metric could be accused of this.
From a certain perspective, I would consider benchmarks and METR elicitation tests natural complements to mech interp and AI capabilities forecasting
this would include capabilities and threat assessment (hopefully we can actively iterate downwards on risk assessment scores)
broad economics and societal impact research
the effects of having AI more or less "do the economy" seem vast, and differentially accelerating understanding and strategy there seems like a non-trivial application relevant to the long-term future of humanity
wealth inequality and the looming threat of mass unemployment (at minimum, this seems important for instability and coordination reasons, even if one were too utilitarian / longtermist to care for normal reasons)
I think it would be good to accelerate "risk evaluation" in a sense that I think was defined really elegantly by Joe Carlsmith in "Paths and waystations in AI safety" [1]
building naturally from there, forecasting systems could be specifically applied to DAID and DPD; I know this is a little "ouroboros" to suggest, but I think it works
Coordination-enabling:
movement building research and macro-strategy, AI-fueled activism, political coalition building, AI research into and tools for strengthening democracy
automated research into deliberative mini-publics, improved voting systems (e.g. ranked choice, liquid democracy, quadratic voting, anti-gerrymandering solutions), secure digital voting platforms, improved checks and balances (e.g. strong multi-stakeholder oversight, whistleblower protections, human rights), and non-censorship-oriented solutions to misinformation
Risk-targeting:
I know it is not the main thrust of "existential security", but I think it is worth considering the potential contribution of "abundant cognition" to welfare / sentience research (e.g. biological and AI). This seems really important from a lot of perspectives, for a lot of reasons:
AI safety might be worse if the AIs are "discontent"
we could lock in a future where most people are suffering terribly, which would not count as existential security
it seems worthwhile to know ASAP if the AI workers are suffering, for normal "avoid committing moral catastrophes" reasons
we could unlock huge amounts of welfare or learn to avoid huge amounts of pain (cf. "hedonium" or the Far Out Initiative)
That said, I have not really considered the offense/defense balance here. We may discover how to simulate suffering much more cheaply than pleasure, or something horrendous like that. Or there might be info hazards. This space seems so high-stakes and hard to chart.
Some mix:
Certain forms of monitoring and openly researching other people's actions seem like a mix of epistemics and coordination. For example, I had listed some stuff like AI for broadly OSINT-based investigative journalism, AI lab watch, legislator scorecards, and similar. These are, in a sense, information for the sake of coordination.
I know I included some moonshots. This all depends on what AI systems we are talking about and what they are actually helpful with, I guess. I would hate for EA to bet too hard on any of this stuff and accidentally flood the zone of key areas with LLM "slop" or whatever.
Also, to state the obvious, there may be some risk of correlated exposure if you pin too much of your existential security on the crucial aid of unreliable, untrustworthy AIs. Maybe HAL 9000 isn't always the entity to trust with your most critical security.
Lots to think about here! Thanks!
Joe Carlsmith: "Risk evaluation tracks the safety range and the capability frontier, and it forecasts where a given form of AI development/deployment will put them.
Paradigm examples include:
evals for dangerous capabilities and motivations;
forecasts about where a given sort of development/deployment will lead (e.g., via scaling laws, expert assessments, attempts to apply human and/or AI forecasting to relevant questions, etc);
general improvements to our scientific understanding of AI
structured safety cases and/or cost-benefit analyses that draw on this information."
I really enjoyed this post! In general, I am a big fan of efforts to improve our collective decision-making in ways that build on existing democracy/voting schemes; approval voting is a favorite of mine, for example.
This was the first I'd heard of election by jury. I checked out your website a bit too. Cool stuff; I really like the idea of using sampling to help mitigate other issues, including the overhead cost of having a well-informed citizenry on every issue.
Thanks!
I have a few questions about the space of EA communities.
You mention
Projects that build communities focused on impartial, scope-sensitive and ambitious altruism.
as in scope. I am curious: what existing examples do you have of communities that emphasize these values, aside from the core "EA" brand?
I know that GWWC kind of exists as its own community independent of "EA" to ~some extent, but honestly I am unclear to what extent. Also, I guess LessWrong and the broader rationality-cinematic-universe might kind of fit here too, but realistically, whenever scope-sensitive altruism is the topic of discussion on LessWrong, an EA Forum cross-post is likely. Are there any big "impartial, scope-sensitive and ambitious altruism" communities I am missing? I know there are several non-profits independently working on charity evaluation and that sort of thing, but I am not very aware of distinct "communities" per se.
Some of my motivation for asking is that I actually think there is a lot of potential when it comes to EA-esque communities that aren't actually officially "EA" or "EA Groups". In particular, I am personally interested in the idea of local EA-esque community groups with a more proactive focus on fellowship, loving community, social kindness/fraternity, and providing people a context for profound/meaningful experiences. Still championing many EA values (scope-sensitivity, broad moral circles, proactive ethics) and EA tools (effective giving, research-oriented and ethics-driven careers), but in the context of a group which is a shade or two more like churches, humanist associations, and the Sunday Assembly, and a shade or two less like Rotary Clubs or professional groups.
That's just one idea, but I'm really trying to ask about the broader status of EA-diaspora communities / non-canonically "EA" community groups under EAIF. I would like to more clearly understand what the canonical "stewards of the EA brand" at CEA and EAIF have in mind for the future of EA groups and the movement as a whole. What does success look like here; what are these groups trying to be / blossom into? And to the extent that my personal vision for "the future of EA" is different, is a clear break / diaspora the way to go?
Thanks!
I've seen EA meditation, EA bouldering, EA clubbing, EA whatever. Orgs seem to want everyone and the janitor to be "aligned". Everyone's dating each other. It seems that we're even afraid of them.
I am not in the Bay Area or London, so I guess I'm maybe not personally familiar with the full extent of what you're describing, but there are elements of this that sound mostly positive to me.
Like, of course, it is possible to overemphasize the importance of culture fit and mission alignment when making hiring decisions. It seems like a balance and depends on the circumstance, and I don't have much to say there.
As far as the extensive EA fraternizing goes, that actually seems mostly good. Like, to the extent that EA is a "community", it doesn't seem surprising or bad that people are drawn to hang out. Church groups do that sort of thing all the time, for example. People often like hanging out with others with shared values, interests, experiences, outlook, and cultural touchstones. Granted, there are healthy and unhealthy forms of this.
I'm sure there's potential for things to get inappropriate and for inappropriate power dynamics to occur when it comes to ambiguous overlap between professional contexts, personal relationships, and shared social circles. At their best, though, social communities can provide people a lot of value and support.
Why is "EA clubbing" a bad thing?
I think the money goes a lot further when it comes to helping non-human animals than when it comes to helping humans.
I am generally pretty bought into the idea that non-human animals also experience pleasure/suffering, and I care about helping them.
I think it is probably good for the long-term trajectory of society to have better norms around the casual cruelty and torture inflicted on non-human animals.
On the other hand, I do think there are really good arguments for human-to-human compassion and the elimination of extreme poverty. I am very much in favor of that sort of thing too. GiveDirectly in particular is one of my favorite charities just because of the simplicity, compassion, and unpretentiousness of the approach.
Animal welfare wins my vote not because I disfavor human-to-human welfare, but just because I think the same amount of resources can go a lot further in helping my non-human friends.
If "how do you deal with it" means "how do you convince yourself it is false, or that things some EA orgs are contributing to are still okay given it", I don't think this is a useful attitude to have towards troubling truths.
Well said and important :)
I don't really understand this stance; could you explain what you mean here?
Like Sammy points out with the Hitler example, it seems kind of obviously counterproductive/negative to "save a human who was then going to go torture and kill a lot of other humans".
Would you disagree with that? Or is the pluralism you are suggesting here specifically between viewpoints that suggest animal suffering matters and viewpoints that donât think it matters?
As I understand worldview diversification stances, the idea is something like: if you are uncertain about whether animal welfare matters, then you can take a portfolio approach where, with some fraction of resources, you try to increase human welfare at the cost of animals, and with a different fraction of resources you try to increase animal welfare. The hope is that this nets out to positive in both "worlds where non-human animals matter" and "worlds where only humans matter".
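To illustrate what I mean by "nets out to positive" (purely hypothetical numbers of my own, just as a sketch): suppose $1 of human-welfare spending produces 1 unit of value under either worldview, while $1 of animal-welfare spending produces 10 units if animals matter and 0 if they do not. A 50/50 portfolio then yields 0.5 units per dollar in the "only humans matter" world and 5.5 units per dollar in the "animals matter" world, so it comes out positive in both worlds, whereas an all-in bet on animal welfare would yield nothing in the first world.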
Are you suggesting something like that, or is there a deeper rule against concluding that the effects of other people's lives are net negative, when considering the second-order effects of whether to save them, that you are pointing to?
Note that the cost-effectiveness of epidemic/pandemic preparedness I got of 0.00236 DALY/$ is still quite high.
Point well-taken.
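As a rough back-of-the-envelope conversion (my own arithmetic, just to put that figure in more familiar units): 1 / (0.00236 DALY/$) is about $424 per DALY averted.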
I appreciate you writing and sharing those posts trying to model and quantify the impact of x-risk work and question the common arguments given for astronomical EV.
I hope to take a look at those in more depth sometime and critically assess what I think about them. Honestly, I am very intrigued by engaging with well-informed disagreement around the astronomical EV of x-risk-focused approaches. I find your perspective here interesting, and I think engaging with it might sharpen my own understanding.
:)
Interesting! This is a very surprising result to me, because I am mostly used to hearing about how cost-effective pandemic prevention is, and this estimate seems to disagree with that.
Shouldn't this be a relatively major point against prioritizing biorisk as a cause area (at least without taking into account strong longtermism and the moral catastrophe of extinction)?
Fictional Characters:
I would say I agree that fictional characters aren't moral patients. That's because I don't think the suffering/pleasure of fictional characters is actually experienced by anyone.
I take your point that you don't think the suffering/pleasure portrayed by LLMs is actually experienced by anyone either.
I am not sure how deep the analogy really is between what the LLM is doing and what human actors or authors are doing when they portray a character. But I can see some analogy, and I think it provides a reasonable intuition pump for times when humans can say things like "I'm suffering" without it actually reflecting anything of moral concern.
Trivial Changes to Deepnets:
I am not sure how to evaluate your claim that only trivial changes to the NN are needed to have it negate itself. My sense is that this would probably require more extensive retraining if you really wanted to get it to never role-play that it was suffering under any circumstances. This seems at least as hard as other RLHF "guardrails" tasks unless the approach was particularly fragile/hacky.
Also, I'm just not sure I have super strong intuitions about that mattering a lot because it seems very plausible that just by "shifting a trivial mass of chemicals around" or "rearranging a trivial mass of neurons" somebody could significantly impact the valence of my own experience. I'm just saying, the right small changes to my brain can be very impactful to my mind.
My Remaining Uncertainty:
I would say I broadly agree with the general notion that the text output by LLMs probably doesn't correspond to an underlying mind with anything like the sorts of mental states I would expect to see in a human mind that was "outputting the same text".
That said, I think I am less confident in that idea than you, and maybe I don't find the same arguments/intuition pumps as compelling. I think your take is reasonable and all; I just have a lot of general uncertainty about this sort of thing.
Part of that is just that I think it would be brash of me not to at least entertain the idea of moral worth when it comes to these strange masses of "brain-tissue-inspired computational stuff" which are capable of all sorts of intelligent tasks. Like, my prior on such things being in some sense sentient or morally valuable is far from 0 to begin with, just because that really seems like the sort of thing that would be a plausible candidate for moral worth in my ontology.
And I also just don't feel confident at all in my own understanding of how phenomenal consciousness arises / what the hell it even is. Especially with these novel sorts of computational pseudo-brains.
So, idk, I do tend to agree that the text outputs shouldn't just be taken at face value or treated as equivalent in nature to human speech, but I am not really confident that there is "nothing going on" inside the big deepnets.
There are other competing factors at this meta-uncertainty level. Maybe I'm too easily impressed by regurgitated human text. I think there are strong social / conformity reasons to be dismissive of the idea that they're conscious. Etc.
Usefulness as Moral Patients:
I am more willing to agree with your point that they can't be "usefully" moral patients. Perhaps you are right about the "role-playing" thing, and whatever mind might exist in GPT produces the text stream more as a byproduct of whatever it is concerned about than as a "true monologue about itself". Perhaps the relationship it has to its text outputs is, at some deep level, analogous to the relationship an actor has to a character they are playing. I don't personally find the "simulators" analogy compelling enough to really think this, but I permit the possibility.
We are so ignorant about the nature of GPTs' minds that perhaps there is not much we can really say about what sorts of things would be "good" or "bad" with respect to them. And all of our uncertainty about whether/what they are experiencing almost certainly makes them less useful as moral patients on the margin.
I don't intuitively feel great about a world full of nothing but servers constantly prompting GPTs with "you are having fun, you feel great" just to have them output "yay" all the time. Still, I would probably rather have that sort of world than an empty universe. And if someone told me they were building a data center where they would explicitly retrain and prompt LLMs to exhibit suffering-like behavior/text outputs all the time, I would be against that.
But I can certainly imagine worlds in which these sorts of things wouldn't really correspond to valenced experience at all. Maybe the relationship between an NN's stream of text and any hypothetical mental processes going on inside it is so opaque and non-human that we could not easily influence those mental processes in ways that we would consider good.
LLMs Might Do Pretty Mind-Like Stuff:
On the object level, I think one of the main lines of reasoning that makes me hesitant to more enthusiastically agree that the text outputs of LLMs do not correspond to any mind is my general uncertainty about what kinds of computation are actually producing those text outputs and my uncertainty about what kinds of things produce mental states.
For one thing, it feels very plausible to me that a "next token predictor" IS all you would need to get a mind that can experience something. Prediction is a perfectly respectable kind of thing for a mind to do. Predictive power is pretty much the basis of how we judge which theories are true scientifically. Also, plausibly it's a lot of what our brains are actually doing, and thus potentially pretty core to how our minds are generated (cf. predictive coding).
The fact that modern NNs are "mere next token predictors" on some level doesn't give me clear intuitions that I should rule out the possibility of interesting mental processes being involved.
Plus, I really don't think we have a very good mechanistic understanding of what sorts of "techniques" the models are actually using to be so damn good at predicting. Plausibly none of the algorithms being implemented or "things happening" bear any similarity to the mental processes I know and love, but plausibly there is a lot of "mind-like" stuff going on. Certainly brains have offered design inspiration, so perhaps our default guess should be that "mind-stuff" is relatively likely to emerge.
Can Machines Think:
The Imitation Game proposed by Turing attempts to provide a more rigorous framing for the question of whether machines can "think".
I find it a particularly moving thought experiment if I imagine that the machine is trying to imitate a specific loved one of mine.
If there were a machine that could nail the exact I/O patterns that my girlfriend produces, then I would be inclined to say that whatever sort of information processing occurs in my girlfriend's brain to create her language capacity must also be happening in the machine somewhere.
I would also say that if all of my girlfriend's language capacity were being computed somewhere, then it is reasonably likely that whatever sort of mental stuff generates her experience of the world would also be occurring.
I would still consider this true without having a deep conceptual understanding of how those computations were performed. I'm sure I could even look at how they were performed and not find it obvious in what sense they could possibly lead to phenomenal experience. After all, that is pretty much my current epistemic state with regard to the brain, so I really shouldn't expect reality to "hand it to me on a platter".
If there were a machine that could imitate a plausible human mind in the same way, should I not think that it is perhaps simulating a plausible human in some way? Or perhaps using some combination of more expensive "brain/mind-like" computations in conjunction with lazier linguistic heuristics?
I guess I'm saying that there are probably good philosophical reasons for having a null hypothesis in which a system that is largely indistinguishable from a human mind should be treated as though it is doing computations equivalent to a human mind. That's pretty much the same thing as saying it is "simulating" a human mind. And that very much feels like the sort of thing that might cause consciousness.
I appreciate you taking the time to write out this viewpoint. I have had vaguely similar thoughts in this vein. Tying it into Janus's simulators and the stochastic parrot view of LLMs was helpful. I would intuitively suspect that many people would have an objection similar to this, so thanks for voicing it.
If I am understanding and summarizing your position correctly, it is roughly that:
The text output by LLMs is not reflective of the state of any internal mind in the way that human language typically reflects the speaker's mind. You believe this is implied by the fact that the LLM cannot be effectively modeled as a coherent individual with consistent opinions; there is not actually a single "AI assistant" under Claude's hood. Instead, the LLM itself is a difficult-to-comprehend "shoggoth" system, and that system sometimes falls into narrative patterns in the course of next token prediction which cause it to produce text in which characters/"masks" are portrayed. Because the characters being portrayed are only patterns that the next token predictor follows in order to predict next tokens, it doesn't seem plausible to model them as reflecting an underlying mind. They are merely "images of people" or something, like a literary character or one portrayed by an actor. Thus, even if one of the "masks" says something about its preferences or experiences, this probably doesn't correspond to the internal states of any real, extant mind in the way we would normally expect to be true when humans talk about their preferences or experiences.
Is that a fair summation/reword?
Adjacent to this point about how we could improve EA communication, I think it would be cool to have a post that explores how we might effectively use, like, Mastodon or some other method of dynamic, self-governed federation to get around this issue. I think this issue goes well beyond just the EA forum in some ways lol.
Good suggestion! Happy Ramadan! <3
Just for the sake of feedback: I think this makes me personally less inclined to post the ideas and drafts I have been toying with, because it makes me feel like they are going to be completely steamrolled by a flurry of posts from people with higher status than me, and it wouldn't really matter what I said.
I don't know who your target demographic is here, and it sounds like "flurry of posts by high-status individuals" might have been your main intention anyway. However, please note that this doesn't necessarily help you very much if you are trying to cultivate more outsider perspectives.
In any case, you're probably right that this will lead to more discussion, and I am interested to see how it shakes out. I hope you'll write up a review post or something to summarize how the event went, because it's going to be hard to follow that many posts about different topics and the corresponding discussion each of them generates.
I am very unclear on why research that involves game theory simulations seems dangerous to you. I think I'm ignorant of something leading you to this conclusion. Would you be willing to explain your reasoning or send me a link to something so I can better understand where you're coming from?
Could you expound on this or maybe point me in the right direction to learn why this might be?
I tend to agree with the intuition that s-risks are unlikely because they are a small part of possibility space and nobody is really aiming for them. I can see a risk that systems trained to produce eudaimonia will instead produce -1 x eudaimonia, but I can't see how that justifies thinking that astronomical bad is more likely than astronomical good. Surely a random sign flip is less likely than not.
Sure thing! I don't think it'll be all that polished or comprehensive since it is mostly intended to help me straighten out my reasoning, but I would be more than happy to share it.
Thank you for the survey info! I was favorably surprised by some of those results.
Thank you so much! This is exactly the sort of thing I am looking for. I'm glad there is high-quality work like this being done to advance strategic clarity surrounding TAI, and I appreciate you sharing your draft.
Thanks for linking "Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models". Also, I agree with:
That comparison seems simplistic and inapt for at least a few reasons. That does seem like a pretty "trust me bro" justification for the intelligence explosion, lol. Granted, I only listened to the accompanying podcast, so I can't speak too much to the paper.
Still, I am of two minds. I still buy into a lot of the premise of "Preparing for the Intelligence Explosion". I find the idea of getting collectively blindsided by rapid, uneven AI progress ~eminently plausible. There didn't even need to be that much of a fig leaf.
Don't get me wrong, I am not personally very confident in "expert-level AI researchers for arbitrary domains" within the next few decades. Even so, it does seem like the sort of thing worth thinking about and preparing for.
From one perspective, AI coding tools are just recursive self-improvement gradually coming online. I think I understand some of the urgency, but I appreciate the skepticism a lot too.
Preparing for an intelligence explosion is a worthwhile thought experiment at least. It seems probably good to know what we would do in a world with "a lot of powerful AI", given that we are in a world where all sorts of people are trying to research/make/sell ~"a lot of powerful AI". Like, just in case, at least.
I think I see multiple sides. Lots to think about.