Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:
Meta-uncertainty:
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
But thinking in terms of probabilities over probabilities is sometimes useful, e.g. you have a probability distribution over possible worlds/models and those worlds/models are probabilistic
Unstable beliefs about stuff like AI timelines in the sense of I’d be pretty likely to say something pretty different if you asked tomorrow
Instability in the sense of being likely to change beliefs if you thought about it more is fine; fluctuating predictably (dutch-book-ably) is not
Ethics:
Axiologies besides ~utilitarianism
Possibly I’m actually noticing sloppy reasoning about how to go from axiology to decision procedure, possibly including just not taking axiology seriously
Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project
Cause prioritization:
Cause prioritization that doesn’t take seriously that the cosmic endowment is astronomical (likely worth >10^60 happy human lives) and that we can nontrivially reduce x-risk
E.g. RP’s Cross-Cause Cost-Effectiveness Model doesn’t take the cosmic endowment seriously
Deciding in advance to boost a certain set of causes [what determines that set??], or a “portfolio approach” without justifying the portfolio-items
E.g. multiple CEA staff donate by choosing some cause areas and wanting to help in each of those areas
Related error: agent-relativity
Related error: considering difference from status quo rather than outcomes in a vacuum
Related error: risk-aversion in your personal contributions (much more egregious than risk-averse axiology)
Instead you should just argmax — find the marginal value of your resources in each cause (for your resources that can funge between causes), then use them in the best possible way (see the first sketch after this list)
Intra-cause offsetting: if you do harm in area X [especially if it’s avoidable/unnecessary/intentional], you should fix your harm in that area, even if you could do more good in another area
Maybe very few of my friends actually believe this
Misc:
Not noticing big obvious problems with impact certificates/markets
Naively using calibration as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money
Many examples, e.g. How to place a bet on the end of the world
I think most of us understand the objection “you can do better by just borrowing money at market rates” — I think many people miss that utility is about ∫consumption, not ∫bankroll (note the bettor typically isn’t liquidity-constrained). The bet only makes sense if you spend all your money before you’d have to pay it back (see the second sketch after this list).
[Maybe something deep about donations; not sure]
[Maybe something about compensating altruists or compensating for roles often filled by altruists; not sure]
[Maybe something about status; not sure]
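To make the “just argmax” item concrete, here is a minimal sketch (the cause names and utility curves are invented assumptions, not claims about real cost-effectiveness): allocate a budget in small increments, each increment to whichever cause currently has the highest marginal value, instead of fixing portfolio shares in advance.

```python
# Toy illustration of "just argmax" vs. a fixed portfolio.
# The utility curves below are made up purely for illustration.
import math

causes = {
    "cause_A": lambda x: 10 * math.log1p(x),  # steep early, diminishing returns
    "cause_B": lambda x: 3 * math.log1p(x),   # diminishing returns, lower ceiling
    "cause_C": lambda x: 0.5 * x,             # linear (constant marginal value)
}

def argmax_allocation(budget: float, step: float = 1.0) -> dict:
    """Greedily give each increment to the cause with the highest marginal value."""
    alloc = {name: 0.0 for name in causes}
    spent = 0.0
    while spent < budget:
        # Marginal value of the next `step` dollars in each cause.
        marginal = {
            name: u(alloc[name] + step) - u(alloc[name]) for name, u in causes.items()
        }
        best = max(marginal, key=marginal.get)
        alloc[best] += step
        spent += step
    return alloc

def total_utility(alloc: dict) -> float:
    return sum(causes[name](x) for name, x in alloc.items())

budget = 100.0
greedy = argmax_allocation(budget)
fixed = {name: budget / len(causes) for name in causes}  # "portfolio" split, one third each
print("argmax allocation:", greedy, "utility:", round(total_utility(greedy), 1))
print("fixed portfolio:  ", fixed, "utility:", round(total_utility(fixed), 1))
```

Under these made-up curves the greedy allocation beats an even three-way split; the point is only that the allocation falls out of marginal values rather than being chosen up front.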
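And for the bet-on-the-end-of-the-world item, a toy calculation (all numbers invented) of the ∫consumption point: how much extra consumption the bettor actually gets to enjoy depends on whether they spend the cash before the payback date.

```python
# Toy model of a "bet on doom": the bettor receives cash now and must repay more
# later if the world doesn't end. All numbers are invented for illustration.
P_DOOM = 0.5               # bettor's credence that the world ends before the payback date
RECEIVE_NOW = 10_000       # cash received today
REPAY_IF_NO_DOOM = 20_000  # owed at the payback date if the world doesn't end

def expected_extra_consumption(spend_now: bool) -> float:
    """Expected change in consumption the bettor actually enjoys, relative to not betting."""
    if spend_now:
        doom = RECEIVE_NOW                        # extra consumption enjoyed before the end
        no_doom = RECEIVE_NOW - REPAY_IF_NO_DOOM  # enjoy it now, repay out of consumption later
    else:
        doom = 0                                  # cash sits in the bankroll, never consumed
        no_doom = RECEIVE_NOW - REPAY_IF_NO_DOOM
    return P_DOOM * doom + (1 - P_DOOM) * no_doom

print(expected_extra_consumption(spend_now=True))   # 0.5*10_000 + 0.5*(-10_000) = 0.0
print(expected_extra_consumption(spend_now=False))  # 0.5*0      + 0.5*(-10_000) = -5000.0
```

With these numbers the bet is worthless even at 50% doom-credence unless the winnings are actually consumed before the payback date, which is the sense in which what matters is consumption, not bankroll.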
Possibly I’m wrong about which attitudes are common.
For now I’m just starting a list, not trying to be legible, much less change minds. I know I haven’t explained my views.
Edit: I’m sharing controversial beliefs, without justification and with some framed provocatively. If one of these views makes you think worse of me to a nontrivial degree, please ask for elaboration; maybe there’s miscommunication or it’s more reasonable than it seems. Edit 2: there are so many comments; I may not respond to requests-for-elaboration but will at least notice them as a bid-for-elaboration-at-some-point.
(meta musing) The conjunction of the negations of a bunch of statements seems a bit doomed to get a lot of disagreement karma, sadly. Esp. if the statements being negated are “common beliefs” of people like the ones on this forum.
I agreed with some of these and disagreed with others, so I felt unable to agreevote. But I strongly appreciated the post overall so I strong-upvoted.
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
This is just straightforwardly correct statistics. For example, ask a true Bayesian to estimate the outcome of flipping a coin of unknown bias, and they will construct a probability distribution over coin-flip probabilities, and only reduce this to a single probability when forced to make a bet. But when not taking a bet, they should be doing updates on the distribution, not the final estimate. (I’m pretty sure this is in fact the only logical way to do a Bayesian update for the problem.)
And why are we stating probabilities anyway? The main reason seems to be to quantify and communicate our beliefs. But if my “25% probability” comes from a different distribution to your “25% probability”, we may appear to be in agreement when in fact our worldviews differ wildly. I think giving credence intervals over probabilities is strictly better than this.
Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.
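A minimal sketch of the coin-of-unknown-bias example above, assuming a Beta(1, 1) prior and made-up flip data: you keep a full distribution over the coin’s bias, update it on each flip, and only collapse it to a single number (the posterior mean) when you need to price a bet.

```python
# The coin-of-unknown-bias example: keep a distribution over the coin's bias,
# update it on data, and only reduce to a point probability when pricing a bet.
# (The Beta(1, 1) prior and the flip data are arbitrary choices for illustration.)
from scipy.stats import beta

a, b = 1, 1                       # Beta(1, 1) = uniform prior over the coin's bias
flips = [1, 1, 0, 1, 0, 1, 1, 1]  # 1 = heads, 0 = tails (made-up data)

for flip in flips:
    a += flip
    b += 1 - flip                 # conjugate update: Beta(a + heads, b + tails)

point_estimate = a / (a + b)      # what you'd quote if forced to price a bet on the next flip
interval_90 = (beta.ppf(0.05, a, b), beta.ppf(0.95, a, b))  # 90% credible interval over the bias

print(f"posterior: Beta({a}, {b})")
print(f"bet-pricing probability (posterior mean): {point_estimate:.2f}")
print(f"90% credible interval for the bias: ({interval_90[0]:.2f}, {interval_90[1]:.2f})")
```

Two people can both quote 70% for the next flip while holding very different posteriors (say Beta(7, 3) vs. Beta(700, 300)), which is the sense in which an interval over the bias carries information the point estimate doesn’t.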
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
When I do this, it’s because I’m unable or unwilling to assign a probability distribution over the probabilities, so it won’t reduce to simple (precise) probabilities. Actually, in general, I think precise probabilities are epistemically unjustified (e.g. Schoenfield, 2012, section 3), but I’m willing to use more or less precise probabilities depending on the circumstances.
Unstable beliefs about stuff like AI timelines in the sense of I’d be pretty likely to say something pretty different if you asked tomorrow
I’m not sure if I’d claim to have such unstable beliefs myself, but if you’re trying to be very precise with very speculative, subjective and hard-to-specifically-defend probabilities, then I’d imagine they could be very unstable, and influenced by things like your mood, e.g. optimism and pessimism bias. That is, unless you commit to your credences even if you would have formed different ones had you started from scratch, or you make arbitrary choices in forming them that could easily have gone differently. You might weigh the same evidence or arguments differently from one day to the next.
I’d guess most people would also have had at least slightly different credences on AI timelines if they had seen the same evidence or arguments in a different order, or were in a different mood when they were forming their credences or building models, or for many other different reasons. Some number or parameter choices will come down to intuition, and intuition can be unstable.
fluctuating predictably (dutch-book-ably) is not
I don’t think people are fluctuating predictably (dutch-book-ably). How exactly they’d change their minds or even the direction is not known to them ahead of time.
(But maybe you could Dutch book people by predicting their moods and so optimism and pessimism bias?)
Some people say things like “my doom-credence fluctuates between 10% and 25% day to day”; this is dutch-book-able and they’d make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.
This is dutch-book-able only if there is no bid-ask spread. A rational choice in this case would be to have a very wide bid-ask spread. E.g. when Holden Karnofsky writes that his P(doom) is between 10% and 90%, I assume he would bet for doom at 9% or less, bet against doom at 91% or more, and not bet for 0.11<p<0.89. This seems a very rational choice in a high-volatility situation where information changes extremely quickly. (As an example, IIRC the bid-ask spread in financial markets increases right before earnings are released).
(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don’t really want to get into this now.)
Hmm, okay. So, for example, when they’re below 15%, you bet that it will happen at odds matching 15% against them, and when they’re above 20%, you bet that it won’t happen at 20% against them. And just make sure to size the bets right so that if you lose one bet, your payoff is higher in the other, which you’d win. They “give up” the 15-20% range for free to you.
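A toy version of the construction just described (the 15% and 20% thresholds follow the comment; the stake size and the fluctuating credences are invented): buy “yes” at 15% on a low-credence day and sell “yes” at 20% on a high-credence day, with equal stakes, and the profit is the same whether or not the event happens.

```python
# Dutch-booking someone whose stated credence fluctuates between ~10% and ~25%:
# buy "yes" from them at 15% on a low day, sell "yes" to them at 20% on a high day.
# Stake sizes are equal and arbitrary; the credences are invented for illustration.
STAKE = 100.0

def payoff(event_happens: bool) -> float:
    # Bet 1 (their credence was 10% < 15%): we buy "yes" at 0.15.
    bet1 = (1 - 0.15) * STAKE if event_happens else -0.15 * STAKE
    # Bet 2 (their credence was 25% > 20%): we sell "yes" at 0.20, i.e. buy "no" at 0.80.
    bet2 = -0.80 * STAKE if event_happens else 0.20 * STAKE
    return bet1 + bet2

print(payoff(event_happens=True))   # 85 - 80 = +5
print(payoff(event_happens=False))  # -15 + 20 = +5
```

Either way the fluctuating forecaster loses $5 per $100 staked, which is the sense in which they “give up” the 15-20% range for free.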
Still, maybe they just mean to report the historical range or volatility of their estimates? This would be like reporting the historical volatility of a stock. They may not intend to imply, say, that they’ll definitely fall below 15% at some point and above 20% at another.
Plus, picking one way to average may seem unjustifiably precise to them. The average over time is one way, but another is the average over relatively unique (clusters of) states of mind, e.g. splitting weight equally between good, ~neutral, and bad moods, or over possible sets of value assignments for various parameters. There are many different reasonable choices they can make, all pretty arbitrary.
Thank you for writing this. I share many of these, but I’m very uncertain about them.
Here it is:
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
I think this is rational: I think of probabilities in terms of bets and order books. This is close to my view, and the analogy to financial markets is not irrelevant.
Unstable beliefs about stuff like AI timelines in the sense of I’d be pretty likely to say something pretty different if you asked tomorrow
Changing literally day-to-day seems extreme, but month-to-month seems very reasonable given the speed of everything that’s happening, and it matches e.g. the volatility of NVIDIA stock price.
Axiologies besides ~utilitarianism
To me, “utilitarianism” seems pretty general, as long as you can arbitrarily define utility and you can arbitrarily choose between Negative/Rule/Act/Two-level/Total/Average/Preference/Classical utilitarianism. I really liked this section of a recent talk by Toby Ord (Starting from “It starts by observing that the three main traditions in Western philosophy each emphasize a different focal point:”). (I also don’t know if axiology is the right word for what we want to express here, we might be talking past each other)
Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
I mostly agree with you, but second order effects seem hard to evaluate and both costs and benefits are so minuscule (and potentially negative) that I find it hard to do a cost-benefit-analysis.
Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project
I agree with you, but for some it might be an instrumentally useful intentional framing. I think some use phrases like “[Personal flourishing] for its own sake, for the sake of existential risk.” (see also this comment for a fun thought experiment for average utilitarians, but I don’t think many believe it)
Cause prioritization that doesn’t take seriously the cosmic endowment is astronomical, likely worth >10^60 happy human lives and we can nontrivially reduce x-risk
Some think the probability of extinction per century is only going up with humanity’s increasing capabilities, and are not convinced by arguments that we’ll soon reach close-to-speed-of-light travel which will make extinction risk go down. See also e.g. Why I am probably not a longtermist (except point 1). I find this very reasonable.
Deciding in advance to boost a certain set of causes [what determines that set??], or a “portfolio approach” without justifying the portfolio-items
I agree. I think this makes a ton of sense for people in community building who need to work with many cause areas (e.g. CEA staff, Peter Singer), but I fear it makes less sense for private individuals maximizing their impact.
Not noticing big obvious problems with impact certificates/markets
I think many people notice big obvious problems with impact certificates/markets, but think that the current system is even worse, or that they are at least worth trying and improving, to see if at their best they can in some cases be better than the alternatives we have. The current funding systems also have big obvious problems. What big obvious problems do you think they are missing?
Naively using calibration as a proxy for forecasting ability
I agree with this, just want to mention that it seems better than a common alternative that I see: using LessWrong-sounding-ness/reputation as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money … I think many people miss that utility is about ∫consumption not ∫bankroll (note the bettor typically isn’t liquidity-constrained)
I somewhat agree with you, but I think that many people model it a bit like this: “I normally consume 100k/year, you give me 10k now so I will consume 110k this year, and if I lose the bet I will consume only 80k/year X years in the future”. But I agree that in practice the amounts are small and it doesn’t work for many reasons.
Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I’ll elaborate on all this in a top-level post.
Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.
veg(etari)anism as ethical rather than as a costly indulgence
Are you convinced the costs outweigh the benefits? It may be good for important instrumental reasons, e.g. reducing cognitive dissonance about sentience and moral weights, increasing the day-to-day salience of moral patients with limited agency or power (which could be an important share of those in the future), personal integrity or virtue, easing cooperation with animal advocates (including non-consequentialist ones), maybe health reasons.
Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren’t cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.
Or me. I’m veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.
I certainly haven’t done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I was veg(etari)an for the straightforward reason (for agent-neutral consequentialist reasons), I’d do the cost-benefit analysis, and do things like:
Eat meat that would otherwise go to waste (when that wouldn’t increase anticipated demand for meat in the future)
Try to reduce others’ meat consumption, and try to reduce the supply of meat or improve the lives of farmed animals, when that’s more cost-effective than personal veg(etari)anism
Notice whether eating meat would substantially boost my health and productivity, and go back to eating meat if so
I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don’t notice this.
Written quickly, maybe hard-to-parse and imprecise.
Strong upvoted and couldn’t decide whether to disagreevote or not. I agree with the points you list under meta-uncertainty and your point on naively using calibration as a proxy for forecasting ability + thinking you can bet on the end of the world by borrowing money. I disagree with your thoughts on ethics (I’m sympathetic to Zvi’s writing on EAs confusing the map for the territory).
I’m not sure what would be the best thing since I don’t remember there being a particular post about this. However, he talks about it in his book review for Going Infinite and I also like his post on Altruism is Incomplete. Lots of people I know find his writing confusing though and it’s not like he’s rigorously arguing for something. When I agree with Zvi, it’s usually because I have had that belief in the back of my mind for a while and him pointing it out makes it more salient, rather than because I got convinced by a particular argument he was making.
Deciding in advance to boost a certain set of causes [what determines that set??], or a “portfolio approach” without justifying the portfolio-items
(Not totally sure what you mean here.) I think the portfolio items are justified on the basis of distinct worldviews, which differ in part based on their normative commitments (e.g. theories of welfare like hedonism or preference views, moral weights, axiology, decision theory, epistemic standards, non-consequentialist commitments) across which there is no uniquely justified universal common scale. People might be doing this pretty informally or deferring, though.
Intra-cause offsetting: if you do harm in area X, you should fix your harm in that area, even if you could do more good in another area
I think this can make sense if you have imprecise credences or normative uncertainty (for which there isn’t a uniquely justified universal common scale across views). Specifically, if you’re unable to decide whether action A does net good or net harm (in expectation), because it does good for cause X and harm for cause Y, and the two causes are too hard to compare, it might make sense to offset. Portfolios can be (more) robustly positive than the individual acts. EDIT: But maybe you find this too difference-making?
It takes like 20 hours of focused reading to get basic context on AI risk and threat models. Once you have that, I feel like you can read everything important in x-risk-focused AI policy in 100 hours. Same for x-risk-focused AI corporate governance, AI forecasting, and macrostrategy.
[Edit: “read everything important” doesn’t mean you have nothing left to learn; it means something like: you have context to appreciate ~all papers, you can follow ~all conversations in the field except those between sub-specialists, and you have the generators of good overviews like 12 tentative ideas for US AI policy.]
Am I wrong?
Actually yes, I’m imagining going back and speedrunning learning; if you’re not an expert then you’re much worse at (1) figuring out what to prioritize reading and (2) skimming. But still, 300 hours each, or 200 with a good reading list, or 150 with a great reading list.
This is wild. Normal fields require more like 10,000 hours engagement before you reach the frontier, and much more to read everything important. Right?
Why aren’t more people at the frontier in these four areas?
Normal fields have textbooks and syllabi and lit reviews. Those are awesome for learning quickly. We should have better reading lists. I should make reading lists.
My opportunity cost is high for several weeks; I’ll plan to try this in December. I should be able to make fine 100-hour reading lists on these four topics in 1 day each, or good ones in a week each.
I will be tempted to read too much stuff I haven’t already read. (But I should skim anything I haven’t read in e.g. https://course.aisafetyfundamentals.com/governance.) And I will have the curse of knowledge regarding prerequisites/context and what’s-hard-to-understand. Shrug.
Maybe I can just get someone else to make great reading lists...
Why don’t there exist better reading lists / syllabi, especially beyond introductory stuff?
A reading list will be out of date in 6 months. Hmm. Maybe updating it wouldn’t actually be that hard?
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone? Would they be helpful if they got more attention? I almost never know who uses them. If I did know, talking to those people might be helpful.
If I actually try to make great/canonical AI governance reading lists, I should:
Check out all the existing reading lists: public ones + see private Airtable + student fellowships on governance (Harvard/MIT/Stanford) + reading lists given to new GovAI fellows or IAPS staff
Ask various people for advice + input + feedback: Mauricio, Michael, Matthijs, David, AISF folks, various slacks; plus experts on various particular topics like “takeoff speed”
Think about target audience. Actually talk to people in potential target audiences.
I don’t know whether alignment is similar. I suspect alignment has a lack of reading lists too.
The lack of lists of (research) project ideas (not to mention research agendas) in AI safety is even worse than the lack of reading lists. Can I fix that?
[Check out + talk to people who run] some of: ERA, CHERI, PIBBSS, AI safety student groups (Harvard/MIT/Stanford), AISF, SPAR, AI Safety Camp, Alignment Jam, AI Safety Hubs Labs, GovAI fellowship (see private docs “GovAI Fellowship—Research project ideas” and “GovAI Summer Fellowship Handbook”), MATS, Astra
Did AI Safety Ideas try and fail to solve this problem? Talk to Esben?
Look for other existing lists (public and private)
Ask various relevant researchers & orgs for (lists of) project ideas?
For most AI governance researchers, I don’t know what they’re working on. That’s really costly and feels like it should be cheap to fix. I’m aware of one attempt to fix this; it failed and I don’t understand why.
I disagree-voted because I feel like I’ve done much more than 100 hours of reading on AI policy (including finishing the AI Safety Fundamentals Governance course) and still have a strong sense there’s a lot I don’t know, and regularly come across new work that I find insightful. Very possibly I’m prioritising reading the wrong things (and would really value a reading list!) but thought I’d share my experience as a data point.
The technical intro fellowship curriculum. It’s structured as a 7-week reading group with ~1 hour of reading per week. It’s based on BlueDot’s AISF and the two curricula have co-evolved (we exchange ideas with BlueDot ~semesterly); a major difference is that the HAIST curriculum is significantly abridged.
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone?
For what it’s worth, I found your “AI policy ideas: Reading list” and “Ideas for AI labs: Reading list” helpful,[1] and I’ve recommended the former to three or four people. My guess would be that these reading lists have been very helpful to a couple or a few people rather than quite helpful to lots of people, but I’d also guess that’s the right thing to be aiming for given the overall landscape.
Why don’t there exist better reading lists / syllabi, especially beyond introductory stuff?
I expect there’s no good reason for this, and that it’s simply because it’s nobody’s job to make such reading lists (as far as I’m aware), and the few(?) people who could make good intermediate-to-advanced level reading lists either haven’t thought to do so or are too busy doing object-level work?
Helpful in the sense of: I read or skimmed the readings in those lists that I hadn’t already seen, which was maybe half of them, and I think this was probably a better use of my time than the counterfactual.
Because my job is very time-consuming, I haven’t spent much time trying to understand the state of the art in AI risk. If there was a ready-made reading list I could devote 2-3 hours per week to, such that it’d take me a few months to learn the basic context of AI risk, that’d be great.
An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.
A possible central goal of AI governance: make it so that when an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world, that quickly results in rules that stop all labs from ending the world.
I’ve left AI Impacts; I’m looking for jobs/projects in AI governance. I have plenty of runway; I’m looking for impact, not income. Let me know if you have suggestions!
(Edit to clarify: I had a good experience with AI Impacts.)
PSA about credentials (in particular, a bachelor’s degree): they’re important even for working in EA and AI safety.
When I dropped out of college to work on AI safety, I thought credentials were mostly important as evidence-of-performance for people who aren’t familiar with my work, and as a requirement in high-bureaucracy institutions (academia, government). It turns out that credentials are important for rational, optics-y reasons even when working with many people who know you (such that the credential provides no extra evidence) and are willing to defy conventions. It seems even many AI governance professionals/orgs are worried (often rationally) about appearing unserious by hiring or publicly collaborating with the uncredentialed, or something. Plus, irrationally credentialist organizations are very common and important, and may even comprise a substantial fraction of EA jobs and x-risk-focused AI governance jobs (which I expected to be more convention-defying); sometimes an organization/institution is credentialist even when it’s led by weird AI safety people (who operate under constraints).
Disclaimer: the evidence-from-my-experiences for these claims is pretty weak. This point’s epistemic status is more considerations + impressions from a few experiences than facts/PSA.
Upshot: I’d caution people against dropping out of college to increase impact unless they have a great plan.
(Edit to clarify: this paragraph is not about AI Impacts — it’s about everyone else.)
Might be out-of-scope for this shortform, but have you considered/are you able to go back to Williams? My impression is that you did well there and (unlike many EAs) you enjoyed your experience, so it’d be less costly for you than for many.
I appreciate it; I’m pretty sure I have better options than finishing my Bachelor’s; details are out-of-scope here but happy to chat sometime.
My position on “AI welfare”
If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something and it’s worth >>10^60 happy human lives in expectation. Digital minds are super important.
Short-term AI suffering will be small-scale—less than 10^40 FLOP and far from optimized for suffering, even if suffering is incidental—and worth <<10^20 happy human lives (very likely <10^10).
10^20 isn’t even a feather in the scales when 10^60 is at stake.
“Lock-in” [edit: of “AI welfare” trends on Earth] is very unlikely; potential causes of short-term AI suffering (like training and deploying LLMs) are very different from potential causes of astronomical-scale digital suffering (like tiling the universe with dolorium, the arrangement of matter optimized for suffering). And digital-mind-welfare research doesn’t need to happen yet; there will be plenty of subjective time for it before the von Neumann probes’ goals are set.
Therefore, to a first approximation, we should not trade off existential security for short-term AI welfare, and normal AI safety work is the best way to promote long-term digital-mind-welfare.
[Edit: the questionable part of this is #4, the claim that lock-in is very unlikely.]
I basically agree with this with some caveats. (Despite writing a post discussing AI welfare interventions.)
I discuss related topics here and what fraction of resources should go to AI welfare. (A section in the same post I link above.)
The main caveats to my agreement are:
From a deontology-style perspective, I think there is a pretty good case for trying to do something reasonable on AI welfare. Minimally, we should try to make sure that AIs consent to their current overall situation insofar as they are capable of consenting. I don’t put a huge amount of weight on deontology, but enough to care a bit.
As you discuss in the sibling comment, I think various interventions like paying AIs (and making sure AIs are happy with their situation) to reduce takeover risk are potentially compelling, and they are very similar to AI welfare interventions. I also think there is a weak decision theory case that blends in with the deontology case from the prior bullet.
I think that there is a non-trivial chance that AI welfare is a big and important field at the point when AIs are powerful, regardless of whether I push for such a field to exist. In general, I would prefer that important fields related to AI have better, more thoughtful views. (Not with any specific theory of change, just a general heuristic.)
Why does “lock-in” seem so unlikely to you?
One story:
Assume AI welfare matters
Aligned AI concentrates power in a small group of humans
AI technology allows them to dictate aspects of the future / cause some “lock in” if they want. That’s because:
These humans control the AI systems that have all the hard power in the world
Those AI systems will retain all the hard power indefinitely; their wishes cannot be subverted
Those AI systems will continue to obey whatever instructions they are given indefinitely
Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.
(Also, the decision makers could pick a future which isn’t very good in other ways.)
You could imagine AI welfare work now improving things by putting AI welfare on the radar of those people, so they’re more likely to take AI welfare into account when making decisions.
I’d be interested in which step of this story seems implausible to you—is it about AI technology making “lock in” possible?
I agree this is possible, and I think a decent fraction of the value of “AI welfare” work comes from stuff like this.
Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.
This would be very weird: it requires that either the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
[1] the likely U.S. presidential administration for the next four years
[2] in this world, TAI has been nationalized
[3] I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
[4] All recent U.S. presidents have been religious, for instance.
[5] My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
[6] One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
[7] Relatedly, if illusionism is true, then welfare is a fully subjective problem.
(Minor point: in an unstable multipolar world, it’s not clear how things get locked in, and for the von Neumann probes in particular, note that if you can launch slightly faster probes a few years later, you can beat rushed-out probes.)
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*.
*From Carl Shulman’s recent 80k interview:
“you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.”
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
Interesting!
I think my worry is people who don’t think they need advice about what the future should look like. When I imagine them making the bad decision despite having lots of time to consult superintelligent AIs, I imagine them just not being that interested in making the “right” decision? And therefore their advisors not being proactive in telling them things that are only relevant for making the “right” decision.
That is, assuming the AIs are intent aligned, they’ll only help you in the ways you want to be helped:
Thoughtful people might realise the importance of getting the decision right, and might ask “please help me to get this decision right” in a way that ends up with the advisors pointing out that AI welfare matters and the decision makers will want to take that into account.
But unthoughtful or hubristic people might not ask for help in that way. They might just ask for help in implementing their existing ideas, and not be interested in making the “right” decision or in what they would endorse on reflection.
I do hope that people won’t be so thoughtless as to impose their vision of the future without seeking advice, but I’m not confident.
Briefly + roughly (not precise):
At some point we’ll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.
I can’t easily make this intuition legible. (So I likely won’t reply to messages about this.)
Caveats:
I endorse the argument that we should figure out how to use LLM-based systems without accidentally torturing them, because they’re more likely to take catastrophic actions if we’re torturing them.
I haven’t tried to understand the argument that we should try to pay AIs to [not betray us / tell on traitors / etc.] and that working on AI-welfare stuff would help us offer AIs payment better; there might be something there.
I don’t understand the decision theory mumble mumble argument; there might be something there.
(Other than that, it seems hard to tell a story about how “AI welfare” research/interventions now could substantially improve the value of the long-term future.)
(My impression is these arguments are important to very few AI-welfare-prioritizers / most AI-welfare-prioritizers have the wrong reasons.)
FWIW, these motivations seem reasonably central to me personally, though not my only motivations.
Among your friends, I agree; among EA Forum users, I disagree.
Yes, I meant central to me personally, edited the comment to clarify.
This is very similar to my current stance.
Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:
Meta-uncertainty:
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
But thinking in terms of probabilities over probabilities is sometimes useful, e.g. you have a probability distribution over possible worlds/models and those worlds/models are probabilistic
Unstable beliefs about stuff like AI timelines in the sense of I’d be pretty likely to say something pretty different if you asked tomorrow
Instability in the sense of being likely to change beliefs if you thought about it more is fine; fluctuating predictably (dutch-book-ably) is not
Ethics:
Axiologies besides ~utilitarianism
Possibly I’m actually noticing sloppy reasoning about how to go from axiology to decision procedure, possibly including just not taking axiology seriously
Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project
Cause prioritization:
Cause prioritization that doesn’t take seriously the cosmic endowment is astronomical, likely worth >10^60 happy human lives and we can nontrivially reduce x-risk
E.g. RP’s Cross-Cause Cost-Effectiveness Model doesn’t take the cosmic endowment seriously
Deciding in advance to boost a certain set of causes [what determines that set??], or a “portfolio approach” without justifying the portfolio-items
E.g. multiple CEA staff donate by choosing some cause areas and wanting to help in each of those areas
Related error: agent-relativity
Related error: considering difference from status quo rather than outcomes in a vacuum
Related error: risk-aversion in your personal contributions (much more egregious than risk-averse axiology)
Instead you should just argmax — find the marginal value of your resources in each cause (for your resources that can funge between causes), then use them in the best possible way
Intra-cause offsetting: if you do harm in area X [especially if it’s avoidable/unnecessary/intentional], you should fix your harm in that area, even if you could do more good in another area
Maybe very few of my friends actually believe this
Misc:
Not noticing big obvious problems with impact certificates/markets
Naively using calibration as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money
Many examples, e.g. How to place a bet on the end of the world
I think most of us understand the objection “you can do better by just borrowing money at market rates” — I think many people miss that utility is about ∫consumption, not ∫bankroll (note the bettor typically isn’t liquidity-constrained). The bet only makes sense if you’d spend all your money before you’d have to pay it back. (A toy numerical sketch of this appears right after this list.)
[Maybe something deep about donations; not sure]
[Maybe something about compensating altruists or compensating for roles often filled by altruists; not sure]
[Maybe something about status; not sure]
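On the bet-on-doom point above, a toy numerical sketch with made-up numbers (a $10k stake received now against $20k owed in two years, and log utility of consumption as a stand-in): for a bettor who isn’t liquidity-constrained, the cash received now never changes pre-doom consumption, so the bet only costs them in the worlds where they survive to pay it back.

```python
import math

# Toy sketch (made-up numbers): a doom bettor receives $10k now and owes $20k
# in 2 years if the world survives. They have ample savings and consume
# $100k/year regardless (i.e. not liquidity-constrained), so the $10k just
# sits in savings and never changes pre-doom consumption.

def lifetime_utility(consumption_per_year):
    # Toy utility: sum of log-consumption over the years actually lived.
    return sum(math.log(c) for c in consumption_per_year)

p_doom = 0.2
years_if_survive = 30
payback_year = 2

baseline = [100_000] * years_if_survive

# Without the bet:
no_bet_doom = baseline[:payback_year]   # world ends at year 2
no_bet_survive = baseline

# With the bet: pre-doom consumption is unchanged (the $10k was never spent);
# in survival worlds the net -$10k comes out of later consumption.
bet_doom = baseline[:payback_year]
later_years = years_if_survive - payback_year
bet_survive = baseline[:payback_year] + [100_000 - 10_000 / later_years] * later_years

eu_no_bet = p_doom * lifetime_utility(no_bet_doom) + (1 - p_doom) * lifetime_utility(no_bet_survive)
eu_bet = p_doom * lifetime_utility(bet_doom) + (1 - p_doom) * lifetime_utility(bet_survive)

print(eu_bet < eu_no_bet)  # True: the bet only helps if the $10k raises consumption before payback
```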
Possibly I’m wrong about which attitudes are common.
For now I’m just starting a list, not trying to be legible, much less change minds. I know I haven’t explained my views.
Edit: I’m sharing controversial beliefs, without justification and with some framed provocatively. If one of these views makes you think worse of me to a nontrivial degree, please ask for elaboration; maybe there’s miscommunication or it’s more reasonable than it seems. Edit 2: there are so many comments; I may not respond to requests-for-elaboration but will at least notice them as a bid-for-elaboration-at-some-point.
(meta musing) The conjunction of the negations of a bunch of statements seems a bit doomed to get a lot of disagreement karma, sadly. Esp. if the statements being negated are “common beliefs” of people like the ones on this forum.
I agreed with some of these and disagreed with others, so I felt unable to agreevote. But I strongly appreciated the post overall so I strong-upvoted.
This is just straightforwardly correct statistics. For example, ask a true Bayesian to estimate the outcome of flipping a coin of unknown bias, and they will construct a probability distribution over coin-flip probabilities, and only reduce this to a single probability when forced to make a bet. But when not taking a bet, they should be doing updates on the distribution, not the final estimate. (I’m pretty sure this is in fact the only logical way to do a Bayesian update for the problem.)
And why are we stating probabilities anyway? The main reason seems to be to quantify and communicate our beliefs. But if my “25% probability” comes from a different distribution than your “25% probability”, we may appear to be in agreement when in fact our worldviews differ wildly. I think giving credence intervals over probabilities is strictly better than this.
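A minimal sketch of the coin example (my own illustration, assuming a uniform Beta(1, 1) prior and using scipy): the update happens on the full distribution over the coin’s bias, and it collapses to a single number (the posterior mean) only when one probability is needed, e.g. for a bet.

```python
from scipy import stats

# Uniform prior over the coin's bias p: Beta(1, 1).
alpha, beta = 1, 1

# Observe some flips (1 = heads, 0 = tails) and update the *distribution*,
# not a single point estimate.
flips = [1, 1, 0, 1, 0, 1, 1]
heads = sum(flips)
alpha += heads
beta += len(flips) - heads

posterior = stats.beta(alpha, beta)

# When forced to quote one probability for "next flip is heads",
# the whole distribution reduces to its mean.
p_heads = posterior.mean()  # equals alpha / (alpha + beta)

# But the uncertainty about the bias itself is still available,
# e.g. a 90% credible interval:
low, high = posterior.ppf(0.05), posterior.ppf(0.95)

print(f"P(next heads) = {p_heads:.2f}, 90% interval over the bias: ({low:.2f}, {high:.2f})")
```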
Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.
When I do this, it’s because I’m unable or unwilling to assign a probability distribution over the probabilities, so it won’t reduce to simple (precise) probabilities. Actually, in general, I think precise probabilities are epistemically unjustified (e.g. Schoenfield, 2012, section 3), but I’m willing to use more or less precise probabilities depending on the circumstances.
I’m not sure if I’d claim to have such unstable beliefs myself, but if you’re trying to be very precise with very speculative, subjective and hard-to-specifically-defend probabilities, then I’d imagine they could be very unstable, and influenced by things like your mood, e.g. optimism and pessimism bias. That is, unless you commit to your credences even if you’d have formed different ones had you started from scratch, or even if you made arbitrary choices in forming them that could easily have gone differently. You might weigh the same evidence or arguments differently from one day to the next.
I’d guess most people would also have had at least slightly different credences on AI timelines if they had seen the same evidence or arguments in a different order, or were in a different mood when they were forming their credences or building models, or for many other different reasons. Some number or parameter choices will come down to intuition, and intuition can be unstable.
I don’t think people are fluctuating predictably (Dutch-book-ably). How exactly they’d change their minds, or even the direction, is not known to them ahead of time.
(But maybe you could Dutch book people by predicting their moods and so optimism and pessimism bias?)
Thanks.
Some people say things like “my doom-credence fluctuates between 10% and 25% day to day”; this is Dutch-book-able, and they’d make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.
This is Dutch-book-able only if there is no bid-ask spread. A rational choice in this case would be to have a very wide bid-ask spread. E.g. when Holden Karnofsky writes that his P(doom) is between 10% and 90%, I assume he would bet for doom at 9% or less, bet against doom at 91% or more, and not bet for 0.11 < p < 0.89. This seems a very rational choice in a high-volatility situation where information changes extremely quickly. (As an example, IIRC the bid-ask spread in financial markets increases right before earnings are released.)
(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don’t really want to get into this now.)
Hmm, okay. So, for example, when they’re below 15%, you bet that it will happen at odds matching 15% against them, and when they’re above 20%, you bet that it won’t happen at 20% against them. Size the bets so that whichever bet you lose, your payoff on the other (which you win) is higher (a toy numerical version is sketched below). They “give up” the 15-20% range for free to you.
Still, maybe they just mean to report the historical range or volatility of their estimates? This would be like reporting the historical volatility of a stock. They may not intend to imply, say, that they’ll definitely fall below 15% at some point and above 20% at another.
Plus, picking one way to average may seem unjustifiably precise to them. The average over time is one way, but another is the average over relatively unique (clusters) of states of mind, e.g. splitting weight equally between good, ~neutral and bad moods, averages over possible sets of value assignments for various parameters. There are many different reasonable choices they can make, all pretty arbitrary.
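A toy numerical version of the arbitrage sketched a couple of paragraphs up, with made-up stakes; it assumes the person really will transact on both sides (betting against the event when their credence dips below 15%, and for it when it rises above 20%):

```python
# Toy Dutch book (illustrative numbers only).
# Bet 1: when their credence is below 15%, we bet FOR the event at 15%:
#   we pay $15, and receive $100 if the event happens.
# Bet 2: when their credence is above 20%, we bet AGAINST the event at 20%:
#   we pay $72 (80% of $90), and receive $90 if the event doesn't happen.

cost_for = 0.15 * 100       # $15
payout_for = 100            # received if the event happens
cost_against = 0.80 * 90    # $72
payout_against = 90         # received if the event doesn't happen

profit_if_event = payout_for - cost_for - cost_against         # 100 - 15 - 72 = +13
profit_if_no_event = payout_against - cost_against - cost_for  # 90 - 72 - 15 = +3

print(profit_if_event, profit_if_no_event)  # both positive: guaranteed profit either way
```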
Thank you for writing this. I share many of these, but I’m very uncertain about them.
Here it is:
I think this is rational; I think of probabilities in terms of bets and order books. This is close to my view, and the analogy to financial markets is not irrelevant.
Changing literally day-to-day seems extreme, but month-to-month seems very reasonable given the speed of everything that’s happening, and it matches e.g. the volatility of NVIDIA stock price.
To me, “utilitarianism” seems pretty general, as long as you can arbitrarily define utility and arbitrarily choose between Negative/Rule/Act/Two-level/Total/Average/Preference/Classical utilitarianism. I really liked this section of a recent talk by Toby Ord (starting from “It starts by observing that the three main traditions in Western philosophy each emphasize a different focal point:”). (I also don’t know if axiology is the right word for what we want to express here; we might be talking past each other.)
I mostly agree with you, but second-order effects seem hard to evaluate, and both costs and benefits are so minuscule (and potentially negative) that I find it hard to do a cost-benefit analysis.
I agree with you, but for some it might be an instrumentally useful intentional framing. I think some use phrases like “[Personal flourishing] for its own sake, for the sake of existential risk.” (see also this comment for a fun thought experiment for average utilitarians, but I don’t think many believe it)
Some think the probability of extinction per century is only going up with humanity’s increasing capabilities, and are not convinced by arguments that we’ll soon reach close-to-speed-of-light travel, which will make extinction risk go down. See also e.g. Why I am probably not a longtermist (except point 1). I find this very reasonable.
I agree, I think this makes a ton of sense for people in community building that need to work with many cause areas (e.g. CEA staff, Peter Singer), but I fear that it makes less sense for private individuals maximizing their impact.
I think many people notice big obvious problems with impact certificates/markets, but think that the current system is even worse, or that they are at least worth trying and improving, to see if at their best they can in some cases be better than the alternatives we have. The current funding systems also have big obvious problems. What big obvious problems do you think they are missing?
I agree with this; I just want to mention that it seems better than a common alternative I see: using LessWrong-sounding-ness/reputation as a proxy for forecasting ability
I somewhat agree with you, but I think that many people model it a bit like this: “I normally consume 100k/year, you give me 10k now so I will consume 110k this year, and if I lose the bet I will consume only 80k/year X years in the future”. But I agree that in practice the amounts are small and it doesn’t work for many reasons.
Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I’ll elaborate on all this in a top-level post.
Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.
Are you convinced the costs outweigh the benefits? It may be good for important instrumental reasons, e.g. reducing cognitive dissonance about sentience and moral weights, increasing the day-to-day salience of moral patients with limited agency or power (which could be an important share of those in the future), personal integrity or virtue, easing cooperation with animal advocates (including non-consequentialist ones), maybe health reasons.
Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren’t cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.
Or me. I’m veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.
I certainly haven’t done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I were veg(etari)an for the straightforward reason (for agent-neutral consequentialist reasons), I’d do the cost-benefit analysis and do things like:
Eat meat that would otherwise go to waste (when that wouldn’t increase anticipated demand for meat in the future)
Try to reduce others’ meat consumption, and try to reduce the supply of meat or improve the lives of farmed animals, when that’s more cost-effective than personal veg(etari)anism
Notice whether eating meat would substantially boost my health and productivity, and go back to eating meat if so
I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don’t notice this.
Written quickly, maybe hard-to-parse and imprecise.
Strong upvoted and couldn’t decide whether to disagreevote or not. I agree with the points you list under meta-uncertainty and your point on naively using calibration as a proxy for forecasting ability + thinking you can bet on the end of the world by borrowing money. I disagree with your thoughts on ethics (I’m sympathetic to Zvi’s writing on EAs confusing the map for the territory).
What’s the best thing to read on “Zvi’s writing on EAs confusing the map for the territory”? Or at least something good?
I’m not sure what would be the best thing since I don’t remember there being a particular post about this. However, he talks about it in his book review for Going Infinite and I also like his post on Altruism is Incomplete. Lots of people I know find his writing confusing though and it’s not like he’s rigorously arguing for something. When I agree with Zvi, it’s usually because I have had that belief in the back of my mind for a while and him pointing it out makes it more salient, rather than because I got convinced by a particular argument he was making.
What problems are you thinking of in particular?
I don’t want to try to explain now, sorry.
(This shortform was intended more as starting-a-personal-list than as a manifesto.)
(Not totally sure what you mean here.) I think the portfolio items are justified on the basis of distinct worldviews, which differ in part based on their normative commitments (e.g. theories of welfare like hedonism or preference views, moral weights, axiology, decision theory, epistemic standards, non-consequentialist commitments) across which there is no uniquely justified universal common scale. People might be doing this pretty informally or deferring, though.
I think this can make sense if you have imprecise credences or normative uncertainty (for which there isn’t a uniquely justified universal common scale across views). Specifically, if you’re unable to decide whether action A does net good or net harm (in expectation), because it does good for cause X and harm for cause Y, and the two causes are too hard to compare, it might make sense to offset. Portfolios can be (more) robustly positive than the individual acts. EDIT: But maybe you find this too difference-making?
It takes like 20 hours of focused reading to get basic context on AI risk and threat models. Once you have that, I feel like you can read everything important in x-risk-focused AI policy in 100 hours. Same for x-risk-focused AI corporate governance, AI forecasting, and macrostrategy.
[Edit: “read everything important” doesn’t mean you have nothing left to learn; it means something like: you have the context to appreciate ~all papers, you can follow ~all conversations in the field except between sub-specialists, and you have the generators of good overviews like 12 tentative ideas for US AI policy.]
Am I wrong?
Actually yes, I’m imagining going back and speedrunning learning; if you’re not an expert then you’re much worse at (1) figuring out what to prioritize reading and (2) skimming. But still, 300 hours each, or 200 with a good reading list, or 150 with a great reading list.
This is wild. Normal fields require more like 10,000 hours engagement before you reach the frontier, and much more to read everything important. Right?
Why aren’t more people at the frontier in these four areas?
Normal fields have textbooks and syllabi and lit reviews. Those are awesome for learning quickly. We should have better reading lists. I should make reading lists.
My opportunity cost is high for several weeks; I’ll plan to try this in December. I should be able to make fine 100-hour reading lists on these four topics in 1 day each, or good ones in a week each.
I will be tempted to read too much stuff I haven’t already read. (But I should skim anything I haven’t read in e.g. https://course.aisafetyfundamentals.com/governance.) And I will have the curse of knowledge regarding prerequisites/context and what’s-hard-to-understand. Shrug.
Maybe I can just get someone else to make great reading lists...
Why don’t there exist better reading lists / syllabi, especially beyond introductory stuff?
A reading list will be out of date in 6 months. Hmm. Maybe updating it wouldn’t actually be that hard?
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone? Would they be helpful if they got more attention? I almost never know who uses them. If I did know, talking to those people might be helpful.
If I actually try to make great/canonical AI governance reading lists, I should:
Check out all the existing reading lists: public ones + see private airable + student fellowships on governance (Harvard/MIT/Stanford) + reading lists given to new GovAI fellows or IAPS staff
Ask various people for advice + input + feedback: Mauricio, Michael, Matthijs, David, AISF folks, various slacks; plus experts on various particular topics like “takeoff speed”
Think about target audience. Actually talk to people in potential target audiences.
Maybe relevant: https://www.ai-alignment-flashcards.com/
I don’t know whether alignment is similar. I suspect alignment has a lack of reading lists too.
The lack of lists of (research) project ideas (not to mention research agendas) in AI safety is even worse than the lack of reading lists. Can I fix that?
Talk to Michael and David
Super out of date but see https://forum.effectivealtruism.org/posts/kvkv6779jk6edygug/some-ai-governance-research-ideas and what it links to
[Check out + talk to people who run] some of: ERA, CHERI, PIBBSS, AI safety student groups (Harvard/MIT/Stanford), AISF, SPAR, AI Safety Camp, Alignment Jam, AI Safety Hubs Labs, GovAI fellowship (see private docs “GovAI Fellowship—Research project ideas” and “GovAI Summer Fellowship Handbook”), MATS, Astra
Did AI Safety Ideas try and fail to solve this problem? Talk to Esben?
Look for other existing lists (public and private)
Ask various slacks for (lists of) project ideas?
Ask authors of lists on https://forum.effectivealtruism.org/posts/MsNpJBzv5YhdfNHc9/a-central-directory-for-open-research-questions for updated lists.
Ask various relevant researchers & orgs for (lists of) project ideas?
For most AI governance researchers, I don’t know what they’re working on. That’s really costly and feels like it should be cheap to fix. I’m aware of one attempt to fix this; it failed and I don’t understand why.
Related: Research debt.
I disagree-voted because I feel like I’ve done much more than 100 hours of reading on AI policy (including finishing the AI Safety Fundamentals Governance course) and still have a strong sense that there’s a lot I don’t know, and regularly come across new work that I find insightful. Very possibly I’m prioritising reading the wrong things (and would really value a reading list!) but thought I’d share my experience as a data point.
Here are some of the curricula that HAIST uses:
The technical intro fellowship curriculum. It’s structured as a 7-week reading group with ~1 hour of reading per week. It’s based off of BlueDot’s AISF, and the two curricula have co-evolved (we exchange ideas with BlueDot ~semesterly); a major difference is that the HAIST curriculum is significantly abridged.
The policy fellowship syllabus.
The HAIST website also has a resources tab with lists of technical and policy papers.
For what it’s worth, I found your “AI policy ideas: Reading list” and “Ideas for AI labs: Reading list” helpful,[1] and I’ve recommended the former to three or four people. My guess would be that these reading lists have been very helpful to a couple or a few people rather than quite helpful to lots of people, but I’d also guess that’s the right thing to be aiming for given the overall landscape.
I expect there’s no good reason for this, and that it’s simply because it’s nobody’s job to make such reading lists (as far as I’m aware), and the few(?) people who could make good intermediate-to-advanced-level reading lists either haven’t thought to do so or are too busy doing object-level work?
Helpful in the sense of: I read or skimmed the readings in those lists that I hadn’t already seen, which was maybe half of them, and I think this was probably a better use of my time than the counterfactual.
+1 to the interest in these reading lists.
Because my job is very time-consuming, I haven’t spent much time trying to understand the state of the art in AI risk. If there was a ready-made reading list I could devote 2-3 hours per week to, such that it’d take me a few months to learn the basic context of AI risk, that’d be great.
An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.
A possible central goal of AI governance: ensure that when an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world, this quickly results in rules that stop all labs from ending the world.
I don’t know how we can pursue that goal.