Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
But thinking in terms of probabilities over probabilities is sometimes useful, e.g. you have a probability distribution over possible worlds/models and those worlds/models are probabilistic
Unstable beliefs about stuff like AI timelines, in the sense of “I’d be pretty likely to say something pretty different if you asked tomorrow”
Instability in the sense of being likely to change beliefs if you thought about it more is fine; fluctuating predictably (dutch-book-ably) is not
Axiologies besides ~utilitarianism
Possibly I’m actually noticing sloppy reasoning about how to go from axiology to decision procedure, possibly including just not taking axiology seriously
Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project
Cause prioritization that doesn’t take seriously that the cosmic endowment is astronomical (likely worth >10^60 happy human lives) and that we can nontrivially reduce x-risk
E.g. RP’s Cross-Cause Cost-Effectiveness Model doesn’t take the cosmic endowment seriously
Deciding in advance to boost a certain set of causes [what determines that set??], or a “portfolio approach” without justifying the portfolio-items
E.g. multiple CEA staff donate by choosing some cause areas and wanting to help in each of those areas
Related error: agent-relativity
Related error: considering difference from status quo rather than outcomes in a vacuum
Related error: risk-aversion in your personal contributions (much more egregious than risk-averse axiology)
Instead you should just argmax — find the marginal value of your resources in each cause (for your resources that can funge between causes), then use them in the best possible way
Intra-cause offsetting: if you do harm in area X [especially if it’s avoidable/unnecessary/intentional], you should fix your harm in that area, even if you could do more good in another area
Maybe very few of my friends actually believe this
Not noticing big obvious problems with impact certificates/markets
Naively using calibration as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money
Many examples, e.g. How to place a bet on the end of the world
I think most of us understand the objection that you can do better by just borrowing money at market rates. What I think many people miss is that utility is about ∫consumption, not ∫bankroll (note the bettor typically isn’t liquidity-constrained). The bet only makes sense if you spend all your money before you’d have to pay back.
[Maybe something deep about donations; not sure]
[Maybe something about compensating altruists or compensating for roles often filled by altruists; not sure]
[Maybe something about status; not sure]
Possibly I’m wrong about which attitudes are common.
For now I’m just starting a list, not trying to be legible, much less change minds. I know I haven’t explained my views.
Edit: I’m sharing controversial beliefs, without justification and with some framed provocatively. If one of these views makes you think worse of me to a nontrivial degree, please ask for elaboration; maybe there’s miscommunication or it’s more reasonable than it seems. Edit 2: there are so many comments; I may not respond to requests-for-elaboration but will at least notice them as a bid-for-elaboration-at-some-point.
(meta musing) The conjunction of the negations of a bunch of statements seems a bit doomed to get a lot of disagreement karma, sadly. Esp. if the statements being negated are “common beliefs” of people like the ones on this forum.
I agreed with some of these and disagreed with others, so I felt unable to agreevote. But I strongly appreciated the post overall so I strong-upvoted.
This is just straightforwardly correct statistics. For example, ask a true Bayesian to estimate the outcome of flipping a coin of unknown bias, and they will construct a probability distribution over coin-flip probabilities, and only reduce this to a single probability when forced to make a bet. But when not taking a bet, they should be doing updates on the distribution, not the final estimate. (I’m pretty sure this is in fact the only logical way to do a Bayesian update for the problem.)
And why are we stating probabilities anyway? The main reason seems to be to quantify and communicate our beliefs. But if my “25% probability” comes from a different distribution to your “25% probability”, we may appear to be in agreement when in fact our worldviews differ wildly. I think giving credence intervals over probabilities is strictly better than this.
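A minimal sketch of the coin example, assuming beliefs about the bias are modelled as Beta distributions (the Beta family and the specific parameters are illustrative assumptions, not something stated in the thread). It shows two beliefs that both reduce to a 25% probability of heads but respond very differently to the same three observed heads:

```python
from fractions import Fraction

def predictive_heads(a, b):
    """P(next flip is heads) under a Beta(a, b) belief about the bias: the mean a / (a + b)."""
    return Fraction(a, a + b)

def observe(a, b, heads=0, tails=0):
    """Conjugate Bayesian update of a Beta(a, b) belief after seeing some flips."""
    return a + heads, b + tails

# Illustrative parameters (assumptions): both beliefs have mean 1/4.
vague = (1, 3)     # wide distribution over the bias
sharp = (25, 75)   # narrow distribution over the bias

before = (predictive_heads(*vague), predictive_heads(*sharp))
after = (
    predictive_heads(*observe(*vague, heads=3)),
    predictive_heads(*observe(*sharp, heads=3)),
)

print("before:", before)  # both equal 1/4
print("after: ", after)   # 4/7 for the vague belief, 28/103 for the sharp one
```

For a single bet on the next flip, only the mean matters, which is the sense in which the distribution reduces to one number. The update step is where the full distribution does work.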
Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.
When I do this, it’s because I’m unable or unwilling to assign a probability distribution over the probabilities, so it won’t reduce to simple (precise) probabilities. Actually, in general, I think precise probabilities are epistemically unjustified (e.g. Schoenfield, 2012, section 3), but I’m willing to use more or less precise probabilities depending on the circumstances.
I’m not sure if I’d claim to have such unstable beliefs myself, but if you’re trying to be very precise with very speculative, subjective and hard-to-specifically-defend probabilities, then I’d imagine they could be very unstable, and influenced by things like your mood, e.g. optimism and pessimism bias. That is, unless you commit to your credences even if you’d have formed different ones had you started from scratch, or you make arbitrary choices in forming them that could easily have gone differently. You might weigh the same evidence or arguments differently from one day to the next.
I’d guess most people would also have had at least slightly different credences on AI timelines if they had seen the same evidence or arguments in a different order, or were in a different mood when they were forming their credences or building models, or for many other different reasons. Some number or parameter choices will come down to intuition, and intuition can be unstable.
fluctuating predictably (dutch-book-ably) is not
I don’t think people are fluctuating predictably (dutch-book-ably). How exactly they’d change their minds or even the direction is not known to them ahead of time.
(But maybe you could Dutch book people by predicting their moods and so optimism and pessimism bias?)
Some people say things like “my doom-credence fluctuates between 10% and 25% day to day”; this is dutch-book-able and they’d make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.
This is dutch-book-able only if there is no bid-ask spread. A rational choice in this case would be to have a very wide bid-ask spread. E.g. when Holden Karnofsky writes that his P(doom) is between 10% and 90%, I assume he would bet for doom at 9% or less, bet against doom at 91% or more, and not bet for 0.11<p<0.89. This seems a very rational choice in a high-volatility situation where information changes extremely quickly. (As an example, IIRC the bid-ask spread in financial markets increases right before earnings are released).
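A rough sketch of the betting rule described above, assuming a contract that pays 1 if the event happens and using the 9% and 91% thresholds from the Holden Karnofsky example. The function name `decide` and the quoted prices are hypothetical, and everything strictly between the two thresholds is treated as "no bet" for simplicity:

```python
def decide(price, bid=0.09, ask=0.91):
    """Action at a quoted price for a contract paying 1 if the event happens."""
    if price <= bid:
        return "buy"    # bet for the event: the contract looks cheap
    if price >= ask:
        return "sell"   # bet against the event: the contract looks expensive
    return "no bet"     # inside the spread: decline to trade

# Hypothetical quotes, for illustration only.
quotes = [0.05, 0.15, 0.20, 0.50, 0.95]
actions = [decide(p) for p in quotes]
print(list(zip(quotes, actions)))
```

Someone following this rule never buys above their bid or sells below their ask, so a counterparty cannot lock in a profit by trading against them at two prices inside the spread.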
(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don’t really want to get into this now.)
Hmm, okay. So, for example, when they’re below 15%, you bet that it will happen at odds matching 15% against them, and when they’re above 20%, you bet that it won’t happen at 20% against them. And just make sure to size the bets right so that if you lose one bet, your payoff is higher in the other, which you’d win. They “give up” the 15-20% range for free to you.
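The arithmetic of that Dutch book can be sketched as follows, assuming contracts that pay 1 if the event happens, one contract traded in each direction, and illustrative credences of 10% and 25% on the two days (the function names are hypothetical). With equal sizes the two positions cancel exactly, so the profit is the same whether or not the event happens:

```python
from fractions import Fraction

BUY_PRICE = Fraction(15, 100)   # you buy a contract from them when their credence is below 15%
SELL_PRICE = Fraction(20, 100)  # you sell a contract to them when their credence is above 20%

def forecaster_gain_selling(credence, price):
    """Forecaster's expected gain, by their own lights, from selling one contract at `price`."""
    return price - credence

def forecaster_gain_buying(credence, price):
    """Forecaster's expected gain, by their own lights, from buying one contract at `price`."""
    return credence - price

def your_profit(event_happens):
    """Your total profit from holding one bought and one sold contract."""
    payout = 1 if event_happens else 0
    from_bought = payout - BUY_PRICE
    from_sold = SELL_PRICE - payout
    return from_bought + from_sold

# Both trades look favourable to the forecaster on the day they accept them.
print(forecaster_gain_selling(Fraction(10, 100), BUY_PRICE))   # 1/20
print(forecaster_gain_buying(Fraction(25, 100), SELL_PRICE))   # 1/20

# Your profit is 1/20 per contract pair in either world.
print(your_profit(True), your_profit(False))
```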
Still, maybe they just mean to report the historical range or volatility of their estimates? This would be like reporting the historical volatility of a stock. They may not intend to imply, say, that they’ll definitely fall below 15% at some point and above 20% at another.
Plus, picking one way to average may seem unjustifiably precise to them. The average over time is one way, but another is the average over relatively unique (clusters of) states of mind, e.g. splitting weight equally between good, ~neutral and bad moods, or averaging over possible sets of value assignments for various parameters. There are many different reasonable choices they can make, all pretty arbitrary.
Thank you for writing this. I share many of these, but I’m very uncertain about them.
Here it is:
I think this is rational; I think of probabilities in terms of bets and order books. I think this is close to my view, and the analogy of financial markets is not irrelevant.
Changing literally day-to-day seems extreme, but month-to-month seems very reasonable given the speed of everything that’s happening, and it matches e.g. the volatility of NVIDIA stock price.
To me, “utilitarianism” seems pretty general, as long as you can arbitrarily define utility and you can arbitrarily choose between Negative/Rule/Act/Two-level/Total/Average/Preference/Classical utilitarianism. I really liked this section of a recent talk by Toby Ord (Starting from “It starts by observing that the three main traditions in Western philosophy each emphasize a different focal point:”). (I also don’t know if axiology is the right word for what we want to express here, we might be talking past each other)
I mostly agree with you, but second order effects seem hard to evaluate and both costs and benefits are so minuscule (and potentially negative) that I find it hard to do a cost-benefit-analysis.
I agree with you, but for some it might be an instrumentally useful intentional framing. I think some use phrases like “[Personal flourishing] for its own sake, for the sake of existential risk.” (see also this comment for a fun thought experiment for average utilitarians, but I don’t think many believe it)
Some think the probability of extinction per century is only going up with humanity’s increasing capabilities, and are not convinced by arguments that we’ll soon reach close-to-speed-of-light travel which will make extinction risk go down. See also e.g. Why I am probably not a longtermist (except point 1). I find this very reasonable.
I agree, I think this makes a ton of sense for people in community building that need to work with many cause areas (e.g. CEA staff, Peter Singer), but I fear that it makes less sense for private individuals maximizing their impact.
I think many people notice big obvious problems with impact certificates/markets, but think that the current system is even worse, or that they are at least worth trying and improving, to see if at their best they can in some cases be better than the alternatives we have. The current funding systems also have big obvious problems. What big obvious problems do you think they are missing?
I agree with this, just want to mention that it seems better than a common alternative that I see: using LessWrong-sounding-ness/reputation as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money … I think many people miss that utility is about ∫consumption not ∫bankroll (note the bettor typically isn’t liquidity-constrained)
I somewhat agree with you, but I think that many people model it a bit like this: “I normally consume 100k/year, you give me 10k now so I will consume 110k this year, and if I lose the bet I will consume only 80k/year X years in the future”. But I agree that in practice the amounts are small and it doesn’t work for many reasons.
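A toy version of the consumption point, using the numbers from the model above (in thousands per year). Log utility, the five-year horizon, zero interest, and doom arriving after three years are all illustrative assumptions. The "spender" consumes the 10 immediately; the "saver" banks it and repays 20 in the final year, partly out of consumption:

```python
import math

def utility(consumption, years_lived=None):
    """Sum of log(consumption) over the years actually lived (log utility is an assumption)."""
    lived = consumption if years_lived is None else consumption[:years_lived]
    return sum(math.log(c) for c in lived)

baseline = [100, 100, 100, 100, 100]  # no bet
spender = [110, 100, 100, 100, 80]    # spends the 10 now, repays 20 in the last year
saver = [100, 100, 100, 100, 90]      # banks the 10, repays 20 in the last year
DOOM = 3                              # hypothetical: the world ends after three years

for name, path in [("spender", spender), ("saver", saver)]:
    gain_if_doom = utility(path, DOOM) - utility(baseline, DOOM)
    gain_if_survive = utility(path) - utility(baseline)
    print(name, round(gain_if_doom, 4), round(gain_if_survive, 4))
```

Under these assumptions the saver gains nothing in the doom world and loses in the survival world, while the spender gains in the doom world only because the money was consumed before repayment came due.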
Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I’ll elaborate on all this in a top-level post.
Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.
veg(etari)anism as ethical rather than as a costly indulgence
Are you convinced the costs outweigh the benefits? It may be good for important instrumental reasons, e.g. reducing cognitive dissonance about sentience and moral weights, increasing the day-to-day salience of moral patients with limited agency or power (which could be an important share of those in the future), personal integrity or virtue, easing cooperation with animal advocates (including non-consequentialist ones), maybe health reasons.
Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren’t cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.
Or me. I’m veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.
I certainly haven’t done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I was veg(etari)an for the straightforward reason (for agent-neutral consequentialist reasons), I’d do the cost-benefit analysis, and do things like:
Eat meat that would otherwise go to waste (when that wouldn’t increase anticipated demand for meat in the future)
Try to reduce others’ meat consumption, and try to reduce the supply of meat or improve the lives of farmed animals, when that’s more cost-effective than personal veg(etari)anism
Notice whether eating meat would substantially boost my health and productivity, and go back to eating meat if so
I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don’t notice this.
Written quickly, maybe hard-to-parse and imprecise.
Strong upvoted and couldn’t decide whether to disagreevote or not. I agree with the points you list under meta-uncertainty and your point on naively using calibration as a proxy for forecasting ability + thinking you can bet on the end of the world by borrowing money. I disagree with your thoughts on ethics (I’m sympathetic to Zvi’s writing on EAs confusing the map for the territory).
What’s the best thing to read on “Zvi’s writing on EAs confusing the map for the territory”? Or at least something good?
I’m not sure what would be the best thing since I don’t remember there being a particular post about this. However, he talks about it in his book review for Going Infinite and I also like his post on Altruism is Incomplete. Lots of people I know find his writing confusing though and it’s not like he’s rigorously arguing for something. When I agree with Zvi, it’s usually because I have had that belief in the back of my mind for a while and him pointing it out makes it more salient, rather than because I got convinced by a particular argument he was making.
Not noticing big obvious problems with impact certificates/markets
What problems are you thinking of in particular?
I don’t want to try to explain now, sorry.
(This shortform was intended more as starting-a-personal-list than as a manifesto.)
(Not totally sure what you mean here.) I think the portfolio items are justified on the basis of distinct worldviews, which differ in part based on their normative commitments (e.g. theories of welfare like hedonism or preference views, moral weights, axiology, decision theory, epistemic standards, non-consequentialist commitments) across which there is no uniquely justified universal common scale. People might be doing this pretty informally or deferring, though.
Intra-cause offsetting: if you do harm in area X, you should fix your harm in that area, even if you could do more good in another area
I think this can make sense if you have imprecise credences or normative uncertainty (for which there isn’t a uniquely justified universal common scale across views). Specifically, if you’re unable to decide whether action A does net good or net harm (in expectation), because it does good for cause X and harm for cause Y, and the two causes are too hard to compare, it might make sense to offset. Portfolios can be more robustly positive than the individual acts. EDIT: But maybe you find this too difference-making?
It takes like 20 hours of focused reading to get basic context on AI risk and threat models. Once you have that, I feel like you can read everything important in x-risk-focused AI policy in 100 hours. Same for x-risk-focused AI corporate governance, AI forecasting, and macrostrategy.
[Edit: “read everything important” doesn’t mean you have nothing left to learn; it means something like: you have context to appreciate ~all papers, you can follow ~all conversations in the field except between sub-specialists, and you have the generators of good overviews like 12 tentative ideas for US AI policy.]
Am I wrong?
Actually yes, I’m imagining going back and speedrunning learning; if you’re not an expert then you’re much worse at (1) figuring out what to prioritize reading and (2) skimming. But still, 300 hours each, or 200 with a good reading list, or 150 with a great reading list.
This is wild. Normal fields require more like 10,000 hours engagement before you reach the frontier, and much more to read everything important. Right?
Why aren’t more people at the frontier in these four areas?
Normal fields have textbooks and syllabi and lit reviews. Those are awesome for learning quickly. We should have better reading lists. I should make reading lists.
My opportunity cost is high for several weeks; I’ll plan to try this in December. I should be able to make fine 100-hour reading lists on these four topics in 1 day each, or good ones in a week each.
I will be tempted to read too much stuff I haven’t already read. (But I should skim anything I haven’t read in e.g. https://course.aisafetyfundamentals.com/governance.) And I will have the curse of knowledge regarding prerequisites/context and what’s-hard-to-understand. Shrug.
Maybe I can just get someone else to make great reading lists...
Why don’t there exist better reading lists / syllabi, especially beyond introductory stuff?
A reading list will be out of date in 6 months. Hmm. Maybe updating it wouldn’t actually be that hard?
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone? Would they be helpful if they got more attention? I almost never know who uses them. If I did know, talking to those people might be helpful.
If I actually try to make great/canonical AI governance reading lists, I should:
Check out all the existing reading lists: public ones + see private Airtable + student fellowships on governance (Harvard/MIT/Stanford) + reading lists given to new GovAI fellows or IAPS staff
Ask various people for advice + input + feedback: Mauricio, Michael, Matthijs, David, AISF folks, various slacks; plus experts on various particular topics like “takeoff speed”
Think about target audience. Actually talk to people in potential target audiences.
Maybe relevant: https://www.ai-alignment-flashcards.com/
I don’t know whether alignment is similar. I suspect alignment has a lack of reading lists too.
The lack of lists of (research) project ideas (not to mention research agendas) in AI safety is even worse than the lack of reading lists. Can I fix that?
Talk to Michael and David
Super out of date but see https://forum.effectivealtruism.org/posts/kvkv6779jk6edygug/some-ai-governance-research-ideas and what it links to
[Check out + talk to people who run] some of: ERA, CHERI, PIBBSS, AI safety student groups (Harvard/MIT/Stanford), AISF, SPAR, AI Safety Camp, Alignment Jam, AI Safety Hubs Labs, GovAI fellowship (see private docs “GovAI Fellowship—Research project ideas” and “GovAI Summer Fellowship Handbook”), MATS, Astra
Did AI Safety Ideas try and fail to solve this problem? Talk to Esben?
Look for other existing lists (public and private)
Ask various slacks for (lists of) project ideas?
Ask authors of lists on https://forum.effectivealtruism.org/posts/MsNpJBzv5YhdfNHc9/a-central-directory-for-open-research-questions for updated lists.
Ask various relevant researchers & orgs for (lists of) project ideas?
For most AI governance researchers, I don’t know what they’re working on. That’s really costly and feels like it should be cheap to fix. I’m aware of one attempt to fix this; it failed and I don’t understand why.
Related: Research debt.
I disagree-voted because I feel like I’ve done much more than 100 hours of reading on AI policy (including finishing the AI Safety Fundamentals Governance course) and still have a strong sense there’s a lot I don’t know, and regularly come across new work that I find insightful. Very possibly I’m prioritising reading the wrong things (and would really value a reading list!) but thought I’d share my experience as a data point.
Here are some of the curricula that HAIST uses:
The technical intro fellowship curriculum. It’s structured as a 7-week reading group with ~1 hour of reading per week. It’s based on BlueDot’s AISF and the two curricula have co-evolved (we exchange ideas with BlueDot ~semesterly); a major difference is that the HAIST curriculum is significantly abridged.
The policy fellowship syllabus.
The HAIST website also has a resources tab with lists of technical and policy papers.
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone?
For what it’s worth, I found your “AI policy ideas: Reading list” and “Ideas for AI labs: Reading list” helpful, and I’ve recommended the former to three or four people. My guess would be that these reading lists have been very helpful to a couple or a few people rather than quite helpful to lots of people, but I’d also guess that’s the right thing to be aiming for given the overall landscape.
I expect there’s no good reason for this, and that it’s simply because it’s nobody’s job to make such reading lists (as far as I’m aware), and the few(?) people who could make good intermediate-to-advanced level reading lists either haven’t thought to do so or are too busy doing object-level work?
Helpful in the sense of: I read or skimmed the readings in those lists that I hadn’t already seen, which was maybe half of them, and I think this was probably a better use of my time than the counterfactual.
+1 to the interest in these reading lists.
Because my job is very time-consuming, I haven’t spent much time trying to understand the state of the art in AI risk. If there was a ready-made reading list I could devote 2-3 hours per week to, such that it’d take me a few months to learn the basic context of AI risk, that’d be great.
An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.
A possible central goal of AI governance: cause “an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world” to result in rules that stop all labs from ending the world.
I don’t know how we can pursue that goal.