Arepo

Karma: 5,132

Arepo Aug 5, 2025, 6:43 AM
8 points
3 ∶ 1
in reply to: Seth Herd’s comment on: Should we aim for flourishing over mere survival? The Better Futures series.
Here’s the problem: The more people think seriously about this question, the more pessimistic they are.
Citation needed on this point. I you’re underrepresenting the selection bias for a start—it’s extremely hard to know how many people have engaged with and rejected the doomer ideas since they have far less incentive to promote their views. And those who do often find sloppy argument and gross misuses of the data in some of the prominent doomer arguments. (I didn’t have to look too deeply to realise the orthogonality thesis was a substantial source of groupthink)
Even within AI safety workers, it’s far from clear to me that the relationship you assert exists. My impression of the AI safety space is that there are many orgs working on practical problems that they take very seriously without putting much credence in the human-extinction scenarios (FAR.AI, Epoch, UK AISI off the top of my head).
One guy also looked at the explicit views of AI experts and found if anything an anticorrelation between their academic success and their extinction-related concern. That was looking back over a few years and obviously a lot can change in that time, but the arguments for AI extinction had already been around for well over a decade at the time of that survey.
The “expert forecasters” you cite don’t have nearly the time-on-task of thinking about the AGI alignment problem.
This is true for forecasting in every domain. There are virtually always domain experts who have spent their careers thinking about any given question, and yet superforecasters seem to systematically outperform them. If this weren’t true, superforecasting wouldn’t be a field—we’d just go straight to the domain experts for our predictions.

Arepo Aug 5, 2025, 6:21 AM
2 points
0 ∶ 0
on: Should we aim for flourishing over mere survival? The Better Futures series.
I agree with the general thrust of this post—at least the weak version that we should consider this to be an unexplored field, worth putting some effort into. But I strong disagree with the sentiment here:
maybe we’re closer to the “ceiling” on Survival than we are to the “ceiling” of Flourishing.
Most people (though not everyone) thinks we’re much more likely than not to Survive this century. Metaculus puts *extinction* risk at about 4%; a survey of superforecasters put it at 1%. Toby Ord put total existential risk this century at 16%.
Not that I disagree with their estimates (if anything I’d guess they’re too high), but because ‘extinction this century’ is a tiny fraction of the amount of time in which we could go extinct before going interstellar. There’s any number of Very Bad Things that could happen to us this century or soon thereafter which could substantially reduce our probability of surviving long term (in particular, of getting to a state where we can survive the expansion of our sun).

Arepo Jul 23, 2025, 4:09 PM
2 points
0 ∶ 0
in reply to: JuliaHP’s comment on: Arepo’s Quick takes
Yeah, it’s a relatively objective expression of the ratio of how much the most successful people in society care about tiny benefits to themselves vs any alternative.

Arepo Jul 23, 2025, 10:02 AM
4 points
0 ∶ 0
on: Arepo’s Quick takes
I wonder if it would be interesting to use PPP-adjusted sale price of high-end luxury or Veblen goods as a metric for moral progress of humanity (I suspect on this metric, we’d look like we were getting progressively worse, but I’m unclear if that’s a bug or a feature).

Arepo Jul 17, 2025, 6:11 AM
4 points
2 ∶ 0
on: The relative range heuristic
I think it’s a useful criterion and have upvoted it, though I have a couple of criticisms. The main one is that I’m concerned that EA has a heuristics addiction (ITN, extinction risk treated as though it were the only concern in existential risk, the prioritisation of ivory league and other universities, orthogonality thesis as an argument for the likelihood of bad AI outcomes, formulaic approaches to events, generally relying on single research projects by a small team or individual to settle a difficult question), so while having useful tools is good in theory, in practice I worry that there’s little middle ground between ‘ignored entirely’ and ‘sees widespread adoption that encourages more lazy thinking’. I’m not sure what to do about this
The other criticism is an object level application of this concern:
Larger animals might have more capacity for suffering, but intuitively they might have 10x more or 100x more at most… thus we should prioritize small animals.
I think there’s huge uncertainty in
- How much more capacity for suffering large animals have (‘infinitely more’ isn’t out of scope for some comparisons)
- Flow-through effects of focusing on larger vs smaller animals (e.g. eating larger animals is much worse for the climate, and hence for the medium- and possibly long-term future; though smaller animals are maybe a bigger biorisk threat)
- Human health (red meat ≈ large animal meat, and is probably worse for health than white meat)
So if we understand ‘should prioritise’ as ‘should plan to do substantial research and not treat this as a settled question until we have way more data on such concerns, but start with smaller animals and perhaps lean towards small-animal-favouring decisions in a personal capacity’ then this seems reasonable. But if this meant to be a serious justification for anything more substantial, then it seems like an example of overapplying the research efforts of a small team—whose work I like to be clear, but is nowhere near the last word on the subject.

Arepo Jul 9, 2025, 3:52 PM
2 points
0 ∶ 0
in reply to: Pivocajs’s comment on: Eating Honey is (Probably) Fine, Actually
I don’t think revealed preferences make philosophical sense in any context. If the enitity in question has an emotional reaction to its preference then that emotional reaction seems like an integral part of what matters. If it has no such emotional reaction then it seems presumptive to the point of being unparsable to say that it was revealing a preference for ‘not swarming’ vs, say ‘staying with an uncoordinated group that can therefore never spontaneously leave’ or still more abstract notions.

Arepo Jul 5, 2025, 4:37 AM
2 points
0 ∶ 0
in reply to: LanceSBush’s comment on: Debate: Morality is Objective
I don’t think I need to have better access to someone’s values to make a compelling case. For instance, suppose I’m running a store and someone breaks in with a gun and demands I empty the cash register. I don’t have to know what their values are better than they do to point out that they are on lots of security cameras, or that the police are on their way, and so on. It isn’t that hard to appeal to people’s values when convincing them. We do this all the time.
This is option 1: ‘Present me with real data that shows that on my current views, it would benefit me to vote for them’. Sometimes it’s available, but usually it isn’t.
Even if that foreclosed one mode of persuasion, well, too bad! That’s how reality is.
‘Too bad! That’s how reality is’ is analogous to the statement ‘too bad! That’s how morality is’ in its lack of foundation. ‘Reality’ and ‘truth’ are not available to us. What we have is a stream of valenced sensory input whose nature seems to depend somewhat on our behaviours. In general, we change our behaviour in such a way to to get better valenced sensory input, such as ‘not feeling cognitive dissonance’, ‘not being in extreme physical pain’, ‘getting the satisfaction of symbols lining up in an intuitive way’, ‘seeing our loved ones prosper’ etc.
At a ‘macroscopic’ level, this sensory input generally resolves into mental processes approximately like ‘”believing” that there are “facts”, about which our beliefs can be “right” or “wrong”’ and ‘it’s generally better to be right about stuff’, and ‘logicians, particle physicists and perhaps hedge fund managers are generally more right about stuff than religious zealots’. But this is all ultimately pragmatic. If cognitive dissonance didn’t feel good us, and if looking for consistency in the world didn’t seem to lead to nicer outcomes, we wouldn’t do it, and we wouldn’t care about rightness—and there’s no fundamental sense in which it would be correct or even meaningful to say that we were wrong.
I’m not sure this matters for the question of how reasonable we should think antirealism is—it might be a few levels less abstract than such concerns. But I don’t think it’s entirely obvious that it doesn’t, given the vagary to which I keep referring about what it would even mean for either moral pro-or-anti-realism to be correct. It might turn out that the least abstract principle we can judge it by is how we feel about its sensory consequences.
…carries the pragmatic implication that antirealists are more likely to be immoral people that threaten or manipulate others. Do you agree?
Eh, compared to who? I think most people are neither realists nor antirealists since they haven’t built the linguistic schema for either position to be expressable (I’m claiming that it’s not even possible to do so, but that’s neither here nor there). So antirealists are obviously heavily selected to be a certain type of nerd, which probably biases their population towards a generally nonviolent, relatively scrupulous and perhaps affable disposition.
But selection is different than cause, and I would guess among that nerdgroup, being utilitarian tends to cause one to be fractionally more likely to promote global utility, being contractualist tends to cause one to be fractionally more likely to uphold the social contract etc (I’m aware of the paper arguing that moral philosophers don’t seem to be particularly moral, but that was hardly robust science. And fwiw it vaguely suggests that older books—which bias heavily nonutilitarian—tempted more immorality).
The alternative is to believe that such people are all completely uninfluenced by their phenomenal experience of ‘belief’ in those philosophies, or that many of them are lying about having it (plausible, but leaves open the question of the effects of belief on behaviour of the ones who aren’t), or some othersuch surprising disjunction between their mental state and behaviour.
What links here?
- Arepo's comment on Debate: Morality is Objective by Bentham's Bulldog (Jul 5, 2025, 4:28 AM; 2 points)

Arepo Jul 5, 2025, 4:28 AM
2 points
0 ∶ 0
in reply to: LanceSBush’s comment on: Debate: Morality is Objective
I would not accept this characterization. Antirealism is the view that there are no stance-independent moral facts.
I don’t understand the difference, which is kind of the problem I identified in the first place. It’s difficult to reject the existence of a phenomenon you haven’t defined. (the concept of ignosticism applies here). ‘Moral facts’ sound to me something like ‘the truth values behind normative statements’ (though that has further definitional problems relating to both ‘truth values’ - cf my other most recent reply—and ‘normative statements’)
If you reject that definition, it might be more helpful to define moral facts by exclusion from seemingly better understood phenomena. For example, I think more practical definitions might be:
- Nonphysical phenomena
- Nonphysical and nonexperiential phenomena
Obviously this has the awkwardness of including some paranormal phenomena, but I don’t think that’s a huge cost. Many paranormal phenomena obviously would be physical, were they to exist (as in, they can exert force, have mass etc), and you and I can probably agree we’re not that interested in the particular nonexistences of most of the rest.
I have all kinds of preferences that are totally unrelated to my own experiences
I wrote a long essay about the parameters of ‘preference’ in the context of preference utilitarianism here, which I think equally applies to supposedly nonmoral uses of the word (IIRC I might have shown it to you before?). The potted version is that people frequently use the word in a very motte-and-bailey-esque fashion, sometimes invoking quasi magical properties of preferences, other times treating them as an unremarkable part of the physical or phenomenal world. I think that’s happening here:
> {cultural relativism … daughter} examples
There’s a relatively simple experientialist account of these, which goes ‘people pursue their daughter’s/culture’s wellbeing because it gives them some form of positive valence to do so’. This is the view which I accuse of being a conflict doctrine (unless it’s paired with some kind of principled pursuit of such positive valence elsewhere).
You seem to be saying your view is not this: ‘I’d do it because I value more than just my own experiences’.
If this is true, then I think many of my criticisms don’t apply to you—but I also think this is a very selective notion of antirealism. Specifically, it requires a notion of ‘to value’, which you’re saying is *not* exclusively experiential (and presumably isn’t otherwise entirely physical too—unless you say its nonexperiential components are just a revealed preference in your behaviour?).
Perhaps you just mean a more expansive notion of experiential value that the word ‘happiness’ implies. I use the latter to mean ‘any positively valenced experience’, fwiw—I don’t think the colloquial distinction is philosophically interesting. But that puts you back in the ‘doctrine of conflict’ camp, if you aren’t able to guide someone, through dispassioned argument, to value your daughter/culture the way you do if they don’t already.
For the record, I am not claiming that a large majority of persuasion falls into the 6th/7th groups. I think it’s a tiny minority of it in fact—substiantially less than the amount which is e.g. demonstrating how to think logically or understand statistics, or persuading someone to change their mind with logic or statistical data, both of which are already miniscule.
But the difference between antirealism and exclusivism/realism is that antirealism excludes the possibility of such interactions entirely.
When performing an action, my goal is to achieve the desired outcome. I don’t have to experience the outcome to be motivated to perform the action.
But you have no access to whether the outcome is achieved, only to your phenomenal experience of changing belief that it will/won’t be or has/hasn’t been. So if you don’t recognise the valence of that process of changing belief as the driver of your motivation and instead assert that some nonphysical link between your behaviour and the outcome is driving you, then under the exclusionary definition of moral facts you appear to be invoking one.
Can you elaborate on these?
- The uniquely non-evolutionarily explicability of utilitarianism (h.t. Joshua Greene’s argument in Moral Tribes) and how antirealists can explain this
- The convergence of moral philosophers towards three heavily overlapping moral philosophies—given the infinite possible moral philosophies and how antirealists can explain this
What is it antirealists are supposed to explain, specifically?
When we see a predictable pattern in the world, we generally understand it to be the result of some underlying law or laws, such that if you knew everything about the universe you could in principle predict the pattern before seeing it.
It seems basically impossible to explain the convergence towards the philosophies above by any law currently found in physical science. Evolutionary processes might drive people to protect their kin, deter aggressors etc, but there’s no need for any particular cognitive or emotional oattachment to the ‘rightness’ of this (there’s no obvious need for any emotional state at all, really, but even given that we have them they might have been entirely supervenient on behaviour, or universally tended towards cold pragmatism or whatever). And evolutionary process have no ability to explain a universally impartial philosophy like utiltiarianism, which is actively deleterious to its proponents’ survival and reproductive prospects.
So what are the underlying laws by which one could have predicted the convergence of moral philosophies, rather than just virtue signalling and similar behaviours, in particular to a set including utilitarianism?

Arepo Jul 1, 2025, 12:51 PM
3 points
0 ∶ 0
in reply to: Toby Tremlett🔹’s comment on: Debate: Morality is Objective
I would characterise antirealism as something like ‘believing that there is no normative space, and hence no logical or empirical line of reasoning you could give to change someone’s motivations.’
I don’t think anti-realists would accept that they aren’t possible on their view.
I would be interested to hear a counterexample that isn’t a language game. I don’t see how one can sincerely advocate someone else hold a position they think is logically indefensible.
Anti-realists don’t think that people don’t have dispositions that are well described as moral. It’s possible to share dispositions, and in fact we all empirically do share a lot of moral dispositions.
I think this is a language game. A ‘disposition’ is not the same phenomenon that someone who believes their morality has some logical/empirical basis thinks their morality is. A disposition isn’t functionally distinct from a preference—something we can arguably share, but per Hume, something which has nothing to do do with reason.
Someone who believed in a moral realist view that valued a state whose realisation they would never experience—black ties at their own funeral, for instance—should be highly sceptical of a moral antirealist who claimed to value the same state even though they also wouldn’t experience it. The realist believes the word ‘value’ in that sentence means something motivationally relevant to a moral realist. To an antirealist it can only mean something like ‘pleasing to imagine’. But if they won’t be at the funeral, they won’t know whether the state was realised, and so they can get their pleasure just imagining it happen—it doesn’t otherwise matter to them whether it does.
Not by coincidence I think, this arguably gives the antirealist access to a basically hedonistic quasi-morality in practice (though no recourse to defend it), but not to any common alternative.
I don’t see how persuading is easier for a moral realist, surely you would still need to appeal to something that your interlocutor already believes/ values.
If you start with the common belief that there is such some ‘objective’ morality and some set of steps of reasoning tools that would let us access it, you can potentially correct the other’s use of those tools in good faith. If one of you doesn’t actually believe that process is even possible, it would be disingenuous to suppose there’s something even to correct.
***
FWIW, we’re spilling a lot of ink over by far the least interesting part of my initial comment. I would expect it to be more productive to talk about e.g.:
- The analogy of (the irrelevance of) moral realism to (the irrelevance of) mathematical realism/Platonism
- The uniquely non-evolutionarily explicability of utilitarianism (h.t. Joshua Greene’s argument in Moral Tribes) and how antirealists can explain this
- The convergence of moral philosophers towards three heavily overlapping moral philosophies—given the infinite possible moral philosophies and how antirealists can explain this
- My suggestion that this strongly suggests a process by which some or most moral philosophies can be excluded: does this seem false? Or true, but insufficiently powered to narrow the picture down?
- My suggestion that iterative self-modification of one’s motivations might converge: whether people disagree with this suggestion or agree but think the phenomenon is explicable in e.g. strictly physical terms or otherwise uninteresting
- My suggestion that if we accept that motivation has its own set of axiom-like properties, we might be able to ‘derive’ quasi-moral views in the same way we can derive properties about applied maths or physics (i.e. not that they’re necessarily ‘true’ whatever that means, but that we will necessarily behave in ways that in some sense assume them to be)

Arepo Jul 1, 2025, 6:32 AM
2 points
0 ∶ 1
in reply to: LanceSBush’s comment on: Debate: Morality is Objective
Hey Lance,
To be clear, I’m talking about when an antirealist wants a behaviour change from another person (that, by definition, that person isn’t currently inclined to do). Say you wanted to persuade me to vote for a particular political candidate. If you were a moral realist, you’d have these classes of option:
1. Present me with real data that shows that on my current views, it would benefit me to vote for them
2. Present me with false or cherrypicked data that shows that on my current views, it would benefit me to vote for them
3. Threaten me if I don’t vote for them
4. Emotionally cajole me into voting for them, e.g. by telling me they saved my cat, that their opponent is a lecher, etc—in some way highlighting some trait that will irrationally dispose me towards them
5. Feign belief in moral view that I hold and show me that their policies are more aligned with it
6. Show me that their policies are more aligned with a moral view that we both in fact share
7. Persuade me to accept whatever (you think) is the ‘correct’ moral view, and show me that their policies are aligned with it
Perhaps others, and perhaps the 2-5 are basically the same thing, but whatever. As a moral antirealist you don’t have access to the last two. And without those, the only honest/nonviolent option you have to persuade me is not going to be available to you the majority of the time, since usually I’m going to be better informed than you about what things are in fact good for me.
This isn’t to say that moral antirealists necessarily will manipulate/threaten etc—I know many antirealists who seem like ‘good’ people who would find manipulating other people for personal gain grossly unpleasant. But nonetheless, taking away the last two options without replacing them with something equally honest necessarily incentivises the remaining set, most of which and the most accessible of which are dishonest.
This isn’t supposed to be a substantial argument for moral realism, but I think it’s an argument against antirealism. As an antirealist it would nonetheless be far better for you to live in a world where the 6th and 7th options were possible. So if you reject moral realism, you prudentially should nonetheless favour finding a third option, that permits similarly nonmanipulative options.
(Though, sidebar: while it’s easy to dismiss the desirability of this property as a distraction from the ‘truth’ of the debate, I think this is too simplistic. At the level of abstraction at which moral philosophy happens, ‘truth’ is also a somewhat murky notion, and one we don’t have access to. We can say we have beliefs, but even those are a form of action, and hence motivated. So it’s unclear to me what lies at the bottom of this pyramid, but I don’t think the view that morality/motivation is a form of knowledge and thus undergirded by epistemology makes any sense)

Arepo Jun 28, 2025, 3:41 AM
3 points
0 ∶ 1
on: Debate: Morality is Objective
(Deleted my lazy comment to give more colour)
Neither agree nor disagree—I think the question is malformed, and both ‘sides’ have extremely undesirable properties. Moral realism’s failings are well documented in the discussion here, and well parodied as being ‘spooky’ or just wishful thinking. But moral antirealism is ultimately a doctrine of conflict—if reason has no place in motivational discussion, then all that’s left for me to get my way from you is threats, emotional manipulation, misinformation and, if need be, actual violence. Any antirealist who denies this as the implication of their position is kidding themselves (or deliberately supplying misinformation).
So I advocate for a third position.
I think the central problem with this debate is that the word ‘objective’ here has no coherent referent (except when people use it for silly examples, like referring to instructions etched into the universe somewhere). And a noncoherent referent can neither be coherently asserted nor denied.
To paraphrase Douglas Adams, if we don’t know what the question is, we can’t hope to find an understandable answer.
I think it’s useful to compare moral philosophy to applied maths or physics, in that while there are still open debates about whether mathematical Platonism (approximately, objectivity in maths) is correct, most people think it isn’t (or, rather, that it’s incoherently defined) - and yet most people still think well-reasoned maths is essential to our interactions with the world. Perhaps the same could be true of morality.
One counterpoint might be that unlike maths, morality is dispensable—you can seemingly do pretty well in life by acting as though it doesn’t exist (arguably better). But I think this is true only if you focus exclusively on the limited domain of morality that deals with ‘spooky’ properties and incoherent referents.
A much more fruitful approach to the discussion, IMO, is to start by looking at the much broader question of motivation, aka the cause of Agent A taking some action A1. Motivation has various salient properties:
- Almost everyone agrees that it has a referent (eliminative materialists may disagree, but they’re a tiny minority—and perhaps are literally without mental state and therefore don’t have the information to understand the referent)
- But that referent is still mysterious—we don’t have a clear notion of either ‘causation’ or ‘agents’ - so there’s plenty of room for empirical and conceptual discussions of its nature
- Whatever it is is is everpresent in our decision-making process, possibly by definition
- Motivated agents can seemingly be changed by receiving information, in a more profound way than any other objects (excepting perhaps software, though the more software can be changed in comparable ways the more it starts to look like a motivated agent—and it’s still a long way from the breadth of interaction with input animals have)
- Crucially, many of us would change our motivation in somewhat predictable ways if we had the ability to rewrite our source code
For example, many of us might choose to modify our motivations so that we e.g.:
- Procrastinated less
- Put more effort into the aesthetics of our surroundings
- Were more patient in thinking things through (sort of a motivational aspect to intelligence)
- In general, helped our future selves more
- Felt generally happier
- Perhaps put more effort into helping out our friends and family
- Perhaps put more effort into helping strangers
- etc
I would argue that some—but not all—of these modifications would be close to or actually universal. I would also argue that some of those that weren’t universal for early self-modifications might still be states that iterated self-moderators would gravitate towards.
For example, becoming more ‘intelligent’ through patient thought might cause us to focus a) more on happiness itself than instrumental pathways to happiness like interior design, and b) to recognise the lack of a fundamental distinction between our ‘future self’ and ‘other people’, and so tend more towards willingness to help out the latter.
At this point I’m in danger of aligning hedonistic/valence utilitarianism to this process, but you don’t have to agree with the previous paragraph to accept that some motivations would be more universal, or at least greater ‘attractors’ than others while disagreeing on the particulars.
However it’s not a coincidence that thinking about ‘morality’ like this leads us towards some views more than others. Part of the appeal of this way of thinking is that it offers the prospect of ‘correct’ answers to moral philosophy, or at least shows that some are incorrect—in a comparable sense to the (in)correctness we find in maths.
So we can think of this process as revealing something analogous to ‘consistency’ in maths. It’s not (or not obviously) the same concept, since it’s hard to say there’s something formally ‘inconsistent’ in e.g. wanting to procrastinate more, or to be unhappier. Yet wanting such things is contrary in nature to something that for most or all of us resembles an ‘axiom’ - the drive to e.g. avoid extreme pain and generally to make our lives go better.
If we can identify this or these ‘motivational axiom(s)‘, or even just find a reasonable working definition of them, this means we are in a similar position as we are in applied maths: without ever showing that something is ‘objectively wrong’ - whatever that could mean—we can show that some conclusions are so contrary to our nature—our ‘nature’ being ‘the axioms we cannot avoid accepting as we function as conscious, decision-making, motivated beings’ that we can exclude them from serious consideration.
This raises the question of which and how many moral conclusions are left when we’ve excluded all those ruled out by our axioms. I suspect and hope that the answer is ‘one’ (you might guess approximately which from the rest of this message), but that’s a much more ambitious argument than I want to make here. Here I just want to claim that this is a better way of thinking about metaethical questions than the alternatives.
I’ve had to rush through this comment without clearly distinguishing theses, but I’m making 2.5 core claims here:
1. One can in principle imagine a way of ‘doing moral philosophy’ that excludes some set of conceivable moralities—and we needn’t use any imagined ‘object’ as a referent to do
2. That a promising way of doing so is to imagine what we might gravitate towards if we were to iteratively self-modify our motivations
3. That a distinct but related promising way of doing so is to recognise quasi- or actually-universal motivational ‘axioms’, and what they necessitate or rule out about our behaviour if consistently accounted for
I don’t know if these positions already exists in moral philosophy—I’d be very surprised if I’m the first to advocate them, but fwiw I didn’t find anything matching them when I looked a few years ago (though my search was hardly exhaustive). For want of distinguishing it from the undesirable properties of both traditional sets of views and with reference to the previous paragraph, I refer to it as ‘moral exclusivism’.
Obviously you could define exclusivism into being either antirealism or realism, but IMO that’s missing its ability to capture the intuition behind both without necessitating the baggage of either.

Arepo Jun 20, 2025, 8:15 AM
2 points
0 ∶ 0
in reply to: Thomas Kwa’s comment on: Arepo’s Quick takes
I think that’s right, but modern AI benchmarks seem to have much the same issue. A human with a modern Claude instance might be able to write code 100x faster than without, but probably less than 2x as fast at choosing a birthday present for a friend.
Ideally you want to integrate over… something to do with the set of all tasks. But it’s hard to say what that something would be, let alone how you’re going to meaningfully integrate it.

Arepo Jun 20, 2025, 3:39 AM
19 points
5 ∶ 5
in reply to: Ben_West🔸’s comment on: A deep critique of AI 2027’s bad timeline models
To make outcome-based decisions, you have to decide on the period in which you’re considering them. Considering any given period costs non-0 resources (reductio ad absurdum: in practice, considering all possible future timelines would cost infinite resources, so we presumably agree on the principle that excluding some from consideration is not only reasonable but necessary).
I think it’s a reasonable position to believe that if something can’t be empirically validated then it at least needs exceptionally strong conceptual justifications to inform such decisions.
This cuts both ways, so if the argument of AI2027 is ‘we shouldn’t dismiss this outcome out of hand’ then it’s a reasonable position (although I find Titotal’s longer backcasting an interesting counterweight, and it prompted me to wonder about a good way to backcast still further). If the argument is that AI safety researchers should meaningfully update towards shorter timelines based on the original essay or that we should move a high proportion of the global or altruistic economy towards event planning for AGI in 2027 - which seems to be what the authors are de facto pushing for—that seems much less defensible.
And I worry that they’ll be fodder for views like Aschenbrenner’s, and used to justify further undermining US-China relations and increasing the risk of great power conflict or nuclear war, both of which seems to me like more probable events in the next decade than AGI takeover.

Arepo Jun 19, 2025, 8:30 AM
6 points
1 ∶ 1
on: Arepo’s Quick takes
Suggested hiring practice tweak
There are typically two ways for organisations of running hiring rounds: deadlined, in which job applications are no longer processed after a publicised date, and rolling in which the organisation will keep allowing submissions until they’ve found someone they want.
The upside of a deadline is both to an applicant that they know they’re not wasting their time on a job that’s 99% assigned, and to the organisation, which doesn’t have to delay giving an answer to an adequate candidate on the grounds that a potentially better one submits when you’re most of the way through the hiring process, and incentivises people to apply slightly earlier than they would have.
The downsides are basically the complement. The individual doesn’t get to go for a job that they’ve just missed and would be really suited to, and the org doesn’t get to see as large a pool of applicants.
It occurred to me that an org might be able to get some of the best of both by explicitly giving a mostly-deadline, after which they will explicitly downweight new applications. So if you see the mostly-deadline in time, you’re still incentivised to get your application in by the date given, and if it’s passed you should rationally apply if and only if you think there’s a good chance you’re an exceptional fit..

Arepo Jun 19, 2025, 7:44 AM
6 points
0 ∶ 0
on: Arepo’s Quick takes
One of the problems with AI benchmarks is that they can’t effectively be backcast more than a couple of years. This prompted me to wonder if a more empirical benchmark might be something like ‘Ability of a human in conjunction with the best technology available at time t’.
For now at least, humans are still necessary to have in the loop, so this should in principle be at least as good as coding benchmarks for gauging where we are now. When/if humans become irrelevant, it should still work - ‘AI capability + basically nothing’ = ‘AI capability’. And looking back, it gives a much bigger reference class for forecasting future trends, allowing us to compare e.g.
- Human
- Human + paper & pen
- Human + log tables + paper & pen
- Human + calculator + log tables + paper & pen
- Human + computer with C + …
- Human + computer with Python + …
- Human + ML libraries + …
- Human + GPT 1 + …
etc.
Thoughts?

Arepo Jun 13, 2025, 4:30 AM
4 points
0 ∶ 0
in reply to: Manuel Del Río Rodríguez 🔹’s comment on: More Everything Forever—a new book critique of EA
Fwiw I commented on Thorstad’s linkpost for the paper when he first posted about it here. My impression is that he’s broadly sympathetic to my claim about multiplanetary resilience, but either doesn’t believe we’ll get that far or thinks that the AI counterconsideration dominates it.
In this light, I think that the claim that annual x-risk being lower than 1/(10^-9) being ‘implausible’ is much too strong if it’s being used to undermine EV reasoning. Like I said—if we become interstellar and no universe-ending doomsday technologies exist, then multiplicativity of risk gets you there pretty fast. If each planet has, say 1/(10^5) annual chance of extinction, then n planets have 1/(10^(5^n)) chance of all independently going extinct in a given year. For n=2 that’s already one in ten billion.
Obviously there’s a) a much higher chance that they could go extinct in different years and b) that they could go all extinct in any given period from non-independent events such as war. But even so, it’s hard to believe that increasing k, say to double digits, doesn’t rapidly outweigh such considerations, especially given that an advanced civilisation could probably create new self-sustaining settlements in a matter of years.
I feel it is highly speculative on the difficulties of making comebacks and on the likelihood of extreme climate change
I don’t understand how you think climate change is more speculative than AI risk. I think it’s reasonable to have higher credence in human extinction from the latter, but those scenarios are entirely speculative. Extreme climate change is possible if a couple of parameters turn out to have been mismeasured.
As for the probability of making comebacks, I’d like to write a post about this, but the narrative goes something like this:
- to ‘flourish’ (in an Ordian sense), we need to reach a state of sufficiently low x-risk
- per above, by far the mathematically plausible way of doing this is just increasing our number of self-sustaining settlements
  - you could theoretically do it with an exceptionally stable political/social system, but I’m with Thorstad that the level of political stability this requires seems implausible
- to reach that state, we have to develop advanced technologies—well beyond what we have now. So the question about ‘comebacks’ is misplaced—the question is about our prospect of getting from the beginning to (a good) end of at least one time of perils without a catastrophe
- Dating our current time of perils to 1945, it looks like we’re on course, barring global catastrophes, to develop a self-sustaining civilisation in maybe 120-200 years
- Suppose there’s a constant k probability annual risk of a catastrophe that regresses us to pre-time-of perils technology. Then our outlook in 1945 was, approximately, (1-k)^160 chance of getting to a multiplanetary state. Since we’ve made it 80 years in, we have a substantially better ~(1-k)^80.
- If we restart from pre-1945 levels of technology, we will do so with max 10% of the fossil fuel energy we had, as well as many other depleted resources (fertiliser, uranium, etc). This will slow any kind of reboot substantially. See e.g. comparisons of coal to photovoltaics here.
- There’s huge uncertainty here, but when you multiply out the friction from all the different depleted resources, I think progress the second time around will be optimistically ¹⁄₂ the speed, and pessimistically 1/10x or worse. (Based on above link, if photovoltaics were to entirely substitute fossil fuels, that drag alone would be around a ~30/5.5 multiplier on the cost of generating energy, which seems like it could easily slow economic development by a comparable amount)
- That means in a reboot, we have optimistically (1-k)^320 chance of getting to a good outcome, pessimistically (1-k)^1600
- During that reboot, we can expect the new civilisation to preferentially use up the most efficient resources just as we do (it doesn’t have to destroy them, just move them to much higher entropy states, such as our current practice of flushing fertiliser into the ocean) - but they have 2x, 10x or however much longer doing so.
- That means civilisation 3 would have as much a disadvantage over civilisation 2 as civilisation 2 would over us, giving it optimistically a (1-k)^640 chance of a good outcome, pessimistically a (1-k)^16000 chance.
If we plug in k=0.001, which seems to be a vaguely representative estimate among x-risk experts, then in 1945 we would have had an 85% chance, today we would have a 92% chance, after one backslide we would have optimistically 73%, pessimistically 20%, and after Backslide Two we would have optimistically 53%, pessimistically basically 0.
We can roughly convert these to units of ‘extinction’ by dividing the loss of probability by our current prospects. So going to probability 53%, would be losing 32% of our current prospects, which is 32%/85% as bad in the long term as extinction.
This is missing a lot of nuance, obviously, which I’ve written about in this sequence, so we certainly shouldn’t take these numbers very seriously. But I think they overall paint a pretty reasonable picture of a ‘minor’ catastrophe being, in long-run expectation and aside from any short-term suffering or change in human morality, perhaps in the range of 15-75% as bad as extinction. Lots of room for discussing particulars, but not something we should dismiss as extinction being ‘much worse’ than—and in particular, not sufficiently lower that we can in practice afford to ignore the relative probabilities of extinction vs lesser global catastrophe.

Arepo Jun 11, 2025, 2:03 PM
2 points
0 ∶ 0
on: More Everything Forever—a new book critique of EA
Thanks for the write-up. I’m broadly sympathetic to a lot of these criticisms tbh, despite not being very left-leaning. A couple of points you relate I think are importantly false:

(Thorstad’s claim that) there’s no empirical basis for believing existential risk will drop to near-zero after our current, uniquely dangerous period before achieving long-term stability.
I don’t know about ‘empirical’, but there’s a simple mathematical basis for imagining it dropping to near zero in a sufficiently advanced future where we have multiple self-sustaining and hermetically independent settlements e.g. (though not necessarily) on different planets. Then even if you assume disasters befalling one aren’t independent, you have to believe they’re extremely correlated for this not to net out to extremely high civilisational resilience as you get to double digit settlements. That level of correlation is possible if it turns out to be possible e.g. to trigger a false vacuum decay—in which case Thorstad is right—or if a hostile AGI could wipe out everything before it—though that probability will surely either be realised or drop close to 0 within a few centuries.
If you accept the concept of Existential Risk and give them any credence, it logically follows that any such risk is much worse than any other horrible, terrible, undesirable one that does not lead to human extinction.
It doesn’t, and I wish the EA movement would move away from this unestablished claim. Specifically, one must have some difference in credence between achieving whatever longterm future one desires given no ‘minor’ catastrophe and achieving it given at least one. That credence differential is, to a first approximation, the fraction representing how much of ‘1 extinction’ your minor catastrophe is. Assuming we’re reasonably ambitious in our long term goals (e.g., per above, developing a multiplanetary or interstellar civilisation), it seems crazy to me to suppose that fraction should be less than ¹⁄₁₀. I suspect it should be substantially higher, since on restart we would have to survive a high risk in time-of-perils-2 while proceeding to the safe end state much slower, given the depletion of fossil fuels and other key resources.
If we think a restart is >= 1/10x as bad as extinction then we have to ask serious questions about whether it’s >= 10x as likely. I think it’s at least defensible to claim that e.g. extreme climate change is 10x as likely as an AI destroying literally all humanity.

Arepo May 27, 2025, 2:13 AM
2 points
0 ∶ 0
in reply to: Eitan’s comment on: EA coworking/lounge space on gather.town
Replied on Gather :)

Arepo May 26, 2025, 3:02 AM
2 points
0 ∶ 0
in reply to: Eitan’s comment on: EA coworking/lounge space on gather.town
Hi Eitan, you’re very welcome to! No need to book or anything—you can just show up, find a suitable area and run the event if you want, as long as it’s comfortably under the space capacity (currently 60) :)
If you want map-editing privileges, or just for me to show you around and give a few pointers, feel free to DM me on here—or just log onto the Gather Town and send me a message if I’m around, which I usually am. If I’m not physically present at the time it’s still probably the fastest way to reach me.

Arepo May 17, 2025, 2:22 AM
2 points
0 ∶ 3
in reply to: Mjreard’s comment on: The Soul of EA is in Trouble
There are many ways to reduce existential risk. I don’t see any good reason to think that reducing small chances of extinction events is better EV than reducing higher chances of smaller catastrophes, or even just building human capacity in preferentially non-destructive way. The arguments that we should focus on extinction have always boiled down to ‘it’s simpler to think about’.