Quick sketch of what I mean (and again I think others at Forethought may disagree with me):
I think most of the work that gets done at Forethought builds primarily on top of conceptual models that are at least in significant part ~deferring to a fairly narrow cluster of AI worldviews/paradigms (maybe roughly in the direction of what Joe Carlsmith/Buck/Ryan have written about)
(To be clear, I think this probably doesn’t cover everyone, and even when it does, there’s also work that does this more/less, and some explicit poking at these worldviews, etc.)
So in general I think I’d prefer the AI worldviews/deferral to be less ~correlated. On the deferral point — I’m a bit worried that the resulting aggregate models don’t hold together well sometimes, e.g. because it’s hard to realize you’re not actually on board with some further-in-the-background assumptions being made for the conceptual models you want to pull in.
And I’d also just like to see more work based on other worldviews, even just ones that complicate the paradigmatic scenarios / try to unpack the abstractions to see if some of the stuff that’s been simplified away is blinding us to important complications (or possibilities).[1]
I think people at Forethought often have some underlying ~welfarist frames & intuitions — and maybe more generally a tendency to model a bunch of things via something like utility functions — as opposed to thinking via frames more towards the virtue/rights/ethical-relationships/patterns/… direction (or in terms of e.g. complex systems)
Quick links illustrating what I’m trying to gesture at might include this & this
We’re not doing e.g. formal empirical experiments, I think (although that could also be fine given this area of work; just listing this as it pops into my head)
There’s some fuzzy pattern in the core ways in which most(?) people at Forethought seem to naturally think, or in how they prefer to have research discussions, IMO? I notice that the types of conversations (and collaborations) I have with some other people go fairly differently, and this leads me to different places.
To roughly gesture at this, in my experience Forethought tends broadly more towards a mode like “read/think → write a reasonably coherent doc → get comments from various people → write new docs / sometimes discuss in Slack or at whiteboards (but often with a particular topic in mind...)”, I think? (Vs things like trying to map more stuff out/think through really fuzzy thoughts in active collaboration, having some long topical-but-not-too-structured conversations that end up somewhere unexpected, “let’s try to make a bunch of predictions in a row to rapidly iterate on our models,” etc.)
(But this probably varies a decent amount between people, and might be context-dependent, etc.)
IIRC many people have more expertise in stuff like analytic philosophy, maybe something like econ-style modeling, and the EA flavor of generalist research than in e.g. ML/physics/cognitive science/whatever, or maybe hands-on policy/industry/tech work, or e.g. history/culture… And there are various similarities in people’s cultural/social backgrounds. (I don’t actually remember everyone’s backgrounds, though, and think it’s easy to overindex on this / weigh it too heavily. But I’d be surprised if that doesn’t affect things somewhat.)
I also want to caveat that:
(i) I’m not trying to be exhaustive here, just listing what’s salient, and
(ii) in general it’ll be harder for me to name things that also strongly describe me (although I’m trying, to some degree), especially as I’m just quickly listing stuff and not thinking too hard.
I’ve been struggling to articulate this well, but I’ve recently been feeling like, for instance, proposals on making deals with “early [potential] schemers” implicitly(?) rely on a bunch of assumptions about the anatomy of AI entities we’d get at relevant stages.
More generally I’ve been feeling pretty iffy about using game-theoretic reasoning about “AIs” (as in “they’ll be incentivized to...” or similar) because I sort of expect it to fail in ways that are somewhat similar to what one gets if one tries to do this with states or large bureaucracies or something—iirc the fourth paper here discussed this kind of thing, although in general there’s a lot of content on this. Similar stuff on e.g. reasoning about the “goals” etc. of AI entities at different points in time without clarifying a bunch of background assumptions (related, iirc).
Thanks for writing the post and this comment, Lizka!
~deferring to a fairly narrow cluster of AI worldviews/paradigms (maybe roughly in the direction of what Joe Carlsmith/Buck/Ryan have written about)
I agree that most of Forethought (apart from you!) have views that are somewhat similar to Joe/Buck/Ryan’s, but I think that’s mostly not via deferral?
+1 to wanting people who can explore other perspectives, like Gradual Disempowerment, coalitional agency, AI personas, etc. And the stuff that you’ve been exploring!
I also agree that there’s some default more welfarist / consequentialist frame, though I think often we don’t actually endorse this on reflection. Also agree that there are some shared thinking styles, though I think there’s a bit more diversity in training (we have people who majored in history or CS, who have done empirical ML work, etc).
Also maybe general note, that on many of the axes you’re describing you are adding some of the diversity that you want, so Forethought-as-a-whole is a bit more diverse on these axes than Forethought-minus-Lizka.
I think I agree maybe ~80%. My main reservation (although quite possibly we agree here) is that if Forethought hired e.g. the ‘AI as a normal technology’ people, or anyone with equivalently different baseline assumptions and ways of thinking from most of Forethought, I think that would be pretty frustrating and unproductive. (That said, I think bringing people like that in for a week or so might be great, to drill down into cruxes and download each other’s world models more.) I think there is something great about having lots of foundational things in common with the people you work closely with.
But I agree that it can be pretty useful to have more people who share some basic prerequisites (thinking ASI is possible and likely to come this century, being somewhat longtermist and cosmopolitan and altruistic, etc) but who disagree a lot on particular topics like AI timelines, threat models, research approaches, and so forth.
Yeah, I guess I don’t want to say that it’d be better if the team had people who are (already) strongly attached to various specific perspectives (like the “AI as a normal technology” worldview—maybe especially that one?[1]). And I agree that having shared foundations is useful / constantly relitigating foundational issues would be frustrating. I also really do think the points I listed under “who I think would be a good fit” — willingness to try on and ditch conceptual models, high openness without losing track of taste, & flexibility — matter, and probably clash somewhat with central examples of “person attached to a specific perspective.”
= rambly comment, written quickly, sorry! =
But in my opinion we should not just all (always) be going off of some central AI-safety-style worldviews. And I think that some of the divergence I would like to see more of could go pretty deep—e.g. possibly somewhere in the grey area between what you listed as “basic prerequisites” and “particular topics like AI timelines...”. (As one example, I think accepting terminology or the way people in this space normally talk about stuff like “alignment” or “an AI” might basically bake in a bunch of assumptions that I would like Forethought’s work to not always rely on.)
One way to get closer to that might be to just defer less, or more carefully. And another is to have a team that includes people who better understand rarer-in-this-space perspectives, which diverge earlier on (or people who are by default inclined to think about this stuff in ways that are different from others’ defaults), as this could help us start noticing assumptions we didn’t even realize we were making, translate between frames, etc.
So maybe my view is that it’d be better if (1) there were more ~independent worldview formation/exploration going on, and (2) the (soft) deferral that is happening (because some deferral feels basically inevitable) were less overlapping.
(I expect we don’t really disagree, but still hope this helps to clarify things. And also, people at Forethought might still disagree with me.)
If this perspective involves a strong belief that AI will not change the world much, then IMO that’s just one of the (few?) things that are ~fully out of scope for Forethought. I.e. my guess is that projects with that as a foundational assumption wouldn’t really make much sense to do here. (Although IMO even if, say, I believed that this conclusion was likely right, I might nevertheless be a good fit for Forethought if I were willing to view my work as a bet on the worlds in which AI is transformative.)
But I don’t really remember what the “AI as normal technology” position is, and could imagine that it’s somewhat different — e.g. more in the direction of “automation is the wrong frame for understanding the most likely scenarios” / something like this. In that case my take would be that someone exploring this at Forethought could make sense (I haven’t thought about this one much), and generally being willing to consider this perspective at least seems good, but I’d still be less excited about people who’d come with the explicit goal of pursuing that worldview & no intention of updating or whatever.
--
(Obviously if the “AI will not be a big deal” view is correct, I’d want us to be able to come to that conclusion—and change Forethought’s mission or something. So I wouldn’t e.g. avoid interacting with this view or its proponents, and agree that e.g. inviting people with this POV as visitors could be great.)
If this perspective involves a strong belief that AI will not change the world much, then IMO that’s just one of the (few?) things that are ~fully out of scope for Forethought
I disagree with this. There would need to be some other reason why they should work at Forethought rather than elsewhere, but there are plausible answers to that — e.g. they work on space governance, or they want to write up why they think AI won’t change the world much and engage with the counterarguments.
On the “AI as normal technology” perspective—I don’t think it involves a strong belief that AI won’t change the world much. The authors restate their thesis in a later post:
There is a long causal chain between AI capability increases and societal impact. Benefits and risks are realized when AI is deployed, not when it is developed. This gives us (individuals, organizations, institutions, policymakers) many points of leverage for shaping those impacts. So we don’t have to fret as much about the speed of capability development; our efforts should focus more on the deployment stage both from the perspective of realizing AI’s benefits and responding to risks. All this is not just true of today’s AI, but even in the face of hypothetical developments such as self-improvement in AI capabilities. Many of the limits to the power of AI systems are (and should be) external to those systems, so that they cannot be overcome simply by having AI go off and improve its own technical design.
The idea of focusing more on the deployment stage seems pretty consistent with Will MacAskill’s latest forum post about making the transition to a post-AGI society go well. There are other aspects of the “AI as normal technology” worldview that I expect will conflict more with Forethought’s, but I’m not sure that conflict would necessarily be frustrating and unproductive—as you say, it might depend on the person’s characteristics like openness and willingness to update, etc.
Nice, yes I think we roughly agree! (Though maybe you are nobler than me in terms of finding a broader range of views provocatively plausible and productive to engage with.)
I can’t speak to the “AI as a normal technology” people in particular, but a shortlist I created of people I’d be very excited about includes someone who just doesn’t buy at all that AI will drive an intelligence explosion or explosive growth.
I think there are lots of types of people for whom it wouldn’t be a great fit, though. E.g. continental philosophers; at least some of the “sociotechnical” AI folks; more mainstream academics who are focused on academic publishing. And if you’re just focused on AI alignment, probably you’ll get more at a different org than you would at Forethought.
More generally, I’m particularly keen on situations where V(X, Forethought team) is much greater than V(X) + V(Forethought team), either because there are synergies between X and the team, or because X is currently unable to do the most valuable work they could in any of the other jobs they could be in.