I’m not sure that Toby was wrong to work on this, but if he was, it’s because if he hadn’t, then someone else with more comparative advantage for working on this problem (due to lacking training or talent for philosophy) would have done so shortly afterwards.
How shortly? We’re discussing this in October 2025. What’s the newest piece of data that Toby’s analysis is dependent on? Maybe the Grok 4 chart from July 2025? Or possibly qualitative impressions from the GPT-5 launch in August 2025? Who else is doing high-quality analysis of this kind and publishing it, even using older data?
I guess I don’t automatically buy the idea that even in a few months we’ll see someone else independently go through the same reasoning steps as this post and come to the same conclusion. But there are plenty of people who could, in theory, do it, who are, in theory, motivated to do this kind of analysis, and who will probably not see this post (e.g. equity research analysts, journalists covering AI, AI researchers and engineers independent of LLM companies).
I certainly don’t buy the idea that if Toby hadn’t done this analysis, then someone else in effective altruism would have done it. I don’t see anybody else in effective altruism doing similar analysis. (I chalk that up largely to confirmation bias.)
I appreciate you raising this, Wei (and Yarrow’s responses too). They both echoed a lot of my internal debate on this. I’m definitely not sure whether this is the best use of my time. At the moment, my research time is roughly evenly split between this thread of essays on AI scaling and more philosophical work connected to longtermism, existential risk and post-AGI governance. With the former it is much easier to demonstrate forward progress, and there is more of a demand signal for it. With the latter it is harder to be sure it is on the right path, and it is in less demand. My suspicion is that it is generally more important though, and that demand/appreciation doesn’t track importance very well.
It is puzzling to me too that no-one else was doing this kind of work on understanding scaling. I think I must be adding some rare ingredient, but I can’t think of anything rare enough to really explain why no-one else got these results first. (People at the labs probably worked out a large fraction of this, but I still don’t understand why the people not at the labs didn’t.)
In addition to the general questions about which strand is more important, there are a few more considerations:
No-one can tell ex ante how a piece of work or research stream will pan out, so everyone will sometimes be wrong ex post in their prioritisation decisions
My day job is at Oxford University’s AI Governance Initiative (a great place!) and I need to be producing some legible research that an appreciable number of other people are finding useful
I’m vastly more effective at work when I have an angle of attack and a drive to write up the results — recently this has been for these bite-size pieces of understanding AI scaling. The fact that there is a lot of response from others is helping with this as each piece receives some pushback that leads me to the next piece.
But I’ve often found your (Wei Dai’s) comments over the last 15-or-so years to be interesting, unusual, and insightful. So I’ll definitely take into account your expressed demand for more philosophical work and will look through those pages of philosophical questions you linked to.
Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know.) Do you think this is actually a reasonable state of affairs (i.e., it’s right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It’s been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to and I’ve gotten a few comments from philosophers saying they found the list helpful or that they’ll think about working on some of the problems, but I’m not aware of any concrete follow-ups.
If it is some kind of cultural or structural problem, it might be even higher leverage to work on solving that, instead of object level philosophical problems. I’d try to do this myself, but as an outsider to academic philosophy and also very far from any organizations who might potentially hire philosophers to work on AI alignment, it’s hard for me to even observe what the problem might be.
Re 99% of academic philosophers, they are doing their own thing and have not heard of these possibilities and wouldn’t be likely to move away from their existing areas if they had. Getting someone to change their life’s work is not easy and usually requires hours of engagement to have a chance. It is especially hard to change what people work on in a field when you are outside that field.
A different question is about the much smaller number of philosophers who engage with EA and/or AI safety (there are maybe 50 of these). Some of these are working on some of those topics you mention. e.g. Will MacAskill and Joe Carlsmith have worked on several of these. I think some have given up philosophy to work on other things such as AI alignment. I’ve done occasional bits of work related to a few of these (e.g. here on dealing with infinities arising in decision theory and ethics without discounting) and also to other key philosophical questions that aren’t on your list.
For such philosophers, I think it is a mixture of not having seen your list and not being convinced these are the best things that they each could be working on.
I was going to say something about lack of incentives, but I think it is also a lack of credible signals that the work is important, is deeply desired by others working in these fields, and would be used to inform deployments of AI. In my view, there isn’t much desire for work like this from people in the field and they probably wouldn’t use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously etc.
In my view, there isn’t much desire for work like this from people in the field and they probably wouldn’t use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously etc.
Any thoughts on Legible vs. Illegible AI Safety Problems, which is in part a response to this?
Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full time or even close to full time AFAIK, and the total effort is clearly far from adequate to the task at hand.
I think some have given up philosophy to work on other things such as AI alignment.
Any other names you can cite?
In my view, there isn’t much desire for work like this from people in the field and they probably wouldn’t use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously etc.
Thanks, this makes sense to me, and my follow-up is how concerning do you think this situation is?
One perspective I have is that at this point, several years into a potential AI takeoff, with AI companies now worth trillions in aggregate, alignment teams at AI companies still have virtually no professional philosophical oversight (or outside consultants that they rely on), and are kind of winging it based on their own philosophical beliefs/knowledge. It seems rather like trying to build a particle collider or fusion reactor with no physicists on the staff, only engineers.
(Or worse, unlike engineers’ physics knowledge, I doubt that receiving a systematic education in fields like ethics and metaethics is a hard requirement for working as an alignment researcher. And even worse, unlike the situation in physics, we don’t even have settled ethics/metaethics/metaphilosophy/etc. that alignment researchers can just learn and apply.)
Maybe the AI companies are reluctant to get professional philosophers involved, because in the fields that do have “professional philosophical oversight”, e.g., bioethics, things haven’t worked out that well. (E.g. human challenge trials being banned during COVID.) But to me, this would be a signal to yell loudly that our civilization is far from ready to attempt or undergo an AI transition, rather than a license to wing it based on one’s own philosophical beliefs/knowledge.
As an outsider, the situation seems crazy alarming to me, and I’m confused that nobody else is talking about it, including philosophers like you who are in the same overall space and looking at roughly the same things. I wonder if you have a perspective that makes the situation not quite as alarming as it appears to me.
I’m a philosopher who’s switched to working on AI safety full-time. I also know there are at least a few philosophers at Anthropic working on alignment.
With regards to your Problems in AI Alignment that philosophers could potentially contribute to:
I agree that many of these questions are important and that more people should work on them.
But a fair amount of them are discussed in conventional academic philosophy, e.g.:
How to resolve standard debates in decision theory?
Infinite/multiversal/astronomical ethics
Fair distribution of benefits
What is the nature of philosophy?
What constitutes correct philosophical reasoning?
How should an AI aggregate preferences between its users?
What is the nature of normativity?
And these are all difficult, controversial questions.
For each question, you have to read and deeply think about at least 10 papers (and likely many more) to get a good understanding of the question and its current array of candidate answers.
Any attempt to resolve the question would have to grapple with a large number of considerations and points that have previously been made in relation to the question.
Probably, you need to write something at least book-length.
(And it’s very hard to get people to read book-length things.)
In trying to do this, you probably don’t find any answer that you’re really confident in.
I think most philosophers’ view on the questions they study is: ‘It’s really hard. Here’s my best guess.’
Or if they’re confident of something, it’ll be a small point within existing debates (e.g. ‘This particular variant of this view is subject to this fatal objection.’).
And even if you do find an answer you’re confident in, you’ll have a very hard time convincing other philosophers of that answer.
They’ll bring up some point that you hadn’t thought of.
Or they’ll differ from you in their bedrock intuitions, and it’ll be hard for either of you to see any way to argue the other out of their bedrock intuition.
In some cases—like population ethics and decision theory—we have proofs that every possible answer will have some unsavory implication. You have to pick your poison, and different philosophers will make different picks.
And on inductive grounds, I suspect that many other philosophical questions also have no poison-free answers.
Derek Parfit is a good example here.
He spent decades working on On What Matters, trying to settle the questions of ethics and meta-ethics.
He really tried to get other philosophers to agree with him.
But very few do. The general consensus in philosophy is that it’s not a very convincing book.
And I think a large part of the problem is a difference of bedrock intuitions. For example, Bernard Williams simply ‘didn’t have the concept of a normative reason,’ and there was nothing that Parfit could do to change that.
It also seems like there’s not much of an appetite among AI researchers for this kind of work.
If there were, we might see more discussions of On What Matters, or any of the other existing works on these questions.
When I decided to start working on AI, I seriously considered working on the kinds of questions you list. But due to the points above, I chose to do my current work instead.
This reads to me like you’re saying “these problems are hard [so Wei Dai is over-rating the importance of working on them]”, whereas the inference I would make is “these problems are hard, so we need to slow down AI development, otherwise we won’t be able to solve them in time.”
I didn’t mean to imply that Wei Dai was overrating the problems’ importance. I agree they’re very important! I was making the case that they’re also very intractable.
If I thought solving these problems pre-TAI would be a big increase to the EV of the future, I’d take their difficulty to be a(nother) reason to slow down AI development. But I think I’m more optimistic than you and Wei Dai about waiting until we have smart AIs to help us on these problems.
Do you want to talk about why you’re relatively optimistic? I’ve tried to explain my own concerns/pessimism at https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy and https://forum.effectivealtruism.org/posts/axSfJXriBWEixsHGR/ai-doing-philosophy-ai-generating-hands.
I said a little in another thread. If we get aligned AI, I think it’ll likely be a corrigible assistant that doesn’t have its own philosophical views that it wants to act on. And then we can use these assistants to help us solve philosophical problems. I’m imagining in particular that these AIs could be very good at mapping logical space, tracing all the implications of various views, etc. So you could ask a question and receive a response like: ‘Here are the different views on this question. Here’s why they’re mutually exclusive and jointly exhaustive. Here are all the most serious objections to each view. Here are all the responses to those objections. Here are all the objections to those responses,’ and so on. That would be a huge boost to philosophical progress. Progress has been slow so far because human philosophers take entire lifetimes just to fill in one small part of this enormous map, and because humans make errors so later philosophers can’t even trust that small filled-in part, and because verification in philosophy isn’t much quicker than generation.
The argument tree (arguments, counterarguments, counter-counterarguments, and so on) is exponentially sized, and we don’t know how deep or wide we need to expand it before some problem can be solved. We do know that different humans looking at the same partial tree (i.e., philosophers who have read the same literature on some problem) can have very different judgments as to what the correct conclusion is. There’s also a huge amount of intuition/judgment involved in choosing which part of the tree to focus on or expand further. With AIs helping to expand the tree for us, there are potential advantages like you mentioned, but also potential disadvantages, like AIs not having good intuition/judgment about which lines of argument to pursue, or the argument tree (or AI-generated philosophical literature) becoming too large for any humans to read and think about in a relevant time frame. Many will be very tempted to just let AIs answer the questions / make the final conclusions for us, especially if AIs also accelerate technological progress, creating many urgent philosophical problems related to how to use them safely and beneficially. Or, if humans try to make the conclusions themselves, they can easily get them wrong despite AI help with expanding the argument tree.
So I think undergoing the AI transition without either solving metaphilosophy or making AIs autonomously competent at philosophy (i.e., good at reaching correct conclusions by themselves) is enormously risky, even if we have corrigible AIs helping us.
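(A toy sketch of the scale problem described above, purely for illustration: the node structure, the branching factor and depth, and the hypothetical `generate_counters` helper are all made up rather than anything proposed in this discussion. The point is just that even modest branching makes exhaustive expansion hopeless, so everything rides on the judgment used to decide which nodes to expand.)

```python
from dataclasses import dataclass, field

@dataclass
class ArgumentNode:
    claim: str
    children: list = field(default_factory=list)  # counterarguments to this claim
    importance: float = 1.0                       # stand-in for human/AI judgment about what to expand

def full_tree_size(branching: int, depth: int) -> int:
    """Nodes in the full tree if every claim draws `branching` counterarguments down to `depth` levels."""
    return sum(branching ** d for d in range(depth + 1))

def expand_with_budget(root: ArgumentNode, generate_counters, budget: int) -> None:
    """Expand whichever open node is currently judged most important, until the budget runs out."""
    frontier = [root]
    while frontier and budget > 0:
        frontier.sort(key=lambda n: n.importance, reverse=True)  # the contested judgment call
        node = frontier.pop(0)
        for counter in generate_counters(node):  # e.g. an AI assistant proposing objections
            node.children.append(counter)
            frontier.append(counter)
            budget -= 1
            if budget == 0:
                break

# Five counterarguments per claim and twenty levels of reply already give ~10^14 nodes.
print(f"{full_tree_size(branching=5, depth=20):.2e}")
```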
I wrote a post that I think was partly inspired by this discussion. The implication of it here is that I don’t necessarily want philosophers to directly try to solve the many hard philosophical problems relevant to AI alignment/safety (especially given how few of them are in this space or concerned about x-safety), but initially just to try to make them “more legible” to others, including AI researchers, key decision makers, and the public. Hopefully you agree that this is a more sensible position.
try to make them “more legible” to others, including AI researchers, key decision makers, and the public
Yes, I agree this is valuable, though I think it’s valuable mainly because it increases the probability that people use future AIs to solve these problems, rather than because it will make people slow down AI development or try very hard to solve them pre-TAI.
I’m not sure but I think maybe I also have a different view than you on what problems are going to be bottlenecks to AI development. e.g. I think there’s a big chance that the world would steam ahead even if we don’t solve any of the current (non-philosophical) problems in alignment (interpretability, shutdownability, reward hacking, etc.).
I agree that many of the problems on my list are very hard and probably not the highest marginal value work to be doing from an individual perspective. Keep in mind that the list was written 6 years ago, when it was less clear when the AI takeoff would start in earnest, or how many philosophers would become motivated to work on AI safety once AGI became visibly closer. I still had some hope that when the time came, a significant fraction of all philosophers would become self-motivated or would be “called to arms” by a civilization-wide AI safety effort, and would be given sufficient resources including time, so the list was trying to be comprehensive (listing every philosophical problem that I thought relevant to AI safety) rather than prioritized. Unfortunately, the reality is nearly the complete opposite of this.
Currently, one of my main puzzles is why philosophers with public AI x-risk estimates still have numbers in the 10% range, despite reality being near the most pessimistic end of my range of expectations, it looking like the AI takeoff/transition will occur while most of these philosophical problems remain wide open or in a totally confused state, and AI researchers seeming almost completely oblivious to or uncaring about this. Why are they not making the same kind of argument that I’ve been making, that philosophical difficulty is a reason that AI alignment/x-safety is harder than many think, and an additional reason to pause/stop AI?
I don’t think philosophical difficulty is that much of an increase to the difficulty of alignment, mainly because I think that AI developers should (and likely will) aim to make AIs corrigible assistants rather than agents with their own philosophical views that they try to impose on the world. And I think it’s fairly likely that we can use these assistants (if we succeed in getting them and aren’t disempowered by a misaligned AI instead) to help a lot with these hard philosophical questions.