I just remembered another sub-category that seems important to me: AI-enabled very accurate lie detection. This could be useful for many things, but most of all for helping make credible commitments in high-stakes US-China ASI negotiations.
Thanks Caleb, very useful. @ConnorA I'm interested in your thoughts on how to balance comms on catastrophic/existential risks and things like deepfakes. (I don't know about the particular past efforts Caleb mentioned, and I think I am more open to comms on deepfakes being useful for developing a broader coalition, even though deepfakes are a tiny fraction of what I care about wrt AI.)
Have you applied to LTFF? Seems like the sort of thing they would/should fund. @Linch @calebp if you have actually already evaluated this project I would be interested in your thoughts, as would others, I imagine! (Of course, if you decided not to fund it, I'm not saying the rest of us should defer to you, but it would be interesting to know and take into account.)
Unclear: as they note early on, many people have even shorter timelines than Ege, so it is not representative in that sense. But probably many of the debates are at least about axes people genuinely disagree on.
o1-pro!
Here is a long AI summary of the podcast.
If these people weren't really helping the companies, it seems surprising that salaries are so high?
I think I directionally agree!
One example of timelines feeling very decision-relevant: for people looking to specialise in partisan influence, the larger your credence in TAI/ASI by Jan 2029, the more you might want to specialise in Republicans. Whereas on longer timelines, Democrats have (on priors) a ~50% chance of controlling the presidency from 2029 onwards, so specialising in Dem political comms could make more sense.
Of course criticism only partially overlaps with advice, but this post reminded me a bit of this take on giving and receiving criticism.
I overall agree we should prefer USG to be better AI-integrated. I think this isn't a particularly controversial or surprising conclusion though, so the main question is how high a priority it is, and I am somewhat skeptical it is on the ITN pareto frontier. E.g. I would assume plenty of people care about government efficiency and state capacity generally, and a lot of these interventions are about making USG more capable in general rather than being targeted at longtermist priorities.
So this felt like neither the sort of piece targeted at mainstream US policy folks, nor that convincing on why this should be an EA/longtermist focus area. Still, I hadn't thought much about this before, so doing this level of medium-depth investigation feels potentially valuable, but I'm unconvinced that e.g. OP should spin up a grantmaker focused on this (not that you were necessarily recommending that).
Also, a few reasons govts may have a better time adopting AI come to mind:
Access to large amounts of internal private data
Large institutions can better afford one-time upfront costs to train or finetune specialised models, compared to small businesses
But I agree the opposing reasons you give are probably stronger.
we should do what we normally do when juggling different priorities: evaluate the merits and costs of specific interventions, looking for "win-win" opportunities
If only this were how USG juggled its priorities!
Yes, this seems right; it is hard to know which effect will dominate. I'm guessing you could assemble pretty useful training data from past R&D breakthroughs, which might help, but that will only get you so far.
Clearly only IBBIS should be allowed to advertise on the job board from now on; impeccable marketing skills @Tessa A 🔸 :)
This seems to be out of context?
Yeah, I think I agree with all this. I suppose since "we" have the AI policy/strategy training data anyway, that seems relatively low-effort and high-value to do; but yes, if we could somehow get access to the private notes of a bunch of international negotiators, that also seems very valuable! Perhaps actually asking top forecasters to record their working and meetings to use as training data later would be valuable, and I assume many people already do this by default (tagging @NunoSempere). Although of course having better forecasting AIs seems more dual-use than some of the other AI tools.
Yes, I suppose I am trying to divide tasks/projects into two buckets based on whether they require high context, value-alignment, strategic thinking, and EA-ness. And my claim was/is that UI design is comparatively easy to outsource to someone without much of the relevant context and values, and therefore the comparative advantage of the higher-context people is to do things that are harder to outsource to lower-context people. But I know ~nothing about UI design; maybe being higher-context is actually super useful there.
Nice post! I agree moral errors aren't only a worry for moral realists. But they do seem especially concerning for realists, as the moral truth may be very hard to discover, even for superintelligences. For antirealists, the first 100 years of a long reflection may get you most of the way towards where your views will converge after a billion years of reflecting on your values. But the first 100 years of a long reflection are less guaranteed to get you close to the realist moral truth. So a 100-year reflection might be, say, 90% likely to avoid massive moral errors for antirealists, but maybe only 40% likely to do so for realists.
--
Often when there are long lists like this I find it useful for my conceptual understanding to try to create some structure to fit each item into; here is my attempt.
A moral error is making a moral decision that is quite suboptimal. This can happen if:
The agent has correct moral views, but makes a failure of judgement/rationality/empirics/decision theory and so chooses badly by their own lights.
The agent is adequately rational, but has incorrect views about ethics, namely the mapping from {possible universe trajectories} to {impartial value}. This could take the form of (see the sketch after this list):
A mistake in picking out who the moral patients are: {universe trajectory} → {moral patients}. (animals, digital beings)
A mistake in assigning lifetime wellbeing scores to each moral patient: {moral patients} → {list of lifetime wellbeings}. (theories of wellbeing, happiness vs suffering)
A mistake in aggregating correct wellbeing scores over the correct list of moral patients into the overall impartial value of the universe: {list of lifetime wellbeings + possibly other relevant facts} → {impartial value}. (population ethics, diversity, interestingness)
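Here is the same decomposition as a toy code sketch; the function names and types are placeholder labels of my own for the three mappings above, not anything from the post:

```python
# Toy sketch of the three-stage decomposition above; names and types are just placeholder labels.

UniverseTrajectory = object   # a full possible history of the universe
MoralPatient = object         # whoever turns out to morally count

def find_moral_patients(trajectory: UniverseTrajectory) -> list[MoralPatient]:
    """Stage 1: who counts (animals, digital beings, ...)."""
    ...

def lifetime_wellbeings(patients: list[MoralPatient]) -> list[float]:
    """Stage 2: how well each patient's life goes (theories of wellbeing, happiness vs suffering)."""
    ...

def impartial_value(wellbeings: list[float]) -> float:
    """Stage 3: aggregation (population ethics, diversity, interestingness)."""
    ...

def value_of(trajectory: UniverseTrajectory) -> float:
    # A mistake at any stage of this composition is the second kind of moral error above.
    return impartial_value(lifetime_wellbeings(find_moral_patients(trajectory)))
```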
--
Some minor points:
I think the fact that people wouldn't take bets involving near-certain death and a 1-in-a-billion chance of a long amazing life is more evidence that people are risk-averse than that lifetime wellbeing is bounded above.
As currently written, choosing Variety over Homogeneity would only be a small moral error, not a massive one, as epsilon is small.
Great set of posts (including the related "how far" and "how sudden" ones). I only skimmed the parts I had read drafts of, but still have a few comments, mostly minor:
1. Accelerating progress framing
We define "accelerating AI progress" as "each increment of capability advancement (e.g. GPT-3 → GPT-4) happens more quickly than the last".
I am a bit skeptical of this definition, both because it is underspecified and because I'm not sure it is pointing at the most important thing.
Underspecified: how many GPT jumps need to be in the "each quicker than the last" regime? This seems more than just a semantic quibble, as clearly the one-time speedup leads to at least one GPT jump being faster, and the theoretical limits mean this eventually stops, but I'm not sure where within that range you want to call progress "accelerating".
Framing: Basically, we are trying to condense a whole graph into a few key numbers, so this might be quite lossy, and we need to focus on the variables that are most strategically important, which I think are (see the rough sketch below):
Timeline: date that transition period starts
Suddenness: time in transition period
Plateau height: in effective compute, defining the plateau as the point when the rate of progress drops back below 2025 levels.
Plateau date: how long it takes to get there.
I'm not sure there is an important further question of whether the line is curving up or down between the transition period and the plateau (or more precisely, when it switches from curving up, as in the transition period, to curving down, as in the plateau). I suppose "accelerating" could include plateauing quite quickly, and "decelerating" could include still going very fast and reaching a very high plateau quickly, which to most people wouldn't intuitively feel like deceleration.
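To make this concrete, here is a rough sketch, using my own framing rather than the post's definitions, of how the four numbers could be read off an effective-compute trajectory, with the transition start/end dates taken as given rather than derived:

```python
# Rough sketch: condense an effective-compute trajectory into the four summary numbers above.
# `traj` maps year -> log10(effective compute); growth rates are in OOMs/year.
# The transition start/end years are assumed to be supplied, not derived here.

def summarise(traj: dict[float, float], baseline_growth_2025: float,
              transition_start: float, transition_end: float) -> dict:
    years = sorted(traj)
    # growth rate over each interval, keyed by the interval's end year
    growth = {y1: (traj[y1] - traj[y0]) / (y1 - y0)
              for y0, y1 in zip(years, years[1:])}
    # plateau: first post-transition year where growth falls back below the 2025 rate
    plateau_year = next((y for y in sorted(growth)
                         if y > transition_end and growth[y] < baseline_growth_2025), None)
    return {
        "timeline": transition_start,                     # date the transition period starts
        "suddenness": transition_end - transition_start,  # time spent in the transition period
        "plateau_date": plateau_year,
        "plateau_height": traj.get(plateau_year),         # log10 effective compute at the plateau
    }
```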
2. Max rate of change
Theoretical limits for the speed of progress are 100X as fast as recent progress.
It would be good to flag in the main text that the justification for this is in Appendix 2 (initially I thought it was a bare assertion). Also, it is interesting that in @kokotajlod's scenario the "wildly superintelligent" AI maxes out at a 1-million-fold AI R&D speedup; I commented to them on a draft that this seemed implausibly high to me. I have no particular take on whether 100x is too low or too high as the theoretical max, but it would be interesting to work out why there is this Forethought vs AI Futures difference.
3. Error in GPT OOMs calculations
Algorithmic improvements compound multiplicatively rather than additively, so I think the formula in column G should be 3^years rather than 3*years?
This also clears up the current mismatch between columns G and H. The most straightforward fix would be for column H to be log10(G), same as column F. And since log(a^b) = b*log(a), once you correct column G you get column H = log10(3^years) = years * log10(3) ≈ 0.48 * years, which is close to the 0.4 * years you currently have; I assume there was just a rounding error somewhere.
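As a quick sanity check of the arithmetic (a minimal sketch; the 3x/year rate and the column labels are just my reading of the spreadsheet, and the specific numbers are illustrative):

```python
import math

years = 5   # illustrative number of years of algorithmic progress (placeholder)
rate = 3    # illustrative 3x effective-compute gain from algorithms per year (placeholder)

col_G = rate ** years      # corrected column G: gains compound, so 3^years rather than 3*years
col_H = math.log10(col_G)  # column H as log10(G), matching column F's convention

print(col_G, col_H, years * math.log10(3))  # 243, ~2.39, ~2.39 (i.e. ~0.48 * years)
```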
This wonât end up changing the main results though.
4. Physical limits
Regarding the effective physical limits of each feedback loop, perhaps it is worth noting that your estimate for the chip production feedback loop is very well grounded and high-confidence, as we know more or less exactly the energy output of the sun, whereas the other two are super speculative. Which is fine; they are just quite different types of estimates, so we should remember to place far less weight on them.
5. End of the transition period
Currently, this is set at when AIs are almost as productive (9/10) as humans, but it would make more sense to me to end it when AIs are markedly superior to humans, e.g. 10x.
Maybe I am misunderstanding elasticities though; I only have a lay non-economist's grasp of them.
Overall it might be more intuitive to define the transition period in terms of how useful one additional human researcher vs one additional AI researcher is, from the human being 10x better to the AI being 10x better.
Defining what "one AI researcher" is could be tricky; maybe we could use the pace of human thought in tokens per second as a way to standardise (toy sketch below).
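As a toy illustration of that standardisation idea (every number below is a made-up placeholder, not an estimate):

```python
# Toy standardisation of "one AI researcher" via tokens per second; all numbers are placeholders.

human_thought_rate = 10.0   # assumed tokens/second at which a human researcher "thinks"
total_ai_inference = 1.0e6  # assumed tokens/second of AI inference devoted to research
quality_ratio = 0.1         # assumed value of one AI-generated token relative to one human token

ai_researcher_equivalents = total_ai_inference * quality_ratio / human_thought_rate
print(ai_researcher_equivalents)  # ~10,000 human-researcher equivalents in this toy case
```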
(Finally, Fn2 is missing a link.)
Thanks for this; I hadn't thought much about the topic and agree it seems more neglected than it should be. But I am probably overall less bullish than you (as operationalised by, e.g., how many people in the existential risk field should be making this a significant focus: I am perhaps closer to 5% than your 30% at present).
I liked your flowchart on "Inputs in the AI application pipeline", so using that framing:
Learning algorithms: I agree this is not very tractable for us[1] to work on.
Training data: This seems like a key thing for us to contribute, particularly at the post-training stage. By supposition, a large fraction of the most relevant work on AGI alignment, control, governance, and strategy has been done by "us". I could well imagine that it would be very useful to get project notes, meeting notes, early drafts, etc. as well as the final reports to train a specialised AI system to become an automated alignment/governance etc. researcher.
But my guess is that just compiling this training data doesn't take that much time. All it takes is, when the time comes, convincing a lot of the relevant people and orgs to share old Google Docs of notes/drafts/plans etc. paired with the final product.
There will be a lot of infosec considerations here, so maybe each org will end up training their own AI based on their own internal data. I imagine this is what will happen for a lot of for-profit companies.
Making sure we don't delete old draft reports, meeting notes, and the like seems good here, but given that storing Google Docs is so cheap and culling files is time-expensive, I think by default almost everyone just keeps most of their (at least textual) digital corpus anyway. Maybe there is some small intervention to make this work better though?
Compute: It certainly seems great for more compute to be spent on automated safety work versus automated capabilities work. But this is mainly a matter of how much money each party has to pay for compute. So lobbying for governments to spend lots on safety compute, or regulations to get companies to spend more on safety compute, seems good, but this is a bit separate from/upstream of what you have in mind, I think; it is more just "get key people to care more about safety".
Post-training enhancements: We will be very useful for providing RLHF to tell a budding automated AI safety researcher how good each of its outputs is. Research taste is key here. This feels somewhat continuous with just "managing a fleet of AI research assistants".
UI and complementary technologies: I don't think we have a comparative advantage here, and we can just outsource this to human or AI contractors to build nice apps for us, or use generic apps on the market and just feed in our custom training data.
In terms of which applications to focus on, my guess is epistemic tools and coordination-enabling tools will mostly be built by default (though of course, as you note, additional effort can still speed them up somewhat). E.g. politicians, business leaders, and academics would all presumably love to have better predictions of which policies will be popular, which facts are true, which papers will replicate, etc. And negotiation tools might be quite valuable for e.g. negotiating corporate mergers and deals.
So my take is that probably a majority of the game here is in "automated AI safety/governance/strategy", because there will be less corporate incentive there, and it is also our comparative advantage to work on.
Overall, I agree differential AI tool development could be very important, but think the focus is mainly on providing high-quality training data and RLHF for automated AI safety research, which is somewhat narrower than what you describe.
I'm not sure how much we actually disagree though; I would be interested in your thoughts!
[1] Throughout, I use "us" to refer broadly to EA/longtermist/existential security type folks.
So if we take as given that I am at 53% and Alice is at 45%, that gives me some reason to do longtermist outreach, and gives Alice some reason to try to stop me, perhaps by making moral trades with me that get us more of what we both value. In this case, cluelessness doesn't bite, as Alice and I are still taking action towards our longtermist ends.
However, I think what you are claiming, or at least the version of your position that makes most sense to me, is that both Alice and I would be making a failure of reasoning if we assign these specific credences, and that we should both be "suspending judgement". And if I grant that, then yes, it seems cluelessness bites, as neither Alice nor I knows at all what to do now.
So it seems to come down to whether we should be precise Bayesians.
Re judgement calls, yes, I think that makes sense, though I'm not sure it is such a useful category. I would think there is just some spectrum of arguments/pieces of evidence, from "very well empirically grounded and justified" through "we have some moderate reason to think so" to "we have roughly no idea", and towards the far right of this spectrum is what we are labelling judgement calls. But surely there isn't a clear cut-off point.
Seems somewhat epistemically toxic to give in to a populist backlash against AI art if I don't buy the arguments for it being bad myself.