Most areas of capabilities research receive a 10x speedup from AI automation before most areas of safety research
The biggest factors seem to me to be feedback quality/​good metrics and AI developer incentives to race
Nice! It strikes me that in figure 1, information is propagating upward, from indicator to feature to stance to overall probability, and so the arrows should also be pointing upward.
I think the view (stance?) I am most sympathetic to is that all our current theories of consciousness aren’t much good, so we shouldn’t update very far away from our prior, but that picking a prior is quite subjective, and so it is hard to make collective progress on this when different people might just have quite different priors for P(current AI consciousness).
Why does METR not receive cG money?
I have Thoughts about the rest of it, which I am not sure whether I will write up, but for now: I am sad about your Dad’s death and glad you got to prioritise spending some time with him.
I expect there is a fair bit we disagree about, but thanks for your integrity and effort and vision.
Perhaps the main downside is that people may overuse the feature and it encourages people to spend time making small comments, whereas the current system nudges people towards leaving fewer, more substantive, and less nit-picky comments? Not sure if this has been an issue on LW; I don’t read it as much.
I think the value of work on climate change isn’t very impacted by this analysis, since climate change seems almost certainly solved post-ASI, so climate-driven catastrophic setbacks will generally occur pre-ASI and so not increase the total number of times we need to try to align ASI. Whereas nukes are less certainly solved post-ASI, given we may still be in a multipolar, war-like world.
In Appendix A.1, it’s not clear to me that an absolute reduction is the best way of thinking about this. Perhaps it is more natural to think in relative reductions? I suppose some interventions could probably best be modelled as absolute reductions (e.g. asteroid or supervolcano interventions perhaps) and others as relative reductions (doubling the amount of resources spent on alignment research?).
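To make the distinction concrete, here is a minimal sketch with purely illustrative numbers (the baseline risk and reduction sizes are my own made-up figures, not from the paper): an absolute reduction subtracts a fixed amount of risk, a relative reduction scales the risk down, and which framing flatters an intervention depends on how large the baseline risk is.

```python
# Toy comparison of absolute vs relative risk reductions (illustrative numbers only).

absolute_reduction = 0.02   # e.g. an asteroid/supervolcano intervention removing a fixed 2 points of risk
relative_reduction = 0.20   # e.g. more alignment research shaving 20% off whatever the risk is

for baseline_risk in (0.02, 0.10, 0.50):
    after_absolute = max(baseline_risk - absolute_reduction, 0.0)
    after_relative = baseline_risk * (1 - relative_reduction)
    print(f"baseline {baseline_risk:.2f}: absolute -> {after_absolute:.3f}, relative -> {after_relative:.3f}")
```

At a 10% baseline the two happen to coincide, but the absolute framing dominates at low baselines and the relative framing at high ones, which is why the modelling choice can change how interventions compare.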
Yes, I think the ‘100 years’ criterion isn’t quite what we want. E.g. if there is a catastrophic setback more than 100 years after we build an aligned ASI, then we don’t need to rerun the alignment problem. (In practice, perhaps 100 years should be ample time to build good global governance and reduce catastrophic setback risk to near 0, but conceptually we want to clarify this.)
And I agree with Owen that shorter setbacks also seem important. In fact, in a simple binary model we could just define a catastrophic setback to be one that takes you from a society that has built aligned ASI to one where all aligned ASIs are destroyed. I.e. the key thing is not how many years back you go, but whether you regress back beneath the critical ‘crunch time’ period.
I like the idea, though I think a shared gdoc is far better for any in-line comments. Maybe if you only want people to give high-level comments this is better though—I imagine heaps of people may want to comment on gdocs you share publicly.
When I read the first italicised line of the post, I assumed that one of the unusual aspects was that the post was AI-written. So then I was unusually on the lookout for that while reading it. I didn’t notice clear slop. The few times that seemed not quite in your voice/​a bit more AI-coded were (I am probably forgetting some):
- The talk of ‘uncontacted tribes’ - are there any? Seems more like something I would expect AIs to mention than you.
- ‘containerisation tools’ - this is more computer techno-speak than I would expect from you (I don’t really know what these tools are, maybe you do though).
- ‘Capacitors dry out, solder joints crack, chips suffer long-term degradation.’ - I quite like this actually, but it is a bit more flowery than your normal writing I think.
So overall, I would say the AIs acquitted themselves quite well!
Nice re LEEP’s honesty (and being well-funded)!
My understanding is that at the end of the CE program, founders are given the opportunity to pitch a set of known regular CE donors for funding, and that most incubated charities get enough money for their first ~year of operations from that. See https://www.seednetworkfunders.com/ for more, seems like it is minimum $10k/year to join this group of donors.
My understanding is Longview does a combination of actively reaching out (or getting existing donors to reach out) to possible new donors, and talking to people who express interest to them directly. But I don’t know much about their process or plans.
Longview does advising for large donors like this. Some other orgs I know of are also planning for an influx of money from Anthropic employees, or thinking about how best to advise such donors on comparing cause areas and charities and so forth. This is also relevant: https://forum.effectivealtruism.org/posts/qdJju3ntwNNtrtuXj/my-working-group-on-the-best-donation-opportunities But I agree more work on this seems good!
Nice! I’m confused as to how the budget is so low for a now-seven-person org—are most people working for well below-market salaries, or quite part-time, or how does this happen? At some orgs, this would be one person’s salary!
Nice work! I’m confused as to how 36k funds 1 senior researcher FTE year; this seems very low, what am I missing?
And also https://www.astralcodexten.com/p/most-technologies-arent-races#footnote-anchor-2-108490927 where Scott argues we shouldn’t care much between a corporate-controlled singularity, a USG-controlled singularity, and a CCP-controlled singularity. I still disagree (because these different groups are differently likely to pursue optimal moral reflection and innovation), but it is well worth a read.
Nice, seems like important work, keen to hear what comes of this!
Relevant paper from earlier this year that I missed: https://www.far.ai/news/defense-in-depth
Right, but because we have limited resources, we need to choose whether to invest more in just a few stronger layers, or invest less in each of a larger number of layers. Of course in an ideal world we would have heaps of really strong layers, but that may be cost-prohibitive.
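As a toy illustration of that trade-off (all numbers made up, and assuming layers fail independently so an attack only succeeds if every layer fails):

```python
import math

def p_breach(per_layer_failure_probs):
    """Probability an attack gets through every layer, assuming independent failures."""
    return math.prod(per_layer_failure_probs)

# Option A: concentrate the budget in two strong layers.
few_strong = [0.05, 0.05]
# Option B: spread the same budget across five weaker layers.
many_weak = [0.30] * 5

print("2 strong layers:", p_breach(few_strong))  # 0.0025
print("5 weak layers:  ", p_breach(many_weak))   # ~0.0024
```

Which side wins depends on how much extra spend strengthens a layer and on whether the independence assumption holds (correlated failures favour fewer, stronger layers), so this only shows the answer isn’t obvious either way.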
Yep, that all makes sense, and I think this work can still tell us something; it just doesn’t update me too much given the lack of compelling theories or much consensus in the scientific/philosophical community. This is harsher than what I actually think, but directionally, it has the feel of ‘cargo cult science’: it has a fancy Bayesian model and lots of numbers and so forth, but if it is all built on top of philosophical stances I don’t trust then it doesn’t move me much. But that said, it is still interesting, e.g. how wide the range for chickens is.