AI strategy & governance. ailabwatch.org.
Zach Stein-Perlman
DeepMind’s “Frontier Safety Framework” is weak and unambitious
DeepMind: Frontier Safety Framework
I agree such commitments are worth noticing, and I hope OpenAI and other labs make such commitments in the future. But this commitment is not huge: it's just "20% of the compute we've secured to date" (in July 2023), to be used "over the next four years." It's unclear how much compute this is, and with compute use increasing exponentially, it may be quite little by 2027. Possibly you have private information, but based on public information, the minimum amount of compute consistent with the commitment is quite small.
It would be great if OpenAI or others committed 20% of their compute to safety! Even 5% would be nice.
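To illustrate the scale concern, here is a toy calculation. The numbers and the reading of the commitment (treating "compute secured to date" as the lab's mid-2023 annual compute, with an assumed growth rate) are my own assumptions, not anything the lab has stated:

```python
# Toy illustration (all numbers and the reading of the commitment are assumptions)
# of why "20% of the compute we've secured to date, over the next four years" can
# be a small share of a lab's compute by 2027.
compute_2023 = 1.0                    # normalize compute secured as of July 2023
annual_growth = 3.0                   # assumed yearly growth in the lab's total compute
safety_total = 0.20 * compute_2023    # the entire commitment
safety_per_year = safety_total / 4    # one reading: spread evenly over four years

compute_2027 = compute_2023 * annual_growth ** 4
share_2027 = safety_per_year / compute_2027
print(f"Safety compute in 2027 as a share of that year's total: {share_2027:.2%}")
# ~0.06% under these made-up assumptions -- far from "20% of compute" in 2027.
```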
In November, leading AI labs committed to sharing their models before deployment to be tested by the UK AI Safety Institute.
I suspect Politico hallucinated this / there was a game-of-telephone phenomenon. I haven’t seen a good source on this commitment. (But I also haven’t heard people at labs say “there was no such commitment.”)
The original goal involved getting attention. Weeks ago, I realized I was not on track to get attention, so I launched without a sharp object-level goal, largely to get feedback to help me figure out whether to continue working on this project and what its goals should be.
I share this impression. Unfortunately it’s hard to capture the quality of labs’ security with objective criteria based on public information. (I have disclaimers about this in 4-6 different places, including the homepage.) I’m extremely interested in suggestions for criteria that would capture the ways Google’s security is good.
Not necessarily. But:
There are opportunity costs and other tradeoffs involved in making the project better along public-attention dimensions.
The current version is bad at getting public attention; improving it so that it got 1000x as much public attention would still leave it with little. It's likely better to wait for a different project that's better positioned and more focused on getting public attention, and as I said, I expect such a project to appear soon.
Yep. But in addition to being simpler, the version of this project optimized for getting attention has other differences:
Criteria are better justified, more widely agreeable, and less focused on x-risk
It’s done—or at least endorsed and promoted—by a credible org
The scoring is done by legible experts and ideally according to a specific process
Even if I could do this, it would be effortful, costly, and imperfect, and there would be tradeoffs. I expect someone else will soon fill this niche pretty well.
Yep, that’s related to my “Give some third parties access to models to do model evals for dangerous capabilities” criterion. See here and here.
As I discuss here, it seems DeepMind shared super limited access with UKAISI (only access to a system with safety training + safety filters), so don’t give them too much credit.
I suspect Politico is wrong and the labs never committed to give early access to UKAISI. (I know you didn’t assert that they committed that.)
Introducing AI Lab Watch
Utilitarians aware of the cosmic endowment, at least, can take comfort in the fact that the prospect of quadrillions of animals suffering isn’t even a feather in the scales. They shut up and multiply.
(Many others should also hope humanity doesn’t go extinct soon, for various moral and empirical reasons. But the above point is often missed among people I know.)
Hmm, I think having the mindset behind effective altruistic action basically requires you to feel the force of donating. It's often correct not to donate, because of some combination of expecting {better information/deconfusion, better donation opportunities, excellent non-donation spending opportunities, high returns, etc.} in the future. But if you haven't really considered large donations, or don't get that donating can be great, I fail to see how you could be taking effective altruistic action. (This is about extremely rich people.) (A related indicator of non-EA-ness: not strongly considering causes outside the one you're most passionate about.)
(I don’t have context on Bryan Johnson.)
Staged release
See https://ea-internships.pory.app/board; you can filter for volunteer roles.
It would be helpful to mention if you have background or interest in particular cause areas.
(I endorse this.)
I’m annoyed at vague “value” questions. If you ask a specific question the puzzle dissolves. What should you do to make the world go better? Maximize world-EV, or equivalently maximize your counterfactual value (not in the maximally-naive way — take into account how “your actions” affect “others’ actions”). How should we distribute a fixed amount of credit or a prize between contributors? Something more Shapley-flavored, although this isn’t really the question that Shapley answers (and that question is almost never relevant, in my possibly controversial opinion).
Happy to talk about well-specified questions. Annoyed at questions like “should I use counterfactuals here” that don’t answer the obvious reply, “use them FOR WHAT?”
I don’t feel 100% bought-in to the Shapley Value approach, and think there’s a value in paying attention to the counterfactuals. My unprincipled compromise approach would be to take some weighted geometric mean and call it a day.
FOR WHAT?
Let’s assume in all of these scenarios that you are only one of the players in the situation, and you can only control your own actions.
If this is your specification (implicit/further specification: you're an altruist trying to maximize total value, deciding how to trade off between increasing X and doing good in other ways), then there is a correct answer: maximize counterfactual value (this is equivalent to maximizing total value, or argmaxing total value over your possible actions), not your personal Shapley value or anything else. (Just like in all other scenarios. Multiplicative-ness is irrelevant. Maximizing counterfactual value is always the answer to questions about what action to take.)
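As a toy illustration of the two questions (my own example, not something from this thread): in a two-person "multiplicative" project where nothing is produced unless both people participate, each person's counterfactual value is the entire output, so counterfactual values double-count if you add them up, while Shapley values split the credit so it sums to the total. For choosing your own action, maximizing your counterfactual value is the same as maximizing total value; the Shapley-flavored split only matters for dividing credit.

```python
from itertools import permutations

players = ["alice", "bob"]

# Hypothetical "multiplicative" project: it produces 100 units of value only if
# both players participate, and nothing otherwise.
def v(coalition):
    return 100 if {"alice", "bob"} <= set(coalition) else 0

# Counterfactual value: total value with the player minus total value without them.
def counterfactual(player):
    return v(players) - v([p for p in players if p != player])

# Shapley value: the player's marginal contribution, averaged over all join orders.
def shapley(player):
    orders = list(permutations(players))
    total = 0
    for order in orders:
        i = order.index(player)
        total += v(order[: i + 1]) - v(order[:i])
    return total / len(orders)

for p in players:
    print(p, "counterfactual:", counterfactual(p), "Shapley:", shapley(p))
# Counterfactual values are 100 each (summing to 200, double the total produced);
# Shapley values are 50 each (summing to exactly 100). Use the former to decide
# what to do; use something like the latter to split credit or a prize.
```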
Finally
My current impression is that there is no mechanism, funders will do whatever they feel like, and some investors will feel misled...
I now agree funders won’t really lose out, at least.
Hmm. I am really trying to fill in holes, not be adversarial, but I mostly just don’t think this works.
the funder probably recognizes some value in [] the projects the investors funded that weren’t selected for retrofunding
No. If the project produces zero value, then there's no value for the funder. If the project produces positive value, then it's retrofunded. (At least in the simple theoretical case. Maybe in practice small-value projects don't get funded. Then profit-seeking investors raise their bar: they don't just fund everything that's positive-EV, only stuff that's still positive-EV when you treat small positive outcomes as zero. Not sure how that works out.)
the funder probably recognizes some value in . . . aligned investors likely devoting their “profits” on other good projects
Yes.
If those gains aren't enough for the retrofunder, it could promise 100% payment up to investment price, but only partial payment of impact over the investment price—thus splitting the surplus between itself and the investor in whatever fraction seems advisable.
Surely this isn’t optimal, there’s deadweight loss. And it’s still exploitable and this suggests that something is broken. E.g. Alice can do something like: write a bad proposal for her project to ensure it isn’t funded in advance, self-fund at an investment of $10, and thereby extract $10 from the funders.
I suspect the informal agreement was nothing more than the UK AI safety summit “safety testing” session, which is devoid of specific commitments.