I can’t get into specifics. But if you believe activities like evaluations of models to test for dangerous behaviour etc. are net negative, then that may give credence to your assumption. As an extra data point on whether we’d do work we thought was net negative, I was Head of Policy at ControlAI and co-authored narrowpath.co, and our forecasters have done numerous AI safety focused projects (with and outside of the Swift Centre, including AI 2027).
Personally I weakly think any work with AI labs (except perhaps Anthropic) supports dangerous acceleration, but I think the opposing view to this is almost as strong.
That other stuff sounds way better than working with the labs too ;)
That seems much too strong to me: it’s very important that AI companies have accurate views on how dangerous their models are. When AISI evaluated Mythos and confirmed its high level of cybersecurity ability, this (from the outside) looks critical to Anthropic deciding not to release it publicly yet. This likely reduced near term risk, set some precedent, and also slowed the race slightly.
(Disclosure: the other side of SecureBio does AI evals; speaking for myself)
On “It’s very important that AI companies have accurate views on how dangerous their models are”: I would agree it’s important to the companies, so they can prevent near-term harm and increase long-term acceleration.
I would argue that if Mythos had slipped through a month ago, and let’s say a bank and a government were hacked, then we would have our biggest warning shot yet. If Anthropic had released Mythos prematurely, I think it would have reduced AI risk long term because it probably would have freaked out governments and the public, which might then have legislated and put the brakes on.
In this case, if Anthropic had prematurely released it, that would have slowed the race more than the real-world scenario where they didn’t. The slowing due to not releasing is IMO almost negligible.
I would argue similar for biorisk evals. A warning shot now might trigger the kind of public/government reaction we need before risks get existential. Hiccups now while models aren’t takeover/existential risk ready might slow the race down in a meaningful way. Preventing lower-level biorisk events now could increase existential risk later.
It’s obviously really difficult to tell if this kind of short-term pain might be worth the longer-term gain. But if the labs want the safety now, it’s for the purpose of continued scaling more than the safety itself. That should give us pause.
Also, we’ve already seen Anthropic and OpenAI back down on the red lines in their safety docs; “knowing” doesn’t mean “slowing”. It’s entirely possible all of the evals come in, the knowledge is there and everyone just plows on.
I would be like 51% sure on this (so barely at all), but at the very least it’s daft to automatically think that safety work now is necessarily a good thing for the world long term. There’s a lot of complexity there. I think there are strong arguments for and against near-term safety.
Whilst I strongly disagree with the claim at the object level, many other non-forecasting AI safety interventions work with labs in some way, so even if this were true, the relative penalty applied to AIS forecasting work would be fairly low.
We could “forecast” the likelihood of that haha.