“I wish companies would report their eval results and explain how they interpret those results and what would change their mind. That should be easy. I also wish for better elicitation and some hard accountability, but that’s more costly.”
I’m far more cynical/realistic. Why would companies do this? If we take the example of failing companies, they only report what benefits them until the very last moment before they collapse — unless there are leaks, or the failures are so obvious that they can’t deny them.
I don’t really consider these “eval reports” in the sense of straightforwardly evaluating capabilities and risk. They’re more virtue signaling that the company is “thinking” about safety, aimed at the public and their shareholders.
There may still be EAs and other good people on these teams getting real and honest information out there, but this is likely to be the exception rather than the rule.
The good thing with AI companies is that, to a decent extent, we can cross-check these findings against public models, so only a limited degree of dishonesty/obfuscation actually works.