> Besides RSPs, can you give any additional examples of approaches that you’re excited about from the perspective of building a bigger tent & appealing beyond AI risk communities? This balancing act of “find ideas that resonate with broader audiences” and “find ideas that actually reduce risk and don’t merely serve as applause lights or safety washing” seems quite important. I’d be interested in hearing if you have any concrete ideas that you think strike a good balance of this, as well as any high-level advice for how to navigate this.
I’m pretty focused on red lines, and I don’t think I necessarily have big insights on other ways to build a bigger tent, but one thing I have been pretty enthused about for a while is putting more effort into investigating potentially concerning AI incidents in the wild. Based on case studies, I believe that exposing and helping the public understand any concerning incidents could easily be the most effective way to galvanize more interest in safety standards, including regulation. I’m not sure how many concerning incidents there are to be found in the wild today, but I suspect there are some, and I expect there to be more over time as AI capabilities advance.
> Additionally, how are you feeling about voluntary commitments from labs (RSPs included) relative to alternatives like mandatory regulation by governments (you can’t do X or you can’t do X unless Y), preparedness from governments (you can keep doing X but if we see Y then we’re going to do Z), or other governance mechanisms?
The work as I describe it above is not specifically focused on companies. My focus is on hammering out (a) what AI capabilities might increase the risk of a global catastrophe; (b) how we can try to catch early warning signs of these capabilities (and what challenges this involves); and (c) what protective measures (for example, strong information security and alignment guarantees) are important for safely handling such capabilities. I hope that by doing analysis on these topics, I can create useful resources for companies, governments and other parties.
I suspect that companies are likely to move faster and more iteratively on things like this than governments at this stage, and so I often pay special attention to them. But I’ve made clear that I don’t think voluntary commitments alone are sufficient, and that I think regulation will be necessary to contain AI risks. (Quote from earlier piece: “And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.”)
> one thing I have been pretty enthused about for a while is putting more effort into investigating potentially concerning AI incidents in the wild. Based on case studies, I believe that exposing and helping the public understand any concerning incidents could easily be the most effective way to galvanize more interest in safety standards, including regulation. I’m not sure how many concerning incidents there are to be found in the wild today, but I suspect there are some, and I expect there to be more over time as AI capabilities advance.
Interesting idea—I can see how exposing AI incidents could be important. It brought to mind the paper *Malla: Demystifying Real-world Large Language Model Integrated Malicious Services*. (No affiliation with the paper; it’s just one I remember reading, and we referenced it in some Berkeley CLTC AI Security Initiative research earlier this year.) The researchers behind Malla dug into the dark web and uncovered hundreds of LLM-based malicious services being distributed in the wild.