This task, of trying to align them, is something that shouldn’t just be left to researchers in AI companies.
Why? I would think an AI expert is much better suited to align a potential AGI than any common person. I just don’t see how the common person could contribute to alignment. If anything, I can see how they would contribute to DISalignment (engineering better jailbreaks, using the models for nefarious purposes, giving the models “bad values” like “cause as much damage as possible”, etc.). I think I care about reducing existential risk above all else, and I can’t imagine how publicly releasing “almost superhuman” models could decrease it.
But you’re not claiming that the models should only be shared with AI researchers. You’re claiming they should only be shared with AI researchers specifically employed by Anthropic.
Although no, I disagree that the input from non-AI-researchers is useless here—as you need to hear both from the end users and from people affected by AI and its decisions.
I’m thinking more of the “endgame” here, so I think the input from non-researchers is no more valuable than the input of the researchers (as in, any useful information you could obtain about AI safety can be obtained just from the researchers alone). To be specific, I believe something along the lines of AI 2027 is gonna be the somewhat-near future, so I wanna restrict access to advanced models as much as possible.
Think of it like nuclear bombs. If you had a technology that powerful, you wouldn’t want to risk any bad actors getting access to it, so you limit the number of owners as much as possible. It would be pretty ridiculous to want private companies to be able to own, or even use, nuclear weapons, and I think the case is pretty similar for current and future AI.