I would be interested to see results from a similar experiment in which the groups were given access to the “Bad Llama” model, or given the opportunity to create their own version by fine-tuning Llama 2 or another open-source model. I don’t have a strong prior as to whether such a model would help the groups develop more dangerous plans.