To answer your question more directly: currently, some of the most advanced AIs are LLMs (Large Language Models); the most popular example is GPT-4. LLMs do not have a “will” of their own with which they would “refuse” to do something beyond what is explicitly trained into them.
For example, if you ask GPT-4 “how to build a bomb”, it will not give you detailed instructions but will instead tell you:
“My purpose is to assist and provide helpful information to users, while adhering to ethical guidelines and responsible use of AI. I cannot and will not provide information on creating dangerous or harmful devices, including bombs. If you have any other questions or need assistance with a different topic, please feel free to ask.”
This answer is not based on any moral code; it was trained in by OpenAI in an attempt to align the AI.
The LLM itself, put simply, “looks at your prompt and predicts, word by word, the most likely next string of words to write”. This is a simplification and doesn’t capture how amazing this actually is (so please look into it more if it sounds interesting), but my point is that GPT-4 can produce amazing results without any real understanding of what it is doing.
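To make the “predict the next word” idea concrete, here is a toy sketch of the same principle using simple word-pair counts instead of a neural network. This is my own illustration, not how GPT-4 actually works under the hood (real LLMs use transformers trained on enormous corpora), but the core loop — look at what came before, pick a likely continuation, repeat — is the same:

```python
from collections import Counter, defaultdict

# Tiny toy corpus; a real LLM is trained on vastly more text,
# but the core idea -- predict a likely next token -- is the same.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a simple "bigram" model).
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`."""
    return next_counts[word].most_common(1)[0][0]

# Generate word by word, always taking the most likely continuation.
words = ["the"]
for _ in range(4):
    words.append(predict_next(words[-1]))
print(" ".join(words))
```

The model has no idea what a “cat” is; it only knows which words tend to follow which. Scale that idea up by many orders of magnitude and you get something that can write essays — still without understanding them.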
Say an open-source version of GPT-4 is released in the near future and you strip away the safety fine-tuning: you would then be able to ask it how to build a bomb, and it would give you detailed instructions, as early versions of GPT did.
I’m using the bomb-building example, but you can imagine applying this to any concept, in particular to your question of “how to build a smarter agent”. LLMs are not there yet, but give them a few iterations and who knows.
I’ll link to my answers here:
https://forum.effectivealtruism.org/posts/oKabMJJhriz3LCaeT/all-agi-safety-questions-welcome-especially-basic-ones-april?commentId=XGCCgRv9Ni6uJZk8d
https://forum.effectivealtruism.org/posts/oKabMJJhriz3LCaeT/all-agi-safety-questions-welcome-especially-basic-ones-april?commentId=3LHWanSsCGDrbCTSh
since they address some of your points.