Stampy’s AI Safety Info—New Distillations #4 [July 2023]
Hey! This is another update from the distillers at AI Safety Info.
Here are some of the answers we wrote up over the last month (July 2023). As always, let us know in the comments if there are any questions you would like to see answered.
Each item in the list below links to its own article, while the collective URL above renders all of the answers on a single page.
These are only the new articles. There has also been significant work on overhauling articles already live on the site to improve their quality, based on the feedback we have been receiving from readers.
Isn’t the real concern misuse?
What is Vingean uncertainty?
What is a “polytope” in a neural network?
What are the power-seeking theorems?
How does “chain-of-thought” prompting work?
What is “Constitutional AI”?
How can LLMs be understood as “simulators”?
What evidence do experts usually base their timeline predictions on?
Wouldn’t AIs need to have a power-seeking drive to pose a serious risk?
What is a “treacherous turn”?
What is reinforcement learning from human feedback (RLHF)?
What is an agent?
What is the Alignment Research Center’s research strategy?
Wouldn’t a superintelligence be smart enough not to make silly mistakes in its comprehension of our instructions?
What are the differences between subagents and mesa-optimizers?
What is the difference between verifiability, interpretability, transparency, and explainability?
Crossposted to LessWrong: https://www.lesswrong.com/posts/EW7rQsGkeKnY6AgXa/stampy-s-ai-safety-info-new-distillations-4-july-2023