I got access to Bing Chat. It seems:
- It only searches through archived versions of websites (it doesn’t retrieve today’s news articles, and it accessed an older version of my Wikipedia user page)
- When archiving a page, it only stores the content one can see without any interaction with the website (tested with Reddit’s “see spoiler” buttons, which reveal additional content on the page when clicked. It could retrieve info from posts that got little attention but weren’t hidden behind a spoiler button)
I.e., it’s still in a box of sorts, unless it’s much more intelligent than it pretends to be.
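The spoiler-button test above can be sketched in code. This is a hedged, hypothetical illustration (the markup and class names are invented, not Reddit’s actual DOM): if spoiler content is only injected by JavaScript after a click, a crawler that never interacts with the page archives an empty container and the spoiler text never reaches the index.

```python
from html.parser import HTMLParser

# Hypothetical static snapshot of a post page. The spoiler container is
# empty because its content would only be injected by a script on click --
# a non-interacting crawler archives exactly this markup and nothing more.
STATIC_SNAPSHOT = """
<div class="post">Visible post text.</div>
<button class="spoiler-toggle">See spoiler</button>
<div class="spoiler" data-loaded="false"></div>
"""

class TextExtractor(HTMLParser):
    """Collects only the text that is present in the static markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

parser = TextExtractor()
parser.feed(STATIC_SNAPSHOT)
print(parser.chunks)
```

Running this prints only the visible text and the button label; no spoiler content exists in the snapshot to extract, which matches the observed behavior that Bing Chat could quote unpopular-but-visible posts while missing spoiler-hidden ones.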
Edit: A recent ACX post argues that text-predicting oracles might be safer, since their ability to form goals is very limited, but it offers two models of how even they could be dangerous: by simulating an agent, or via a human who decides to follow bad advice like “run the paperclip maximizer code”. Scott implies that expecting one to spontaneously form goals is an extreme view, linking a post by Veedrac. The best argument there seems to be that the model only has memory equivalent to about 10 human seconds. I find this convincing for current models, but the same limit also caps the intelligence of these systems, so I’m afraid that for future models the incentives will push toward weakening this safety valve.