Small note: the title made me think the platform was made by the organization OpenAI.
Aleksi Maunu
Will be joining tomorrow!
I think OP's stated reasoning there is that it's important to influence OpenAI leadership's stance and OpenAI's work on AI existential safety. Do you think this is unreasonable?
To be fair, I do think it makes a lot of sense to bring up nepotism here. I would be highly suspicious of the grant if I didn't happen to place a lot of trust in Holden Karnofsky and OP.
(feel free to not respond, I’m just curious)
Great job on the talk! :)
I'd be curious to know in more detail how giving the books to the audience was done.
Interesting, I hadn't thought of the anchoring effect you mention. One way to test this might be to poll the same audience about other, more outlandish claims, something like the probability of x-risk from an alien invasion, or of CERN accidentally creating a black hole.
“Accredited Investors can join Angel Investment Networks and other exclusive communities that provide unique opportunities for high impact.”
Can you expand on this? What kinds of opportunities are you thinking of? Funding startups that have potential to do good in an EA sense? Influencing high-net-worth individuals' donations? Making lots of money to donate?
I love this!
GPT-4 doesn’t have the internal bits which make inner alignment a relevant concern.
Is this commonly agreed upon even after fine-tuning with RLHF? I assumed it's an open empirical question. The way I understand it is that there's a reward signal (human feedback) shaping different parts of the neural network that determines GPT-4's outputs, and we don't have good enough interpretability techniques to know whether some parts of the network are representations of "goals", let alone what specific goals they are.
I would've thought it's an open question whether even base models have internal representations of "goals", whether always active or active only in some specific contexts. For example, if we buy the simulacra (predictors?) frame, a goal could be active only when a certain simulacrum is active.
(would love to be corrected :D)
CGP Grey is great! I’m also a fan of exurb1a’s channel, they have many videos with EA-adjacent themes. This one sticks out to me as moving EA content: https://youtu.be/n__42UNIhvU
I think it’s great that these are being posted on the forum! I’ve often found that I’d like to discuss an episode after listening to it, but haven’t known of a place with active discussion on it (other than twitter, which feels too scattered to me).
Naively, I would trade a lot of clearly-safe stuff being delayed or temporarily prohibited for even a minor decrease in the chance of safe-seeming-but-actually-dangerous stuff going through, which pushes me towards favoring a more expansive scope of regulation.
(in my mind, the potential loss of decades of life improvements currently pales in comparison to the potential non-existence of all lives in the long-term future)
I don't know how to think about it when accounting for public opinion though: I expect a larger scope would gather more opposition to regulation, which could be detrimental in various ways, the most obvious being a decreased likelihood of such regulation being passed, upheld, or spread to other places.
If I were issuing grants, I think I would use misleading language in such a letter to reduce the risk that the grantee organization can't get registered for some bureaucratic reason. That could also be mentioned to the grantee in an email or call so as not to cause any confusion. My guess would be that that's what happened here, but that's just my 2 cents. I have no relevant expertise.
Does anyone have thoughts on:
- How does the FTX situation affect the EV of running such a survey? My first intuition is that running one while the situation's so fresh is worse than waiting 3-6 months, but I can't properly articulate why.
- What, if any, questions should be added, changed, or removed given the FTX situation?
(After writing this I thought of one example where the goals are in conflict: permanent surveillance that stops the development of advanced AI systems. Thought I’d still post this in case others have similar thoughts. Would also be interested in hearing other examples.)
I’m assuming a reasonable interpretation of the proxy goal of safety means roughly this: “be reasonably sure that we can prevent AI systems we expect to be built from causing harm”. Is this a good interpretation? If so, when is this proxy goal in conflict with the goal of having “things go great in the long run”?
I agree that it's epistemically good for people not to confuse proxy goals with goals, but in practice I have trouble thinking of situations where the two are in conflict. If we ever succeed at the first goal, it seems like making progress on the second should be much easier, and at that point it would make more sense to advocate using-AI-to-bring-a-good-future-ism.
Focusing on the proxy goal of AI safety also seems good because it makes sense across many moral views, while people are going to have different thoughts on what it means for things to "go great in the long run". Fleshing out those disagreements is important, but I would think there's time to do that once we're in a period of lower existential risk.
I'd be interested in the historical record for similar industries. Could you quickly list some examples that come to mind? No need to elaborate much.
I wonder how one could explain the pleasures of learning about a subject as contentment, relief, or anticipated relief. Maybe they'd describe it as getting rid of the suffering-inducing desire to gain knowledge / acceptance from peers / whatever motivates people to learn?
I’m sure it would be possible to find meditators who came to the opposite conclusion about well-being.
If someone reading this happens to know of any I’d be interested to know! I wouldn’t be that surprised if they were very rare, since my (layman) impression is that Buddhism aligns well with suffering-focused ethics, and I assume most meditators are influenced by Buddhism.
Can you give a bit more context about what you’re looking for? Is this a thought experiment type of thing?
Thanks for the post!
We’ll probably be trying this out at EA Helsinki.
Hey, thanks for this post! I find it quite nice.
Thanks for the comment! I feel funny saying this without being the author, but the rest of my comment feels a bit cold in tone, so I thought it'd be appropriate to add this :)
I lean more moral anti-realist, but I struggle to see how the concepts of "value alignment" and "decision-making quality" are any less orthogonal under a moral realist view than under an anti-realist view.
Moral realist frame: "The more an institution intends to do things according to the 'true moral view', the more value-aligned it is."
"The better an institution's decision-making process is at predictably leading to what it values, the better its 'decision-making quality' is."
I don’t see why these couldn’t be orthogonal in at least some cases. For example, a terrorist organization could be outstandingly good at producing outstandingly bad outcomes.
Still, it's true that "value-aligned" might not be the best term, since some people seem to interpret it as a dog-whistle for "not following EA dogma enough" [link] (I don't, although I might be mistaken). "Altruism" and "Effectiveness" as the x and y axes would suffer from the problem mentioned in the post that they could alienate people coming to work on IIDM from outside the EA community. For the y-axis, ideally I'd like terms that make it easy to differentiate between beliefs common in EA that are uncontroversial ("let's value people's lives the same regardless of where they live") and beliefs that are more controversial ("x-risk is the key moral priority of our times").
On "value-neutral" being problematic: I thought the post gave enough space to the belief that institutions might be worse than neutral on average, by marking statements implying the opposite as uncertain. For example, crux (a) exists in this image to point out that if you disagree with it, you would come to a different conclusion about the effectiveness of (A).
(I’m testing out writing more comments on the EA forum, feel free to say if it was helpful or not! I want to learn to spend less time on these. This took about 30 minutes.)